Video Clip Extractor Skill

Run the video orchestrator to process videos and extract engaging highlights.

When Triggered

Get the source — if the user didn't provide a video URL or file path, ask for it.
Clarify intent (optional) — if the user wants clips focused on a specific topic, capture it for --user-intent. If unclear, ask: "Any specific topic or moments to focus on? (e.g. 'funny moments', 'key arguments')"
Check environment — does video_orchestrator.py exist in the current directory? If yes, run directly. Otherwise use the global install at ~/.local/share/openclip.
Verify prerequisites — check ffmpeg is installed and at least one API key is set. Warn if missing before running.
Run the command and stream output to user.
Report results — after completion, list the generated clips with timestamps and titles.

Setup (first use only)

Before running, determine the execution context:

Inside openclip repo — if video_orchestrator.py exists in the current directory, skip setup and run directly.
Global install — if ~/.local/share/openclip does not exist, run these steps:

Prerequisites: git and uv must be installed.

Install uv if missing: macOS: brew install uv · Linux/Windows: pip install uv

git clone https://github.com/linzzzzzz/openclip.git ~/.local/share/openclip
cd ~/.local/share/openclip && uv sync

To update openclip later:

git -C ~/.local/share/openclip pull && cd ~/.local/share/openclip && uv sync

Execution

If inside the openclip repo (current directory contains video_orchestrator.py):

uv run python video_orchestrator.py [options] <source>

If running globally (from any other directory):

cd ~/.local/share/openclip && uv run python video_orchestrator.py -o "$OLDPWD/processed_videos" [options] <source>

$OLDPWD captures the user's original directory so clips are saved there, not inside the openclip install.

Where <source> is a video URL (Bilibili/YouTube) or local file path (MP4, WebM, AVI, MOV, MKV).

For local files with existing subtitles, place the .srt file in the same directory with the same filename (e.g. video.mp4 → video.srt).

Preflight Checklist

Inside openclip repo: run from the repo root so relative paths (e.g. references/, prompts/) resolve correctly
ffmpeg must be installed (required for all clip generation):
- macOS: brew install ffmpeg
- Ubuntu: sudo apt install ffmpeg
- Windows: download from ffmpeg.org
- If using --burn-subtitles: needs ffmpeg with libass (see README for details)
Set one API key:
- QWEN_API_KEY (default provider: qwen), or
- OPENROUTER_API_KEY (if --llm-provider openrouter), or
- GLM_API_KEY (if --llm-provider glm), or
- MINIMAX_API_KEY (if --llm-provider minimax)
If using --speaker-references: run uv sync --extra speakers and set HUGGINGFACE_TOKEN

CLI Reference

Required

Argument	Description
`source`	Video URL or local file path

Optional

Flag	Default	Description
`-o`, `--output <dir>`	`processed_videos`	Output directory
`--max-clips <n>`	`5`	Maximum number of highlight clips
`--browser <browser>`	`firefox`	Browser for cookies: `chrome`, `firefox`, `edge`, `safari`
`--title-style <style>`	`fire_flame`	Title style: `gradient_3d`, `neon_glow`, `metallic_gold`, `rainbow_3d`, `crystal_ice`, `fire_flame`, `metallic_silver`, `glowing_plasma`, `stone_carved`, `glass_transparent`
`--title-font-size <size>`	`medium`	Font size preset for artistic titles. Options: small(30px), medium(40px), large(50px), xlarge(60px)
`--cover-text-location <loc>`	`center`	Cover text position: `top`, `upper_middle`, `bottom`, `center`
`--cover-fill-color <color>`	`yellow`	Cover text fill color: `yellow`, `red`, `white`, `cyan`, `green`, `orange`, `pink`, `purple`, `gold`, `silver`
`--cover-outline-color <color>`	`black`	Cover text outline color: `yellow`, `red`, `white`, `cyan`, `green`, `orange`, `pink`, `purple`, `gold`, `silver`, `black`
`--language <lang>`	`zh`	Output language: `zh` (Chinese), `en` (English)
`--llm-provider <provider>`	`qwen`	LLM provider: `qwen`, `openrouter`, `glm`, `minimax`
`--user-intent <text>`	—	Free-text focus description (e.g. "moments about AI risks"). Steers LLM clip selection toward this topic
`--subtitle-translation <lang>`	—	Translate subtitles to this language before burning (e.g. `"Simplified Chinese"`). Requires `--burn-subtitles` and `QWEN_API_KEY`
`--speaker-references <dir>`	—	Directory of reference WAV files (one per speaker, filename = speaker name) for speaker diarization. Requires `uv sync --extra speakers` and `HUGGINGFACE_TOKEN`
`-f`, `--filename <template>`	—	yt-dlp template: `%(title)s`, `%(uploader)s`, `%(id)s`, etc.

Flags

Flag	Description
`--force-whisper`	Ignore platform subtitles, use Whisper
`--skip-download`	Use existing downloaded video
`--skip-transcript`	Skip transcript generation, use existing transcript file
`--skip-analysis`	Skip analysis, use existing analysis file for clip generation
`--use-background`	Include background info (streamer names/nicknames) in analysis prompts
`--skip-clips`	Skip clip generation
`--add-titles`	Add artistic titles to clips (disabled by default)
`--skip-cover`	Skip cover image generation
`--burn-subtitles`	Burn SRT subtitles into video. Output goes to `clips_post_processed/`. Requires ffmpeg with libass
`-v`, `--verbose`	Enable verbose logging
`--debug`	Export full prompts sent to LLM (saved to `debug_prompts/`)

Custom Filename Template (`-f`)

Uses yt-dlp template syntax. Common variables: %(title)s, %(uploader)s, %(upload_date)s, %(id)s, %(ext)s, %(duration)s.

Example: -f "%(upload_date)s_%(title)s.%(ext)s"

Environment Variables

Set the appropriate API key for the chosen --llm-provider:

QWEN_API_KEY — for --llm-provider qwen
OPENROUTER_API_KEY — for --llm-provider openrouter
GLM_API_KEY — for --llm-provider glm
MINIMAX_API_KEY — for --llm-provider minimax

Workflow

The orchestrator runs this pipeline automatically:

Download — fetch video + platform subtitles (Bilibili/YouTube) or accept local file
Split — divide videos longer than the built-in threshold into segments for parallel analysis
Transcribe — use platform subtitles or Whisper AI; --force-whisper overrides
Analyze — LLM scores transcript segments for engagement; --user-intent steers selection
Generate clips — ffmpeg cuts the video at identified timestamps
Add titles (opt-in) — render artistic text overlay using --title-style
Generate covers — create thumbnail image for each clip

Use --skip-clips, --skip-cover to skip specific steps. Use --add-titles to enable artistic titles. Use --skip-download and --skip-analysis to resume from intermediate results.

Output Example

After a successful run, report results like this:

✅ Processing complete — 5 clips generated
📁 processed_videos/video_name/clips/

  clip_01.mp4  [00:12:34 – 00:15:20]  "Title of the moment"
  clip_02.mp4  [00:28:45 – 00:31:10]  "Another highlight"
  clip_03.mp4  [00:45:00 – 00:47:30]  "Key discussion point"
  ...

Cover images: clips/*.jpg

Output Structure

processed_videos/{video_name}/
├── downloads/              # Original video, subtitles, and metadata (URL sources)
├── local_videos/           # Copied video and subtitles (local file sources)
├── splits/                 # Split parts and AI analysis results
├── clips/                  # Generated highlight clips + cover images
└── clips_post_processed/   # Post-processed clips when using --add-titles and/or --burn-subtitles

Option Selection Guide

Whisper model — Default base works for clear audio. Use small for background noise, multiple speakers, or accents. Use turbo for speed + accuracy. Use large/medium only when transcript quality is critical.

--force-whisper — Use when platform subtitles are auto-generated (often inaccurate), when "no engaging moments found" occurs (better transcripts improve analysis), or for non-native language content where platform captions are unreliable.

--use-background — Use for content featuring recurring personalities (streamers, hosts) where nicknames and community references matter. Reads from prompts/background/background.md.

Multi-part analysis — Videos that get split are analyzed per-segment, then aggregated to the top 5 engaging moments across all segments.

--user-intent — Steers LLM clip selection at both the per-segment and cross-segment aggregation stages. Useful when you want to find clips about a specific topic (e.g. "AI safety predictions", "funny moments").

--burn-subtitles — Hardcodes the SRT subtitle into the video frame. Use when you want subtitles always visible (e.g. for social media). Combine with --subtitle-translation to add a translated subtitle track below the original.

--speaker-references — Enables speaker diarization for interviews/podcasts. Provide a directory of 10–30 second clean WAV clips (one per speaker), named after the speaker (e.g. references/Host.wav).

Troubleshooting

Error	Fix
"ffmpeg not found" / clip generation fails silently	Install ffmpeg: `brew install ffmpeg` (macOS) or `sudo apt install ffmpeg` (Ubuntu)
"No API key provided"	Set `QWEN_API_KEY`, `OPENROUTER_API_KEY`, `GLM_API_KEY`, or `MINIMAX_API_KEY` env var
"Video download failed"	Check network/URL; try different `--browser`; or use local file
"Transcript generation failed"	Try `--force-whisper` or check audio quality
"No engaging moments found"	Try `--force-whisper` for better transcript accuracy
"Clip generation failed"	Ensure analysis completed; check for existing analysis file

video-clip-extractor