Video Processor
Instructions
This skill provides comprehensive video processing utilities including YouTube video download, audio extraction, format conversion, and audio transcription using yt-dlp, FFmpeg, and OpenAI's Whisper model.
Prerequisites
Required tools (must be installed in your environment):
- yt-dlp: Video downloader for YouTube and thousands of other sites
  # Install via pip
  pip install -U yt-dlp
  # Verify installation
  yt-dlp --version
- FFmpeg: Multimedia framework for video/audio processing
  # macOS
  brew install ffmpeg
  # Ubuntu/Debian
  apt-get install ffmpeg
  # Verify installation
  ffmpeg -version
- OpenAI Whisper: Speech-to-text transcription model
  # Install via pip
  pip install -U openai-whisper
  # Verify installation
  whisper --help
Python packages (declared in the script via PEP 723 inline metadata, so uv run installs them automatically):
- click (CLI framework)
- ffmpeg-python (Python wrapper for FFmpeg)
- yt-dlp (video downloader)
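PEP 723 lets a single-file script declare its dependencies in a comment block at the top of the file. The sketch below shows what such a header might look like for this script and how a tool can locate it. The `requires-python` value and the exact layout of the dependency list are assumptions for illustration, not copied from `video_processor.py`; the regular expression is the one given in the PEP's reference implementation.

```python
import re
from typing import Optional

# Hypothetical header for illustration; the real script's metadata may differ.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "click",
#     "ffmpeg-python",
#     "yt-dlp",
# ]
# ///
import click
'''

# Pattern from the reference implementation in PEP 723.
PEP723_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None."""
    for match in PEP723_BLOCK.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Drop the leading "# " (or bare "#") from each comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None


if __name__ == "__main__":
    print(read_script_metadata(EXAMPLE_SCRIPT))
```

The returned text is plain TOML, so it can be handed to any TOML parser to obtain the dependency list.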
Workflow
Use the scripts/video_processor.py script for all video processing tasks. The script provides a simple CLI with the following commands:
0. Download Video from YouTube or Other Platforms (NEW!)
Download videos from YouTube and thousands of other supported websites:
# Download video
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." output.mp4
# Download audio only (as MP3)
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." --audio-only
# Show video info without downloading
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." --info
# Download with subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." output.mp4 --subtitle
Options:
- --audio-only: Download audio only (extracts to MP3)
- --subtitle: Download and embed subtitles (supports en, zh-Hans, zh-Hant)
- --info: Show video information without downloading
- --format: Specify video format preference (default: best quality)
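As a rough illustration of how these options could map onto yt-dlp itself, the sketch below builds a yt-dlp command line from the same four switches. The function name `build_ytdlp_args` is hypothetical, and the real script may call yt-dlp's Python API instead of the CLI; only the yt-dlp flags themselves are standard.

```python
from typing import Optional


def build_ytdlp_args(
    url: str,
    output: Optional[str] = None,
    audio_only: bool = False,
    subtitle: bool = False,
    info: bool = False,
    video_format: Optional[str] = None,
) -> list[str]:
    """Translate the documented download options into yt-dlp CLI arguments."""
    args = ["yt-dlp"]
    if info:
        # Print metadata as JSON; this flag does not download the media.
        return args + ["--dump-single-json", url]
    if audio_only:
        # Extract the audio stream and convert it to MP3 (requires FFmpeg).
        args += ["--extract-audio", "--audio-format", "mp3"]
    elif video_format:
        args += ["--format", video_format]
    if subtitle:
        # Fetch and embed the subtitle languages listed above.
        args += ["--write-subs", "--embed-subs", "--sub-langs", "en,zh-Hans,zh-Hant"]
    if output:
        args += ["--output", output]
    return args + [url]


if __name__ == "__main__":
    print(build_ytdlp_args("https://youtube.com/watch?v=...", audio_only=True))
```

The resulting list can be passed to `subprocess.run(args, check=True)`.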
1. Extract Audio from Video
Extract the audio track from a video file:
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio input.mp4 output.wav
Options:
- --format: Output audio format (default: wav). Supports: wav, mp3, aac, flac
- Output is suitable for transcription or standalone audio use
2. Convert Video to MP4
Convert any video file to MP4 format:
uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 input.avi output.mp4
Options:
- --codec: Video codec (default: libx264). Common options: libx264, libx265, h264
- --preset: Encoding speed/quality preset (default: medium). Options: ultrafast, fast, medium, slow, veryslow
3. Convert Video to WebM
Convert any video file to WebM format (web-optimized):
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm input.mp4 output.webm
Options:
- --codec: Video codec (default: libvpx-vp9). Options: libvpx, libvpx-vp9
- WebM is optimized for web playback and streaming
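The two conversion commands can be thought of as thin wrappers around an FFmpeg invocation. The following is a minimal sketch of the equivalent FFmpeg arguments, assuming AAC audio for MP4 and Opus audio for WebM; the function names and the audio codec choices are assumptions, since the script's actual settings are not shown here.

```python
def to_mp4_args(src: str, dst: str, codec: str = "libx264", preset: str = "medium") -> list[str]:
    """FFmpeg arguments for an MP4 conversion with the documented defaults."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", codec,       # video encoder, e.g. libx264 or libx265
        "-preset", preset,   # slower presets compress better at the same quality
        "-c:a", "aac",       # assumed audio codec
        dst,
    ]


def to_webm_args(src: str, dst: str, codec: str = "libvpx-vp9") -> list[str]:
    """FFmpeg arguments for a WebM conversion with the documented default codec."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", codec,        # libvpx (VP8) or libvpx-vp9 (VP9)
        "-c:a", "libopus",    # assumed audio codec
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(to_mp4_args("input.avi", "output.mp4")))
    print(" ".join(to_webm_args("input.mp4", "output.webm")))
```

The `-preset` flag is passed only in the MP4 case because it belongs to the x264/x265 encoders; the libvpx encoders use different speed controls.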
4. Transcribe Audio with Whisper
Transcribe audio or video files to text using OpenAI's Whisper model:
# Transcribe video file (audio will be extracted automatically)
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe input.mp4 transcript.txt
# Transcribe audio file directly
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe audio.wav transcript.txt
Options:
- --model: Whisper model size (default: base). Options:
  - tiny: Fastest, lowest accuracy (~1GB RAM)
  - base: Fast, good accuracy (~1GB RAM) [DEFAULT]
  - small: Balanced (~2GB RAM)
  - medium: High accuracy (~5GB RAM)
  - large: Best accuracy, slowest (~10GB RAM)
- --language: Language code (default: auto-detect). Examples: en, es, fr, de, zh
- --format: Output format (default: txt). Options: txt, srt, vtt, json
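When the available memory is known, the model table can be used to pick the most accurate model that still fits. The helper below is a hypothetical sketch that uses only the approximate figures listed for `--model`; it is not part of the script.

```python
# Approximate memory needs in GB, ordered from least to most accurate,
# taken from the --model option list.
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_that_fits(available_gb: float) -> str:
    """Return the most accurate model whose listed memory need fits."""
    for name in reversed(list(MODEL_MEMORY_GB)):
        if MODEL_MEMORY_GB[name] <= available_gb:
            return name
    raise ValueError(f"No Whisper model fits in {available_gb} GB")


if __name__ == "__main__":
    print(largest_model_that_fits(6))
```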
Transcription workflow:
- If input is video, FFmpeg extracts audio to temporary WAV file
- Whisper processes the audio file
- Transcription is saved in requested format
- Temporary files are cleaned up automatically
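The four steps above can be sketched as follows. To keep the example runnable without FFmpeg or Whisper installed, the extraction and transcription steps are passed in as callables; `transcribe_media`, `AUDIO_SUFFIXES`, and both parameter names are hypothetical and only illustrate the control flow, not the script's actual code.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Inputs with these suffixes are treated as audio and transcribed directly.
AUDIO_SUFFIXES = {".wav", ".mp3", ".aac", ".flac"}


def transcribe_media(
    input_path: Path,
    extract_audio: Callable[[Path, Path], None],
    transcribe: Callable[[Path], str],
) -> str:
    """Transcribe an audio or video file, extracting audio first if needed."""
    if input_path.suffix.lower() in AUDIO_SUFFIXES:
        return transcribe(input_path)
    # The temporary directory is removed on exit, even if a step raises.
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = Path(tmp) / f"{input_path.stem}.wav"
        extract_audio(input_path, wav_path)
        return transcribe(wav_path)
```

Using a context manager for the temporary directory is what makes the cleanup automatic: the WAV file disappears whether transcription succeeds or fails.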
5. Combined Workflow Example
Process a video end-to-end:
# 1. Extract audio for analysis
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav
# 2. Transcribe to SRT subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.mp4 lecture.srt --format srt --model small
# 3. Convert to web format
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm lecture.mp4 lecture.webm
Key Technical Details
FFmpeg and Whisper Integration:
- FFmpeg doesn't transcribe audio itself - it prepares audio for external transcription
- The workflow is: Extract audio (FFmpeg) → Transcribe (Whisper) → Optional: Re-integrate with video
- FFmpeg can also pipe decoded audio straight into Whisper's Python API, avoiding an intermediate file (advanced use case)
Audio Format for Transcription:
- Whisper decodes its input through FFmpeg, so WAV and MP3 both work well
- Sample rate: Whisper operates on 16kHz mono audio (script handles conversion automatically)
- The script extracts audio with optimal settings for Whisper
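For reference, an FFmpeg invocation that produces 16 kHz mono PCM audio might look like the sketch below. The function name is hypothetical and the script's exact flags are not shown in this document, so treat this as one plausible set of settings.

```python
def whisper_audio_args(src: str, dst: str) -> list[str]:
    """FFmpeg arguments that extract 16 kHz mono 16-bit PCM audio."""
    return [
        "ffmpeg", "-i", src,
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # uncompressed 16-bit little-endian PCM
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(whisper_audio_args("lecture.mp4", "lecture.wav")))
```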
Output Formats:
- txt: Plain text transcript
- srt: SubRip subtitle format (includes timestamps)
- vtt: WebVTT subtitle format (web standard)
- json: Detailed JSON with segment timestamps (and word-level timestamps when Whisper's word-timestamp option is enabled)
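The subtitle formats differ mainly in their header and timestamp punctuation. The sketch below converts a list of segments into SRT and WebVTT text, assuming each segment is a dict with `start` and `end` in seconds plus a `text` field (the shape of Whisper's segment output). The function names are illustrative rather than taken from the script.

```python
def format_timestamp(seconds: float, decimal_marker: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[dict]) -> str:
    """WebVTT: 'WEBVTT' header, dot before the milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 2.5, "text": " Hello and welcome."}]
    print(to_srt(demo))
    print(to_vtt(demo))
```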
Error Handling
The script includes comprehensive error handling:
- Validates input files exist
- Checks FFmpeg and Whisper are installed
- Provides clear error messages for missing dependencies
- Handles temporary file cleanup on errors
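A minimal sketch of the first three checks, assuming the tools are looked up on `PATH`. `ProcessorError`, `require_tools`, and `require_input_file` are hypothetical names, not the script's own.

```python
import shutil
from pathlib import Path


class ProcessorError(Exception):
    """Raised for user-facing problems such as missing tools or files."""


def require_tools(*names: str) -> None:
    """Fail with a clear message if any named executable is not on PATH."""
    missing = [name for name in names if shutil.which(name) is None]
    if missing:
        raise ProcessorError(
            "Missing required tools: " + ", ".join(missing)
            + ". See the Prerequisites section for install commands."
        )


def require_input_file(path: str) -> Path:
    """Return the path as a Path object, or fail if it is not an existing file."""
    candidate = Path(path)
    if not candidate.is_file():
        raise ProcessorError(f"Input file not found: {candidate}")
    return candidate


if __name__ == "__main__":
    try:
        require_tools("ffmpeg", "whisper")
        require_input_file("input.mp4")
    except ProcessorError as exc:
        print(f"Error: {exc}")
```

Running the checks before any processing starts means a missing dependency is reported immediately rather than partway through a long job.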
Performance Tips
- Use tiny or base models for quick drafts
- Use small or medium for production transcriptions
- Use large only when maximum accuracy is required
- For long videos, consider extracting audio first, then transcribe in segments
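One way to transcribe in segments is to cut the extracted audio into fixed-length pieces with FFmpeg and transcribe each piece separately. The sketch below plans the cut points and builds the FFmpeg arguments. The 10-minute default and both function names are assumptions, and the script itself may not offer segmenting.

```python
def plan_segments(total_seconds: float, segment_seconds: float = 600.0) -> list[tuple[float, float]]:
    """Return (start, duration) pairs that cover the whole recording."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    segments = []
    start = 0.0
    while start < total_seconds:
        duration = min(segment_seconds, total_seconds - start)
        segments.append((start, duration))
        start += segment_seconds
    return segments


def segment_args(src: str, dst: str, start: float, duration: float) -> list[str]:
    """FFmpeg arguments that copy one segment out of an audio file."""
    return [
        "ffmpeg",
        "-ss", str(start),     # seek to the segment start
        "-t", str(duration),   # keep this many seconds
        "-i", src,
        "-c", "copy",          # no re-encoding
        dst,
    ]


if __name__ == "__main__":
    for index, (start, duration) in enumerate(plan_segments(1500)):
        print(" ".join(segment_args("lecture.wav", f"part{index}.wav", start, duration)))
```

Fixed cut points can split a word in half, so a small overlap between segments or cutting at silences may give cleaner transcripts.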
- WebM conversion with VP9 takes longer but produces smaller files
Examples
Example 1: Quick Video to MP4 Conversion
User request:
I have an AVI file from my old camera. Can you convert it to MP4?
You would:
- Use the to-mp4 command with default settings:
  uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 old_video.avi output.mp4
- Confirm the conversion completed successfully
- Inform the user about the output file location
Example 2: Extract Audio and Transcribe
User request:
I recorded a lecture video and need a transcript. Can you extract the audio and transcribe it?
You would:
- First extract the audio:
  uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav
- Then transcribe using the base model (good balance of speed/accuracy):
  uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.mp4 transcript.txt --model base
- Share the transcript.txt file with the user
Example 3: Create Web-Optimized Video with Subtitles
User request:
I need to put this video on my website with subtitles. Can you help?
You would:
- Convert to WebM for web optimization:
  uv run .claude/skills/video-processor/scripts/video_processor.py to-webm presentation.mp4 presentation.webm
- Generate SRT subtitle file:
  uv run .claude/skills/video-processor/scripts/video_processor.py transcribe presentation.mp4 subtitles.srt --format srt --model small
- Inform user they now have:
  - presentation.webm (web-optimized video)
  - subtitles.srt (subtitle file for embedding)
Example 4: High-Quality Transcription with Language Specification
User request:
I have a Spanish interview video that needs an accurate transcript for publication.
You would:
- Use a larger model with language specified for best accuracy:
  uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.txt --model medium --language es
- Optionally create SRT for review:
  uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.srt --format srt --model medium --language es
- Review the transcript with the user and make any necessary corrections
Example 5: Batch Processing Multiple Videos
User request:
I have a folder of training videos that all need to be converted to WebM and transcribed.
You would:
- List all video files in the directory:
  ls training_videos/*.mp4
- For each video file, run the conversion and transcription:
  # For each video: video1.mp4, video2.mp4, etc.
  uv run .claude/skills/video-processor/scripts/video_processor.py to-webm training_videos/video1.mp4 output/video1.webm
  uv run .claude/skills/video-processor/scripts/video_processor.py transcribe training_videos/video1.mp4 output/video1.txt --model base
  # Repeat for each file
- Confirm all conversions and transcriptions completed
- Provide summary of output files
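Rather than repeating the commands by hand, the per-file steps can be generated with a short loop. The sketch below builds the same two commands for every `.mp4` in a directory; `batch_commands` is a hypothetical helper, and the script path matches the one used throughout this document.

```python
from pathlib import Path

SCRIPT = ".claude/skills/video-processor/scripts/video_processor.py"


def batch_commands(input_dir: Path, output_dir: Path, model: str = "base") -> list[list[str]]:
    """Build a to-webm and a transcribe command for each .mp4 in input_dir."""
    commands = []
    for video in sorted(input_dir.glob("*.mp4")):
        base = ["uv", "run", SCRIPT]
        webm = output_dir / f"{video.stem}.webm"
        text = output_dir / f"{video.stem}.txt"
        commands.append(base + ["to-webm", str(video), str(webm)])
        commands.append(base + ["transcribe", str(video), str(text), "--model", model])
    return commands


if __name__ == "__main__":
    for command in batch_commands(Path("training_videos"), Path("output")):
        print(" ".join(command))
```

Each command list can be run with `subprocess.run(command, check=True)`, which stops the batch at the first failure so it can be reported to the user.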
Summary
The video-processor skill provides a unified interface for common video processing tasks:
- Video download: Fetch video or audio from YouTube and other supported sites
- Audio extraction: Extract audio tracks in various formats
- Format conversion: Convert to MP4 (universal) or WebM (web-optimized)
- Transcription: Speech-to-text with multiple output formats
- Flexible: CLI arguments for model selection, language, and output formats
All operations are handled through a single, well-documented script with sensible defaults and comprehensive error handling.