Video Processor

Instructions

This skill provides comprehensive video processing utilities including YouTube video download, audio extraction, format conversion, and audio transcription using yt-dlp, FFmpeg, and OpenAI's Whisper model.

Prerequisites

Required tools (must be installed in your environment):

yt-dlp: Video downloader for YouTube and thousands of other sites

# Install via pip
pip install -U yt-dlp

# Verify installation
yt-dlp --version

FFmpeg: Multimedia framework for video/audio processing

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Verify installation
ffmpeg -version

OpenAI Whisper: Speech-to-text transcription model

# Install via pip
pip install -U openai-whisper

# Verify installation
whisper --help

Python packages (declared inline in the script via PEP 723 metadata, so uv run installs them automatically):

click (CLI framework)
ffmpeg-python (Python wrapper for FFmpeg)
yt-dlp (video downloader)
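For reference, PEP 723 metadata is a TOML table written as comments at the top of a script, between `# /// script` and `# ///` markers; `uv run` reads it and installs the listed packages into an isolated environment before executing the script. A header declaring the three packages above might look like the following. The `requires-python` bound is an assumption, not taken from the actual script.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "click",
#     "ffmpeg-python",
#     "yt-dlp",
# ]
# ///
```

openai-whisper does not appear in this header because it is installed separately, as described above.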

Workflow

Use the scripts/video_processor.py script for all video processing tasks. The script provides a simple CLI with the following commands:

0. Download Video from YouTube or Other Platforms

Download videos from YouTube and thousands of other supported websites:

# Download video
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." output.mp4

# Download audio only (as MP3)
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." --audio-only

# Show video info without downloading
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." --info

# Download with subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." output.mp4 --subtitle

Options:

--audio-only: Download audio only (extracts to MP3)
--subtitle: Download and embed subtitles (supports en, zh-Hans, zh-Hant)
--info: Show video information without downloading
--format: Specify video format preference (default: best quality)
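To show how these flags relate to yt-dlp itself, here is a minimal sketch that maps them onto yt-dlp's Python API options (`format`, `outtmpl`, `writesubtitles`, `subtitleslangs`, and the `FFmpegExtractAudio` / `FFmpegEmbedSubtitle` postprocessors). The helper names `build_ydl_opts`, `download`, and `show_info` are hypothetical, and the real script may build its options differently.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping of this skill's download flags onto yt-dlp options;
# the actual script may choose different format strings or postprocessors.
SUBTITLE_LANGS = ["en", "zh-Hans", "zh-Hant"]


def build_ydl_opts(output: str | None = None, audio_only: bool = False,
                   subtitle: bool = False, fmt: str | None = None) -> dict[str, Any]:
    """Translate download flags into a yt-dlp options dict."""
    postprocessors: list[dict[str, Any]] = []
    opts: dict[str, Any]
    if audio_only:
        opts = {"format": fmt or "bestaudio/best"}
        # Convert the downloaded audio stream to MP3 (requires FFmpeg).
        postprocessors.append({"key": "FFmpegExtractAudio", "preferredcodec": "mp3"})
    else:
        # Best video plus best audio, merged into an MP4 container (requires FFmpeg).
        opts = {"format": fmt or "bestvideo+bestaudio/best",
                "merge_output_format": "mp4"}
    if output:
        opts["outtmpl"] = output
    if subtitle:
        opts["writesubtitles"] = True
        opts["subtitleslangs"] = list(SUBTITLE_LANGS)
        # Embed the downloaded subtitle tracks into the output file.
        postprocessors.append({"key": "FFmpegEmbedSubtitle"})
    if postprocessors:
        opts["postprocessors"] = postprocessors
    return opts


def download(url: str, **flags: Any) -> None:
    import yt_dlp  # lazy import: build_ydl_opts works without yt-dlp installed

    with yt_dlp.YoutubeDL(build_ydl_opts(**flags)) as ydl:
        ydl.download([url])


def show_info(url: str) -> dict[str, Any]:
    """Fetch metadata only, similar in spirit to --info."""
    import yt_dlp

    with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
        return ydl.extract_info(url, download=False)
```

For example, `download(url, output="output.mp4", subtitle=True)` corresponds roughly to the `--subtitle` invocation above. Keeping the option-building step separate from the network call makes it easy to inspect what will be requested before anything is downloaded.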

1. Extract Audio from Video

Extract the audio track from a video file:

uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio input.mp4 output.wav

Options:

--format: Output audio format (default: wav). Supports: wav, mp3, aac, flac
Output is suitable for transcription or standalone audio use
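As an illustration of what extract-audio likely involves, the sketch below builds an FFmpeg argument list that drops the video stream (`-vn`) and encodes the audio with a codec matching the requested format. The function name and the format-to-encoder table are assumptions, not the script's actual code.

```python
from __future__ import annotations

# Assumed mapping from --format values to FFmpeg audio encoders.
AUDIO_CODECS = {
    "wav": "pcm_s16le",   # uncompressed 16-bit PCM
    "mp3": "libmp3lame",  # needs an FFmpeg build with LAME
    "aac": "aac",         # FFmpeg's native AAC encoder
    "flac": "flac",       # lossless compression
}


def extract_audio_cmd(src: str, dst: str, fmt: str = "wav") -> list[str]:
    """Hypothetical FFmpeg argument list for the extract-audio command."""
    if fmt not in AUDIO_CODECS:
        raise ValueError(f"unsupported format: {fmt}")
    # -y overwrites dst without prompting; -vn discards the video stream.
    return ["ffmpeg", "-y", "-i", src, "-vn", "-c:a", AUDIO_CODECS[fmt], dst]
```

The list can be executed with `subprocess.run(extract_audio_cmd("input.mp4", "output.wav"), check=True)`, which raises if FFmpeg exits with an error.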

2. Convert Video to MP4

Convert any video file to MP4 format:

uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 input.avi output.mp4

Options:

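--codec: Video codec (default: libx264). Common options: libx264 (H.264), libx265 (H.265/HEVC), h264 (FFmpeg selects its default H.264 encoder)
--preset: Encoding speed/quality preset (default: medium). Common values: ultrafast, fast, medium, slow, veryslow; slower presets compress better but take longer to encode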
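A plausible FFmpeg invocation behind to-mp4 is sketched below, validating the preset against the full x264/x265 preset list. The extra flags `-pix_fmt yuv420p` (broad player compatibility), AAC audio, and `-movflags +faststart` (moves the index to the start of the file for progressive web playback) are assumptions about sensible defaults, not confirmed behavior of the script.

```python
from __future__ import annotations

# x264/x265 preset names, fastest to slowest.
X264_PRESETS = ("ultrafast", "superfast", "veryfast", "faster", "fast",
                "medium", "slow", "slower", "veryslow", "placebo")


def to_mp4_cmd(src: str, dst: str, codec: str = "libx264",
               preset: str = "medium") -> list[str]:
    """Hypothetical FFmpeg argument list for the to-mp4 command."""
    if preset not in X264_PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", codec, "-preset", preset,
        "-pix_fmt", "yuv420p",      # 4:2:0 chroma plays in virtually every player
        "-c:a", "aac",              # AAC is the usual audio codec in MP4
        "-movflags", "+faststart",  # index at file start for progressive playback
        dst,
    ]
```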

3. Convert Video to WebM

Convert any video file to WebM format (web-optimized):

uv run .claude/skills/video-processor/scripts/video_processor.py to-webm input.mp4 output.webm

Options:

--codec: Video codec (default: libvpx-vp9). Options: libvpx (VP8), libvpx-vp9 (VP9)
WebM is optimized for web playback and streaming
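The two WebM encoders handle quality settings differently, which the sketch below tries to capture: for VP9, `-crf` combined with `-b:v 0` selects constant-quality mode, while VP8 treats `-crf` as a quality target under a bitrate cap. The CRF value of 31 and the 1M VP8 cap are assumed starting points, Opus audio is an assumed pairing, and the function name is hypothetical.

```python
from __future__ import annotations


def to_webm_cmd(src: str, dst: str, codec: str = "libvpx-vp9",
                crf: int = 31) -> list[str]:
    """Hypothetical FFmpeg argument list for the to-webm command."""
    if codec not in ("libvpx", "libvpx-vp9"):
        raise ValueError(f"unsupported WebM codec: {codec}")
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", codec, "-crf", str(crf)]
    if codec == "libvpx-vp9":
        cmd += ["-b:v", "0"]    # VP9: -crf with -b:v 0 means constant quality
    else:
        cmd += ["-b:v", "1M"]   # VP8: -crf is a quality target under this cap
    cmd += ["-c:a", "libopus", dst]  # Opus is the usual WebM audio codec
    return cmd
```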

4. Transcribe Audio with Whisper

Transcribe audio or video files to text using OpenAI's Whisper model:

# Transcribe video file (audio will be extracted automatically)
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe input.mp4 transcript.txt

# Transcribe audio file directly
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe audio.wav transcript.txt

Options:

--model: Whisper model size (default: base). Options, with approximate VRAM requirements:
- tiny: Fastest, lowest accuracy (~1GB VRAM)
- base: Fast, good accuracy (~1GB VRAM) [DEFAULT]
- small: Balanced (~2GB VRAM)
- medium: High accuracy (~5GB VRAM)
- large: Best accuracy, slowest (~10GB VRAM)
--language: Language code (default: auto-detect). Examples: en, es, fr, de, zh
--format: Output format (default: txt). Options: txt, srt, vtt, json

Transcription workflow:

If the input is a video, FFmpeg extracts its audio to a temporary WAV file
Whisper transcribes the audio file
The transcription is saved in the requested format
Temporary files are cleaned up automatically
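The extract, transcribe, and clean-up steps above can be sketched as a small orchestration function. The FFmpeg and Whisper calls are injected as parameters so the control flow (including cleanup when transcription fails) can be seen and tested on its own; the function name and the set of video extensions are assumptions, not the script's actual code.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path
from typing import Callable

# Assumed list of extensions treated as video; the real script may differ.
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def transcribe_media(src: str,
                     extract: Callable[[str, str], None],
                     transcribe: Callable[[str], str]) -> str:
    """Hypothetical extract-then-transcribe flow with automatic cleanup.

    extract(src, wav_path) and transcribe(audio_path) are injected so real
    FFmpeg/Whisper calls can be plugged in, or replaced by fakes in tests.
    """
    if Path(src).suffix.lower() not in VIDEO_EXTS:
        return transcribe(src)  # already audio: no extraction needed
    # TemporaryDirectory deletes its contents on exit, even if an exception escapes.
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = os.path.join(tmp, "audio.wav")
        extract(src, wav_path)
        return transcribe(wav_path)
```

With openai-whisper installed, `transcribe` could be `lambda p: whisper.load_model("base").transcribe(p)["text"]`, and `extract` could run an FFmpeg command through `subprocess.run(..., check=True)`. Writing the returned text in the requested format is left to the caller.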

5. Combined Workflow Example

Process a video end-to-end:

# 1. Extract audio for analysis
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav

# 2. Transcribe the extracted audio to SRT subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.wav lecture.srt --format srt --model small

# 3. Convert to web format
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm lecture.mp4 lecture.webm

Key Technical Details

FFmpeg and Whisper Integration:

In this workflow FFmpeg doesn't transcribe audio; it prepares the audio that Whisper transcribes
The workflow is: Extract audio (FFmpeg) → Transcribe (Whisper) → Optional: Re-integrate with video
FFmpeg can also pipe decoded audio into a Python process that calls Whisper, which avoids writing a temporary file (advanced use case)
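The piping approach can be sketched as follows: FFmpeg decodes the input to raw 16 kHz mono, 16-bit little-endian PCM on stdout, and Python converts those bytes into floats in [-1.0, 1.0), which is the kind of array Whisper consumes. openai-whisper's own `whisper.load_audio` does essentially this with NumPy; the pure-stdlib helpers below (`pcm_pipe_cmd`, `pcm16_to_float`, `load_audio`) are hypothetical illustrations of the idea.

```python
from __future__ import annotations

import subprocess
import sys
from array import array

SAMPLE_RATE = 16000  # Whisper operates on 16 kHz mono audio


def pcm_pipe_cmd(src: str) -> list[str]:
    """FFmpeg args that decode src to raw s16le mono PCM on stdout ("-")."""
    return ["ffmpeg", "-nostdin", "-i", src,
            "-f", "s16le", "-ac", "1", "-ar", str(SAMPLE_RATE), "-"]


def pcm16_to_float(raw: bytes) -> list[float]:
    """Convert s16le bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian regardless of host
    return [s / 32768.0 for s in samples]


def load_audio(src: str) -> list[float]:
    """Read the decoded stream from FFmpeg's stdout; no temporary file is written."""
    proc = subprocess.run(pcm_pipe_cmd(src), capture_output=True, check=True)
    return pcm16_to_float(proc.stdout)
```

Before passing the samples to `model.transcribe()`, they would need converting with `numpy.asarray(samples, dtype=numpy.float32)`; for long files the NumPy path in `whisper.load_audio` is considerably faster than a Python list.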

Audio Format for Transcription:

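Whisper decodes its input through FFmpeg, so any common audio format works; lossless WAV avoids adding another round of compression artifacts before transcription
Sample rate: Whisper resamples all input to 16 kHz mono, so extracting at that rate (the script handles the conversion automatically) avoids a redundant resampling step
The script extracts audio with settings suited to Whisper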
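To confirm that an extracted WAV file already matches Whisper's expected input, its header can be inspected with Python's standard `wave` module. The helper names below are hypothetical; `write_silence` only exists to produce small test files.

```python
import wave


def is_whisper_ready(path: str) -> bool:
    """Return True if a WAV file is 16 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getnchannels() == 1
                and w.getsampwidth() == 2)


def write_silence(path: str, rate: int, channels: int, seconds: float = 0.1) -> None:
    """Write a short silent 16-bit WAV file (handy for testing a pipeline)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * int(rate * seconds))
```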

Output Formats:

txt: Plain text transcript
srt: SubRip subtitle format (includes timestamps)
vtt: WebVTT subtitle format (web standard)
json: Detailed JSON with per-segment timestamps (word-level timing is included only if Whisper's word_timestamps option is enabled)
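The practical difference between SRT and WebVTT is small: SRT numbers each cue and writes milliseconds after a comma, while WebVTT starts with a `WEBVTT` header and uses a period. The sketch below renders both from the `segments` list that Whisper's `transcribe()` returns (each entry has `start` and `end` in seconds plus `text`); the function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def fmt_ts(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(segments: Iterable[Mapping]) -> str:
    """Numbered SRT cues separated by blank lines."""
    blocks = [
        f"{i}\n{fmt_ts(seg['start'], ',')} --> {fmt_ts(seg['end'], ',')}\n"
        f"{seg['text'].strip()}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_vtt(segments: Iterable[Mapping]) -> str:
    """WebVTT cues after the mandatory WEBVTT header."""
    cues = [
        f"{fmt_ts(seg['start'], '.')} --> {fmt_ts(seg['end'], '.')}\n"
        f"{seg['text'].strip()}\n"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)
```

Because both renderers work from the same result, a single transcription can produce the plain text (`result["text"]`), SRT, and VTT outputs without running the model more than once.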

Error Handling

The script includes comprehensive error handling:

Validates input files exist
Checks that FFmpeg and Whisper are installed
Provides clear error messages for missing dependencies
Handles temporary file cleanup on errors
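A dependency check like the one described can be as simple as looking each command-line tool up on `PATH` with `shutil.which`. The sketch below is an assumption about how such a check might look; the tool list (which also includes yt-dlp for the download command) and the message wording are illustrative, and the lookup function is a parameter so the logic can be tested without the tools installed.

```python
from __future__ import annotations

import shutil
from typing import Callable

# Assumed set of executables the commands rely on.
REQUIRED_TOOLS = ("ffmpeg", "yt-dlp", "whisper")


def missing_tools(which: Callable[[str], str | None] = shutil.which) -> list[str]:
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def check_dependencies() -> None:
    """Exit with a readable message instead of a traceback if tools are missing."""
    missing = missing_tools()
    if missing:
        raise SystemExit("Missing required tools: " + ", ".join(missing)
                         + ". See Prerequisites for install commands.")
```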

Performance Tips

Use tiny or base models for quick drafts
Use small or medium for production transcriptions
Use large only when maximum accuracy is required
For long videos, consider extracting the audio first and then transcribing it in segments
WebM conversion with VP9 takes longer but produces smaller files
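One way to transcribe in segments is FFmpeg's segment muxer, which splits the audio into fixed-length files; each chunk is then transcribed separately, and its timestamps are shifted by the chunk's start time when the results are merged. The sketch below assumes 10-minute chunks and the helper names are hypothetical. Fixed-length cuts can split a word at a boundary, and chunk boundaries may land close to, rather than exactly on, the requested interval, so the offsets are approximate.

```python
from __future__ import annotations

import math


def segment_cmd(src: str, pattern: str = "chunk_%03d.wav",
                seconds: int = 600) -> list[str]:
    """Hypothetical FFmpeg call splitting src's audio into fixed-length WAV chunks."""
    return [
        "ffmpeg", "-y", "-i", src, "-vn",
        "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le",  # Whisper-friendly PCM
        "-f", "segment", "-segment_time", str(seconds),   # segment muxer
        pattern,                                           # chunk_000.wav, chunk_001.wav, ...
    ]


def chunk_offsets(duration: float, seconds: int = 600) -> list[float]:
    """Start time of each chunk; add it to that chunk's timestamps when merging."""
    return [float(i * seconds) for i in range(math.ceil(duration / seconds))]
```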

Examples

Example 1: Quick Video to MP4 Conversion

User request:

I have an AVI file from my old camera. Can you convert it to MP4?

You would:

Use the to-mp4 command with default settings:

uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 old_video.avi output.mp4

Confirm the conversion completed successfully
Inform the user about the output file location

Example 2: Extract Audio and Transcribe

User request:

I recorded a lecture video and need a transcript. Can you extract the audio and transcribe it?

You would:

First extract the audio:

uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav

Then transcribe the extracted audio using the base model (a good balance of speed and accuracy):

uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.wav transcript.txt --model base

Share the transcript.txt file with the user

Example 3: Create Web-Optimized Video with Subtitles

User request:

I need to put this video on my website with subtitles. Can you help?

You would:

Convert to WebM for web optimization:

uv run .claude/skills/video-processor/scripts/video_processor.py to-webm presentation.mp4 presentation.webm

Generate a WebVTT subtitle file (the caption format HTML5 video players load):

uv run .claude/skills/video-processor/scripts/video_processor.py transcribe presentation.mp4 subtitles.vtt --format vtt --model small

Inform user they now have:
- presentation.webm (web-optimized video)
- subtitles.vtt (subtitle file for the page's video player)
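To place the video on a page, an HTML5 `<video>` element can reference the WebM file, and a `<track>` element can reference the captions; `<track>` loads WebVTT, which is why the hypothetical helper below rejects other extensions. The file names, language code, and label are illustrative defaults.

```python
from __future__ import annotations

from html import escape


def video_embed_html(video: str, captions: str, lang: str = "en",
                     label: str = "English") -> str:
    """Hypothetical helper: HTML5 markup pairing a WebM video with a WebVTT track."""
    if not captions.endswith(".vtt"):
        raise ValueError("HTML5 <track> elements expect WebVTT (.vtt) files")
    return (
        "<video controls>\n"
        f'  <source src="{escape(video)}" type="video/webm">\n'
        f'  <track src="{escape(captions)}" kind="subtitles" '
        f'srclang="{escape(lang)}" label="{escape(label)}" default>\n'
        "</video>"
    )
```

An existing SRT file would need converting to WebVTT first, or can be regenerated with the transcribe command's `--format vtt` option.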

Example 4: High-Quality Transcription with Language Specification

User request:

I have a Spanish interview video that needs an accurate transcript for publication.

You would:

Use a larger model with language specified for best accuracy:

uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.txt --model medium --language es

Optionally create SRT for review:

uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.srt --format srt --model medium --language es

Review the transcript with the user and make any necessary corrections

Example 5: Batch Processing Multiple Videos

User request:

I have a folder of training videos that all need to be converted to WebM and transcribed.

You would:

List all video files in the directory:
```
ls training_videos/*.mp4
```

For each video file, run the conversion and transcription:

# Convert and transcribe every .mp4 file in training_videos/
mkdir -p output
for f in training_videos/*.mp4; do
  name=$(basename "$f" .mp4)
  uv run .claude/skills/video-processor/scripts/video_processor.py to-webm "$f" "output/$name.webm"
  uv run .claude/skills/video-processor/scripts/video_processor.py transcribe "$f" "output/$name.txt" --model base
done

Confirm all conversions and transcriptions completed
Provide summary of output files

Summary

The video-processor skill provides a unified interface for common video processing tasks:

Audio extraction: Extract audio tracks in various formats
Format conversion: Convert to MP4 (universal) or WebM (web-optimized)
Transcription: Speech-to-text with multiple output formats
Flexible: CLI arguments for model selection, language, and output formats

All operations are handled through a single, well-documented script with sensible defaults and comprehensive error handling.