video-edit
Video Editing Toolkit
Goal
Complete video production pipeline: silence removal, auto-captions, vertical cropping, YouTube clipping, compression, and 3D transitions.
Scripts (7 total)
| Script | Purpose |
|---|---|
jump_cut_vad_singlepass.py |
Remove silences with neural VAD (Silero) |
auto_captions.py |
Generate + burn styled subtitles (Whisper + FFmpeg) |
vertical_crop.py |
Auto-crop 16:9 → 9:16 with face tracking |
youtube_clip.py |
Download YouTube + AI chapter clipping |
compress_video.py |
Social media compression with platform presets |
simple_video_edit.py |
Full pipeline: silence removal + transcription + metadata + upload |
insert_3d_transition.py |
3D swivel teaser insertion |
Quick Start Recipes
Recipe 1: Full Social Media Pipeline
# 1. Remove silences
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 .tmp/edited.mp4
# 2. Add captions
python3 ./scripts/auto_captions.py .tmp/edited.mp4 .tmp/captioned.mp4 --word-level --max-words 2
# 3. Crop to vertical for Shorts/Reels
python3 ./scripts/vertical_crop.py .tmp/captioned.mp4 .tmp/vertical.mp4
# 4. Compress for platform
python3 ./scripts/compress_video.py .tmp/vertical.mp4 output.mp4 --preset instagram-reel
Recipe 2: YouTube → Shorts Pipeline
# 1. Download + auto-clip YouTube video
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip --output-dir clips/
# 2. Add captions to best clip
python3 ./scripts/auto_captions.py clips/chapters/01_intro.mp4 .tmp/captioned.mp4 --word-level
# 3. Crop to vertical
python3 ./scripts/vertical_crop.py .tmp/captioned.mp4 .tmp/vertical.mp4
# 4. Compress for YouTube Shorts
python3 ./scripts/compress_video.py .tmp/vertical.mp4 short.mp4 --preset youtube-shorts
Recipe 3: Quick Edit + Upload
# All-in-one: silence removal + transcription + metadata + Auphonic upload
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "My Video"
Script 1: VAD Silence Removal
File: ./scripts/jump_cut_vad_singlepass.py
How It Works
- Extracts audio as WAV (16kHz mono)
- Runs Silero VAD to detect speech segments
- Merges close segments, adds padding
- Uses FFmpeg trim+concat to join segments in single pass
- Hardware encodes with hevc_videotoolbox (H.265, 17Mbps, 30fps)
CLI Arguments
| Argument | Default | Description |
|---|---|---|
--min-silence |
0.5 | Min silence duration to cut (seconds) |
--min-speech |
0.25 | Min speech duration to keep (seconds) |
--padding |
100 | Padding around speech (ms) |
--merge-gap |
0.3 | Merge segments closer than this (seconds) |
--keep-start |
true | Always start from 0:00 |
Usage
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 output.mp4
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 output.mp4 --min-silence 1.0 --padding 200
Processing Time
~8 minutes for a 49-min 4K video
Script 2: Auto-Captions
File: ./scripts/auto_captions.py
How It Works
- Transcribes video using faster-whisper (word-level timestamps)
- Generates SRT subtitles (segment-level or word-level)
- Burns styled captions into video with FFmpeg
CLI Arguments
| Argument | Default | Description |
|---|---|---|
--model |
base | Whisper model: tiny, base, small, medium, large-v3 |
--language |
auto | Force language (en, es, hi, etc.) |
--word-level |
false | Short punchy captions (1-3 words at a time) |
--max-words |
3 | Max words per caption in word-level mode |
--font-size |
22 | Caption font size (auto-scales for vertical) |
--font-color |
white | Text color: white, yellow, cyan, green, red, orange, pink |
--outline-color |
black | Outline color |
--position |
bottom | Caption position: bottom, top, middle |
--srt-only |
false | Only generate SRT, don't burn |
--srt-path |
auto | Custom SRT output path |
--bold |
true | Bold text |
Usage
# Basic captions
python3 ./scripts/auto_captions.py input.mp4 output.mp4
# Word-level (CapCut/Submagic style)
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --word-level --max-words 2
# Yellow captions at top
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --font-color yellow --position top
# SRT only (no burn)
python3 ./scripts/auto_captions.py input.mp4 --srt-only
# High accuracy
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --model large-v3
Dependencies
pip install faster-whisper
Script 3: Vertical Crop
File: ./scripts/vertical_crop.py
How It Works
- Samples frames throughout the video
- Detects faces using OpenCV Haar cascade
- Smooths face positions to avoid jitter
- Crops 16:9 → 9:16 centered on the speaker
- For moving subjects: segments video into 2s chunks with per-segment tracking
CLI Arguments
| Argument | Default | Description |
|---|---|---|
--ratio |
9:16 | Target aspect ratio (e.g., 9:16, 4:5, 1:1) |
--position |
auto | Crop position: auto (face tracking), left, center, right |
--smoothing |
30 | Smoothing window in frames |
--sample-interval |
5 | Sample every N frames for detection |
Usage
# Auto face-tracking crop
python3 ./scripts/vertical_crop.py input.mp4 output.mp4
# Square crop (Instagram post)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --ratio 1:1
# 4:5 crop (Instagram feed)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --ratio 4:5
# Center crop (no tracking)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --position center
Dependencies
pip install opencv-python numpy
Script 4: YouTube Clip
File: ./scripts/youtube_clip.py
How It Works
- Downloads video via yt-dlp (up to 4K)
- Extracts YouTube chapters if available
- Falls back to AI chapter generation (Whisper + Claude)
- Clips video into individual chapter files
CLI Arguments
| Argument | Default | Description |
|---|---|---|
--download-only |
false | Only download, don't clip |
--max-quality |
1080 | Max quality: 480, 720, 1080, 1440, 2160 |
--audio-only |
false | Download audio only (MP3) |
--start |
none | Manual clip start (seconds) |
--end |
none | Manual clip end (seconds) |
--auto-clip |
false | AI-powered chapter detection and clipping |
--use-yt-chapters |
true | Use YouTube chapters if available |
--max-clips |
none | Limit number of clips |
--whisper-model |
base | Whisper model for transcription |
--reencode |
false | Re-encode clips (precise cuts) |
--output-dir |
.tmp/clips | Output directory |
Usage
# Download only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --download-only
# Auto-clip into chapters
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip
# Extract specific range
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --start 60 --end 180 -o clip.mp4
# Download audio only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --audio-only
# Top 3 clips only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip --max-clips 3
Dependencies
pip install yt-dlp faster-whisper anthropic
Script 5: Video Compressor
File: ./scripts/compress_video.py
Platform Presets
| Preset | Resolution | CRF | Max Duration | Max Size | Notes |
|---|---|---|---|---|---|
youtube |
1080p | 18 | none | none | High quality |
youtube-shorts |
1080x1920 | 20 | 60s | none | Vertical |
instagram-reel |
1080x1920 | 23 | 90s | none | Vertical |
instagram-post |
1080x1080 | 23 | 60s | none | Square/vertical |
tiktok |
1080x1920 | 23 | 180s | none | Vertical |
twitter |
1080p | 23 | 140s | 512MB | Auto-bitrate |
linkedin |
1080p | 23 | 600s | 200MB | Auto-bitrate |
telegram |
720p | 28 | none | 50MB | For bots |
whatsapp |
720p | 28 | 120s | 16MB | Aggressive |
small |
480p | 30 | none | none | Quick sharing |
CLI Arguments
| Argument | Default | Description |
|---|---|---|
--preset |
none | Platform preset (see table above) |
--resolution |
1080 | Max height in pixels |
--crf |
23 | Quality (0=lossless, 51=worst) |
--audio-bitrate |
128k | Audio bitrate |
--target-size |
none | Target file size in MB |
--max-duration |
none | Max duration in seconds |
--list-presets |
false | Show all presets |
Usage
# Platform preset
python3 ./scripts/compress_video.py input.mp4 output.mp4 --preset instagram-reel
python3 ./scripts/compress_video.py input.mp4 output.mp4 --preset whatsapp
# Target file size
python3 ./scripts/compress_video.py input.mp4 output.mp4 --target-size 25
# Custom
python3 ./scripts/compress_video.py input.mp4 output.mp4 --resolution 720 --crf 28
# List all presets
python3 ./scripts/compress_video.py input.mp4 output.mp4 --list-presets
Script 6: Simple Video Edit (Full Pipeline)
File: ./scripts/simple_video_edit.py
How It Works
- FFmpeg silence detection + cutting
- Audio normalization (loudnorm)
- Whisper transcription
- Claude-generated YouTube metadata (summary + chapters)
- Auphonic upload → YouTube (private draft)
CLI Arguments
| Argument | Default | Description |
|---|---|---|
--video |
required | Input video path |
--title |
required | YouTube title |
--thumbnail |
none | Thumbnail image path |
--no-upload |
false | Skip Auphonic upload |
--no-normalize |
false | Skip audio normalization |
--upload-only |
false | Skip editing, just upload |
--silence-threshold |
-35 | Silence threshold in dB |
--silence-duration |
3.0 | Min silence duration (seconds) |
Usage
# Full pipeline
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "My Video"
# Local only
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "Test" --no-upload
Dependencies
pip install anthropic faster-whisper requests python-dotenv
# Requires: ANTHROPIC_API_KEY, AUPHONIC_API_KEY in .env
Script 7: 3D Swivel Teaser
File: ./scripts/insert_3d_transition.py
How It Works
- Extracts frames from later in video (default: 60s onwards)
- Creates 3D rotating "swivel" animation via Remotion
- Splits video: intro → transition → main content
- Re-encodes and concatenates with audio preserved
CLI Arguments
| Argument | Default | Description |
|---|---|---|
--insert-at |
3 | Where to insert teaser (seconds) |
--duration |
5 | Teaser duration (seconds) |
--teaser-start |
60 | Where to sample content from (seconds) |
--bg-color |
#2d3436 | Background color (hex) |
--bg-image |
none | Background image path |
Final Timeline
[0-3s intro] [3-8s swivel teaser @ 100x] [8s onwards: edited content]
Audio: Original audio plays continuously
Dependencies
pip install torch # For Silero VAD (jump_cut script)
brew install ffmpeg node # macOS
cd video_effects && npm install # For Remotion 3D rendering
All Dependencies (Install Once)
# Core (required for all scripts)
brew install ffmpeg
# Silence removal
pip install torch
# Auto-captions + YouTube clip
pip install faster-whisper
# Vertical crop
pip install opencv-python numpy
# YouTube download
pip install yt-dlp
# AI features (chapters, metadata)
pip install anthropic
# Full pipeline (simple_video_edit)
pip install requests python-dotenv
# 3D transitions
brew install node
cd scripts/video_effects && npm install
Troubleshooting
| Issue | Solution |
|---|---|
| Cuts feel abrupt | --padding 200 in jump_cut |
| Too much cut | --min-silence 1.0 in jump_cut |
| Captions too small | --font-size 32 in auto_captions |
| Vertical video detected | Font auto-scales 1.3x |
| Won't play in QuickTime | Ensure hvc1 codec tag |
| Face not detected | Try --position center in vertical_crop |
| YouTube download fails | Update yt-dlp: pip install -U yt-dlp |
| File too large | Use --preset whatsapp or --target-size 25 |
| Hardware encoder fails | Auto-falls back to software (libx264) |
Technical Details
- macOS: Hardware encoding (hevc_videotoolbox / h264_videotoolbox)
- Fallback: libx264/libx265 with CRF
- Audio: AAC 128-192kbps
- Uses
hvc1codec tag for QuickTime compatibility - All scripts support
--helpfor full argument list
Schema
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
input_video |
file_path | Yes | Input video file path |
recipe |
string | No | Pipeline recipe: full-social, youtube-shorts, quick-edit |
preset |
string | No | Compression preset: youtube, instagram-reel, tiktok, whatsapp, etc. |
Outputs
| Name | Type | Description |
|---|---|---|
output_video |
file_path | Processed video file path |
Credentials
| Name | Source |
|---|---|
ANTHROPIC_API_KEY |
.env (for AI chapters) |
AUPHONIC_API_KEY |
.env (for upload) |
Composable With
Skills that chain well with this one: pan-3d-transition, recreate-thumbnails
Cost
Free locally (FFmpeg + Silero VAD)
More from aiagentwithdhruv/skills
image-to-video
Generate AI video from static images using Kling 3.0, Hailuo, Luma Ray3, Runway Gen-4.5, and 8 other tools. Covers free vs paid tools, prompt writing (motion-only), camera control, and face stability. Use when user asks to animate an image, create AI video, or convert photo to video.
91mac-control
MCP server for AI-powered macOS control — apps, display, audio, files, screenshots, clipboard
60gmaps-leads
Scrape Google Maps for B2B leads with deep website enrichment and contact extraction. Use when user asks to find local businesses, scrape Google Maps, generate contractor lists, or build local service business databases.
42excalidraw-visuals
Use when someone asks for a hand-drawn visual, PNG image, rendered diagram, visual explanation, or says "excalidraw image" or "excalidraw visual". This generates PNG images, not editable files.
34design-website
Generate a premium mockup website for a prospect using the buildinamsterdam.com template style. Use when user asks to design a website, create a mockup, or build a prospect website.
27scrape-leads
Scrape and verify business leads using Apify, classify with LLM, enrich emails, and save to Google Sheets. Use when user asks to find leads, scrape businesses, generate prospect lists, or build lead databases for any industry or location.
26