Video Edit Skill
Edit long videos into concise, story-driven clips by analyzing the transcript, selecting key moments, and assembling them with ffmpeg.
Workflow
Phase 1: Discover & Transcribe
- Find the video: locate the file and get its duration and format with `ffprobe`.
- Extract audio: convert to 16 kHz mono WAV for transcription:
  ```bash
  ffmpeg -y -i INPUT -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/audio.wav
  ```
- Transcribe with Whisper: use the `whisper` Python library (install via `pip3 install openai-whisper` if needed):
  ```python
  import json
  import whisper

  model = whisper.load_model("base")
  result = model.transcribe("/tmp/audio.wav", language="en", word_timestamps=True)

  # Save full result for reference
  with open("/tmp/transcript.json", "w") as f:
      json.dump(result, f)

  # Print timestamped segments
  for seg in result["segments"]:
      print(f'[{seg["start"]:.1f}-{seg["end"]:.1f}] {seg["text"].strip()}')
  ```
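The discovery step can be scripted as well. The following is a minimal sketch, assuming `ffprobe` is on the PATH; the function names (`ffprobe_command`, `summarize_probe`, `probe`) are illustrative and not part of the skill. It asks `ffprobe` for JSON output and extracts the duration, container, and codec names.

```python
import json
import subprocess


def ffprobe_command(path):
    """Build an ffprobe call that prints container and stream info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]


def summarize_probe(probe_json):
    """Pull duration (seconds), container name, and codec names out of ffprobe JSON."""
    info = json.loads(probe_json)
    fmt = info.get("format", {})
    return {
        # ffprobe reports duration as a string, so convert it explicitly
        "duration": float(fmt.get("duration", 0.0)),
        "container": fmt.get("format_name", ""),
        "codecs": [s.get("codec_name") for s in info.get("streams", [])],
    }


def probe(path):
    """Run ffprobe on a real file; requires ffprobe on PATH."""
    out = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    ).stdout
    return summarize_probe(out)
```

Keeping the parsing separate from the subprocess call makes it possible to check the parsing logic against saved `ffprobe` output without a video file at hand.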
Phase 2: Analyze & Select Moments
Read the full transcript and identify the story arc. Look for:
- Setup: Where the task/prompt is clearly stated
- Progress beats: AI working, intermediate results appearing
- Positive reactions: "wow", "that's great", "it worked", "cool", "upgraded", excitement
- Key insights: Moments where approach changes or something clever happens
- Resolution: Final confirmation that it worked, "thank you", wrap-up
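As a rough first pass before reading the transcript in full, reaction phrases can be flagged automatically. This is only a sketch: the phrase list is taken from the examples above, the helper name `flag_reactions` is hypothetical, and plain substring matching will produce false positives (for instance, "cool" inside "cooling"), so the hits are candidates to review, not a selection.

```python
REACTION_PHRASES = ("wow", "that's great", "it worked", "cool", "upgraded", "thank you")


def flag_reactions(segments, phrases=REACTION_PHRASES):
    """Return (start, end, text) for segments whose text contains a reaction phrase."""
    hits = []
    for seg in segments:
        text = seg["text"].strip()
        lowered = text.lower()
        # Case-insensitive substring match; deliberately crude
        if any(p in lowered for p in phrases):
            hits.append((seg["start"], seg["end"], text))
    return hits
```

The `segments` argument has the same shape as `result["segments"]` from the Whisper transcription: a list of dicts with `start`, `end`, and `text` keys.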
Selection rules:
- Skip duplicate/repeated prompts (e.g., if there was a reset and the same prompt was given twice, use only the second clean attempt)
- Skip troubleshooting tangents, debugging, off-topic conversations, bathroom breaks, silence
- Skip filler ("uh", "hmm", long pauses) — but these are fine within a selected segment
- Keep segments long enough for context (15-75s each) — don't micro-cut
- Aim for 10-20% of original duration as a starting point
- Preserve the chronological order — never rearrange
Present the plan to the user before cutting — list each segment with timestamp range and what it captures.
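The numeric selection rules can be checked mechanically before the plan is shown to the user. The sketch below assumes a plan is a list of `(start, end, label)` tuples in seconds; that representation and the name `check_plan` are assumptions made for illustration. It warns about out-of-order or overlapping segments, segments outside the 15-75s range, and a total outside the 10-20% starting target.

```python
def check_plan(plan, source_duration, min_len=15.0, max_len=75.0,
               target=(0.10, 0.20)):
    """Return a list of warnings for a plan of (start, end, label) tuples."""
    warnings = []
    prev_end = 0.0
    total = 0.0
    for i, (start, end, label) in enumerate(plan, 1):
        length = end - start
        # Chronological order: each segment must start after the previous one ends
        if start < prev_end:
            warnings.append(
                f"segment {i} ({label}) overlaps or precedes the previous one"
            )
        if not min_len <= length <= max_len:
            warnings.append(
                f"segment {i} ({label}) is {length:.0f}s, "
                f"outside {min_len:.0f}-{max_len:.0f}s"
            )
        prev_end = max(prev_end, end)
        total += length
    ratio = total / source_duration
    if not target[0] <= ratio <= target[1]:
        warnings.append(
            f"edit is {ratio:.0%} of the source, "
            f"outside {target[0]:.0%}-{target[1]:.0%}"
        )
    return warnings
```

The warnings are advisory: the 10-20% figure is a starting point, so a plan that misses it may still be the right edit.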
Phase 3: Extract & Assemble
- Extract each segment as a separate file, re-encoding for clean cuts:
  ```bash
  mkdir -p /tmp/segments
  ffmpeg -y -ss START -t DURATION -i INPUT \
    -c:v libx264 -preset fast -crf 23 \
    -c:a aac -b:a 128k \
    /tmp/segments/segNN.mp4
  ```
- Create the concat file and merge:
  ```bash
  # Overwrite (not append) so a rerun does not duplicate entries
  for f in /tmp/segments/seg*.mp4; do
    echo "file '$f'"
  done > /tmp/segments/filelist.txt

  ffmpeg -y -f concat -safe 0 -i /tmp/segments/filelist.txt \
    -c:v libx264 -preset fast -crf 23 \
    -c:a aac -b:a 128k \
    OUTPUT.mp4
  ```
- Verify: check duration, file size, and playback.
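The extraction and concat steps can also be generated from the approved plan instead of being typed by hand. The following sketch assumes the plan is a list of `(start, end)` pairs in seconds and uses hypothetical helper names (`extract_command`, `build_jobs`); it only builds the argument lists and the concat file text, using the same ffmpeg options as the commands in this phase, and leaves execution to the caller.

```python
def extract_command(source, start, end, out_path):
    """Build the re-encoding ffmpeg call for one segment (times in seconds)."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}", "-t", f"{end - start:.3f}",
        "-i", str(source),
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        "-c:a", "aac", "-b:a", "128k",
        str(out_path),
    ]


def build_jobs(source, plan, seg_dir="/tmp/segments"):
    """Return (commands, concat_text) for a plan of (start, end) pairs."""
    commands, lines = [], []
    for i, (start, end) in enumerate(plan, 1):
        # Zero-padded names keep the segments in plan order
        out_path = f"{seg_dir}/seg{i:02d}.mp4"
        commands.append(extract_command(source, start, end, out_path))
        lines.append(f"file '{out_path}'")
    return commands, "\n".join(lines) + "\n"
```

Each command can then be run with `subprocess.run(cmd, check=True)` once the segment directory exists, and the returned text written to `filelist.txt` for the concat step. Because the file list is built from the plan, it cannot pick up stale segments left over from an earlier run.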
Phase 4: Save
Save the result to the same directory as the source, with an `_edited` suffix added to the file name, or to the location the user specifies.
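A small sketch of the default naming rule follows. The helper name `edited_path` is hypothetical, and the `.mp4` default extension is an assumption based on the H.264/AAC output produced in Phase 3.

```python
from pathlib import Path


def edited_path(source, suffix="_edited", ext=".mp4"):
    """Place the output next to the source, e.g. talk.mov -> talk_edited.mp4."""
    source = Path(source)
    return source.with_name(f"{source.stem}{suffix}{ext}")
```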
Common Mistakes & Fixes
| Mistake | Fix |
|---|---|
| Including the first attempt when there was a reset/retry | Only include the clean second attempt — watch for repeated prompts |
| Cutting segments too short (<10s) | Keep 15-75s per segment for natural flow |
| Including long silences or "uh/hmm" segments | Skip segments that are mostly filler, but filler within a good segment is fine |
| Using `-c copy` for segment extraction | Always re-encode (`-c:v libx264`); copy mode causes keyframe alignment issues and glitchy cuts |
| Forgetting `-safe 0` in concat | Required when using absolute paths in the file list |
| Rearranging chronological order | Never do this — the story must flow naturally in time |
| Over-compressing (too few segments) | The edit should still tell the complete story — don't skip important transitions |
| Including debugging/troubleshooting tangents | Skip unless the debugging itself is the story |
What Makes a Good Edit
- Tells a story: setup → progress → "wow it worked" → done
- Highlights positive outcomes: reactions of surprise, satisfaction, confirmation
- Shows the AI being smart: clever approaches, autonomous decisions, learning
- No dead time: every second earns its place
- Natural transitions: segments should feel like they flow, even with jumps
- User can follow along: enough context in each segment to understand what's happening
Dependencies
- `ffmpeg`/`ffprobe` (install via `brew install ffmpeg`)
- `whisper` Python library (install via `pip3 install openai-whisper`)
- Sufficient disk space for extracted audio + segments (rough guide: 2x source file size)
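These prerequisites can be checked up front before any work starts. The sketch below uses only the Python standard library; the function names and the `/tmp` working directory default are assumptions, and the disk check simply applies the rough 2x guide above.

```python
import importlib.util
import shutil


def missing_dependencies(tools=("ffmpeg", "ffprobe"), modules=("whisper",)):
    """List command-line tools not on PATH and Python modules that cannot be found."""
    missing = [t for t in tools if shutil.which(t) is None]
    missing += [m for m in modules if importlib.util.find_spec(m) is None]
    return missing


def has_room(source_size_bytes, work_dir="/tmp", factor=2.0):
    """Apply the rough 2x-source-size guide to the free space in work_dir."""
    return shutil.disk_usage(work_dir).free >= factor * source_size_bytes
```

An empty list from `missing_dependencies()` means all three dependencies were found; `has_room(os.path.getsize(path))` gives a quick estimate of whether the working directory can hold the audio and segments.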