film-breakdown
Film Breakdown
Overview
Structured breakdown analysis for films, series, anime, and videos (tech videos, vlogs, video essays, etc.).
Core capabilities:
- Genre routing: load analysis frameworks by genre
- Structured analysis: opinionated segment-by-segment output following framework skeleton
- Video processing (optional): scene detection, keyframe extraction, subtitle parsing/transcription
When to use
- User names a work and asks for breakdown/analysis
- User asks about cinematography, narrative structure, audiovisual craft
- User wants to understand why a video is effective or why a film works
- User compares techniques between two works
Three input modes
Different paths based on what the user has. No mode is "degraded" -- each is a complete analysis path.
Mode A: Conversation analysis (most common)
User has seen the work and wants to discuss/analyze it. No local files.
Input: Work title + user description/memory/questions What to do:
- Confirm genre, load frameworks
- Analyze using agent's knowledge + user-provided details, following framework structure
- Reference specific scenes, segments, timestamps (based on shared knowledge)
- Write analysis as markdown, generate HTML report with
generate_report.mjs(text-only layout when no keyframes)
This is the most natural scenario. Most cinephiles/creators discuss works from memory, not while scrubbing through files.
Mode B: Image-assisted analysis
User provides screenshots, keyframes, stills, or subtitle files.
Input: Image file paths / subtitle files + work info What to do:
- View images with Read tool (Claude supports multimodal)
- Parse subtitle files with
extract_subtitles.mjsif provided - Combine visual + text information, analyze following framework
- Write analysis as markdown (referencing user-provided images), generate HTML report with
generate_report.mjs
Mode C: Full video processing
User provides a local video file or URL. Most processing, most precise analysis.
Input: Local video file path, or URL (requires yt-dlp installed) What to do:
If input is a URL, download first:
yt-dlp -f 'bestvideo[height<=720]+bestaudio/best[height<=720]' \
--merge-output-format mp4 -o '/tmp/film-breakdown/<name>/video.%(ext)s' '<URL>'
- Cap at 720p -- keyframe analysis doesn't need high resolution; 4K only bloats keyframes and reports
- Download subtitles if available:
yt-dlp --write-sub --write-auto-sub --sub-lang zh-Hans,zh-CN,en --sub-format srt --skip-download
Then follow the video processing workflow below.
Video processing workflow (Mode C only)
Step 1: Extract structured data
The following steps can run in parallel:
# 1a. Scene detection + keyframe extraction
node scripts/extract_scenes.mjs \
--input /path/to/video.mp4 \
--output /tmp/film-breakdown/scenes/ \
--threshold 0.3
# 1b. Subtitle extraction
node scripts/extract_subtitles.mjs \
--input /path/to/video.mp4 \
--srt /path/to/subtitle.srt
# 1c. Audio transcription (only when no subtitles)
node scripts/transcribe.mjs \
--input /path/to/video.mp4 \
--output /tmp/film-breakdown/transcript.json \
--model base
Step 1.5: Compress keyframes (if needed)
If keyframes come from high-resolution sources or there are too many, compress to control report size:
# Scale keyframes to 1280px width, quality 80
for f in /tmp/film-breakdown/scenes/keyframes/*.jpg; do
ffmpeg -i "$f" -vf "scale=1280:-1" -q:v 4 -y "$f" 2>/dev/null
done
Target: single keyframe < 150KB, total report < 3MB.
Step 2: Build unified timeline
node scripts/build_timeline.mjs \
--scenes /tmp/film-breakdown/scenes/scenes.json \
--subtitles /tmp/film-breakdown/scenes/subtitles.json \
--output /tmp/film-breakdown/timeline.json
Timeline schema: see references/schema.md.
Step 3: View keyframes + analyze with framework
View extracted keyframes with the Read tool, combine with timeline data for segment-by-segment analysis.
Step 4: Generate static report
Write analysis as a markdown file (using standard  syntax to reference keyframes), then generate self-contained HTML report:
node scripts/generate_report.mjs \
--analysis /tmp/film-breakdown/analysis.md \
--keyframes /tmp/film-breakdown/scenes/keyframes \
--output /tmp/film-breakdown/report.html
Report features:
- All keyframes embedded as base64, single file readable offline
- Auto-detects content language: CJK content gets serif CJK typography; Latin content gets Georgia serif
- Print-ready (
@media printoptimized)
Genre routing
- User declares genre, or agent determines from content
- Always load
references/frameworks/_base.md - Load matching genre framework(s) (can combine multiple)
Narrative (film/series/anime)
| Genre | Framework file |
|---|---|
| Sci-fi | sci-fi.md |
| Fantasy | fantasy.md |
| Horror/Thriller | horror-thriller.md |
| Mystery/Detective | mystery.md |
| Crime/Gangster | crime.md |
| Action/Adventure | action-adventure.md |
| War | war.md |
| Romance | romance.md |
| Comedy | comedy.md |
| Art house/Auteur | art-house.md |
| Historical/Biopic | historical-biopic.md |
| Wuxia/Martial arts | wuxia-martial-arts.md |
| Cyberpunk | cyberpunk.md |
| Post-apocalyptic | post-apocalyptic.md |
| Psychological | psychological.md |
| Documentary | documentary.md |
| Musical | musical.md |
Video (non-fiction/non-traditional narrative)
| Genre | Framework file |
|---|---|
| Tech video | tech-video.md |
| Vlog | vlog.md |
| Video essay | video-essay.md |
| Product review | product-review.md |
| Tutorial | tutorial.md |
| News/Commentary | news-commentary.md |
Genres can be combined. e.g. "cyberpunk mystery" loads cyberpunk.md + mystery.md.
Analysis output
Analysis must have depth. A good breakdown report should make the reader feel "I watched this many times and never noticed that."
Narrative works
- Overview -- genre, duration, structure type
- Overall judgment -- macro-level assessment based on framework, lead with the conclusion
- Segment analysis -- by scene/segment/act, each containing:
- Timecode or position marker (precise to seconds with files, narrative position in conversation mode)
- Keyframe reference (if available)
- Analysis across framework dimensions for that segment
- Specialized analysis -- beyond segment analysis, must cover
_base.mdcore dimensions:- Cinematography (shot scale choices, camera movement, composition patterns)
- Editing (rhythm, transitions, time manipulation)
- Sound design (score, ambient sound, use of silence)
- Visual design (color system, lighting, production design)
- Not every dimension gets equal weight -- expand most on whichever dimension is the work's strongest craft
- Core techniques -- techniques worth learning, creator's signature methods
Video works
On top of narrative analysis, additionally output:
- Why it works (propagation mechanism analysis)
- Reusable structural patterns
- Video works also need visual/editing/sound dimension coverage -- excellent tech videos and vlogs are no less crafted than narrative works
Report depth
Users can specify analysis depth and length. If unspecified, use these defaults:
| Depth | Use case | Word count | Keyframe density |
|---|---|---|---|
| Quick | Quick overview of core techniques | 1000-2000 words | 3-5 frames |
| Standard (default) | Full breakdown covering major dimensions | 4000-6000 words | 1 per 2-3 min |
| Deep | Scene-by-scene deep analysis for craft study | 8000-12000 words | 1 per minute or denser |
Scale by content length:
- Under 5 minutes: above word counts x 0.5
- 5-30 minutes: above word counts x 1
- Feature film: above word counts x 1.5
Use Quick when user says "quick look", "brief analysis"; use Deep when user says "deep dive", "scene by scene"; use Standard when unspecified.
Dependencies
Mode A (conversation) has no external dependencies. Mode B/C requires:
ffmpeg-- video processing, scene detection, keyframe extractionwhisper-- audio transcription (optional, only when no subtitles; supports whisper.cpp or openai-whisper)
Scripts check for dependencies on startup and suggest installation rather than crashing. Mode C falls back to Mode B (subtitles only) or Mode A (conversation) if ffmpeg is unavailable.
Conventions
- Analysis must have opinions, not just list observations without conclusions
- Narrative works: focus on "why is it good" (craft); video works: focus on "why is it effective" (propagation mechanism)
- No academic jargon pileup -- use clear language to explain the causal relationship between technique and effect
- Reference specific timecodes and frames, don't generalize
- When combining genres, prioritize analyzing unique effects produced by genre intersection