iopho-analyzing-videos
iopho-analyzing-videos — Video → .storyboard.md Reverse Engineering
Analyze a video file and generate a detailed .storyboard.md file — a human+AI readable "source code" representation of the video, including scene-by-scene breakdowns, visual descriptions, audio transcripts, and extracted keyframes.
Prerequisites
!python3 -c "import google.generativeai; print('google-generativeai: ✓')" 2>/dev/null || echo "google-generativeai: ✗ — install: pip install google-generativeai"
!which ffmpeg 2>/dev/null && echo "ffmpeg: ✓ (needed for frame extraction)" || echo "ffmpeg: ✗ — install: brew install ffmpeg"
!echo "GEMINI_API_KEY: $([ -n \"$GEMINI_API_KEY\" ] && echo '✓ set' || echo '✗ — set via: export GEMINI_API_KEY=your-key')"
Required: google-generativeai + ffmpeg + GEMINI_API_KEY environment variable.
Get a Gemini API key at https://aistudio.google.com/apikey (free tier available).
What It Produces
Given a video file, this skill generates:
output.storyboard.md ← YAML frontmatter + scene-by-scene markdown
frames/
scene-001.jpg ← Keyframe at midpoint of scene 1
scene-002.jpg ← Keyframe at midpoint of scene 2
...
.storyboard.md Format
---
title: "Video Title"
duration_seconds: 19
resolution: "1920x1080"
style:
visual_style: "clean, modern, minimalist"
color_palette: ["#E6E6FA", "#9400D3", "#00BFFF"]
audio:
has_voiceover: true
voiceover_language: "en"
content:
type: "product_demo"
framework: "PAS"
key_message: "..."
---
### Scene 1: Hook (0:00 – 0:02, 2s)
**Thumbnail**: 
#### Visual
- **Shot Type**: wide
- **Camera Movement**: static
- **Subject**: ...
- **Text On Screen**: "..."
#### Audio
- **Voiceover**: "exact transcript"
- **Music**: upbeat, electronic
#### Analysis
- **Narrative Purpose**: ...
- **Sales Element**: problem
- **Transition Out**: cut to →
Each scene includes: Visual (shot type, camera, subject, text, graphics), Audio (voiceover, music, SFX), and Analysis (narrative purpose, viewer psychology, sales element).
Quick Reference
Analyze a local video:
python3 scripts/video_to_storyboard.py "$VIDEO_PATH" "$OUTPUT_PATH"
Analyze without frame extraction:
python3 scripts/video_to_storyboard.py "$VIDEO_PATH" "$OUTPUT_PATH" --no-frames
Use a different Gemini model:
GEMINI_MODEL=gemini-2.5-flash python3 scripts/video_to_storyboard.py "$VIDEO_PATH"
Execute
Run the analysis pipeline:
# From $ARGUMENTS: <video_path> [output_path] [--no-frames] [--model MODEL]
#
# Default output: same directory as video, with .storyboard.md extension
# Default model: gemini-2.0-flash
#
# The agent should:
# 1. Verify video file exists
# 2. Check GEMINI_API_KEY is set
# 3. Run the analysis script
# 4. Report: output path, scene count, frame count, token usage
python3 scripts/video_to_storyboard.py "$VIDEO_PATH" "$OUTPUT_PATH"
Pipeline Details
The analysis pipeline works in 4 stages:
Stage 1: Upload
Video file → Gemini File API (supports up to ~1GB)
Wait for server-side processing
Stage 2: Analyze
Gemini Flash processes video natively (no frame extraction needed)
Generates complete .storyboard.md with scene timestamps
~$0.001 per short video (< 30s)
Stage 3: Extract Frames
Parse scene timestamps from generated markdown
ffmpeg screenshots at each scene midpoint
Insert thumbnail references into markdown
Stage 4: Output
Write .storyboard.md file
Write frames/ directory with keyframe JPGs
Clean up uploaded file from Gemini
Script Setup
If the analysis script doesn't exist yet, create it:
# Check if script exists
ls scripts/video_to_storyboard.py 2>/dev/null || echo "Script not found — see references/video_to_storyboard.py for the full implementation"
The script requires these Python packages:
pip install google-generativeai
Cost Estimates
| Video Length | Gemini Flash Cost | Notes |
|---|---|---|
| < 30s | ~$0.001 | Short ads, demos |
| 1-5 min | ~$0.01 | Explainers, tutorials |
| 5-15 min | ~$0.03 | Long-form content |
| 15-60 min | ~$0.06-0.15 | Full presentations |
Costs based on Gemini 2.0 Flash pricing. Gemini 2.5 Flash may differ.
Combine with Other iopho Skills
Full pipeline — search, download, analyze:
/iopho-searching-videos SaaS explainer under 30s --limit 5
/iopho-getting-videos https://youtube.com/watch?v=BEST_MATCH --mode all --output temp/
/iopho-analyzing-videos temp/video.mp4 temp/output.storyboard.md
Analyze + regenerate with Remotion:
After generating .storyboard.md, use it as a blueprint to recreate the video with Remotion or other video generation tools.
Tips
- Gemini Flash processes video natively — no need to pre-extract frames
- The
.info.jsonfrom iopho-getting-videos is auto-detected for source URL and title - Short videos (< 30s) produce the most accurate scene detection
- For longer videos, consider splitting first with
ffmpeg -ss START -to END - The generated
.storyboard.mdis both human-readable and machine-parseable - Use
--no-framesfor faster processing when you don't need keyframe images