video-understand
video-understand
Understand video content locally using ffmpeg for frame extraction and Whisper for transcription. Fully offline, no API keys required.
Prerequisites
ffmpeg+ffprobe(required):brew install ffmpegopenai-whisper(optional, for transcription):pip install openai-whisper
Commands
# Scene detection + transcribe (default)
python3 skills/video-understand/scripts/understand_video.py video.mp4
# Keyframe extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m keyframe
# Regular interval extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m interval
# Limit frames extracted
python3 skills/video-understand/scripts/understand_video.py video.mp4 --max-frames 10
# Use a larger Whisper model
python3 skills/video-understand/scripts/understand_video.py video.mp4 --whisper-model small
# Frames only, skip transcription
python3 skills/video-understand/scripts/understand_video.py video.mp4 --no-transcribe
# Quiet mode (JSON only, no progress)
python3 skills/video-understand/scripts/understand_video.py video.mp4 -q
# Output to file
python3 skills/video-understand/scripts/understand_video.py video.mp4 -o result.json
CLI Options
| Flag | Description |
|---|---|
video |
Input video file (positional, required) |
-m, --mode |
Extraction mode: scene (default), keyframe, interval |
--max-frames |
Maximum frames to keep (default: 20) |
--whisper-model |
Whisper model size: tiny, base, small, medium, large (default: base) |
--no-transcribe |
Skip audio transcription, extract frames only |
-o, --output |
Write result JSON to file instead of stdout |
-q, --quiet |
Suppress progress messages, output only JSON |
Extraction Modes
| Mode | How it works | Best for |
|---|---|---|
scene |
Detects scene changes via ffmpeg select='gt(scene,0.3)' |
Most videos, varied content |
keyframe |
Extracts I-frames (codec keyframes) | Encoded video with natural keyframe placement |
interval |
Evenly spaced frames based on duration and max-frames | Fixed sampling, predictable output |
If scene mode detects no scene changes, it automatically falls back to interval mode.
Output
The script outputs JSON to stdout (or file with -o). See references/output-format.md for the full schema.
{
"video": "video.mp4",
"duration": 18.076,
"resolution": {"width": 1224, "height": 1080},
"mode": "scene",
"frames": [
{"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
],
"frame_count": 12,
"transcript": [
{"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
],
"text": "Full transcript...",
"note": "Use the Read tool to view frame images for visual understanding."
}
Use the Read tool on frame image paths to visually inspect extracted frames.
References
references/output-format.md-- Full JSON output schema documentation
More from calesthio/openmontage
video-edit
|
28video-download
|
26ffmpeg
Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. Triggers include converting GIF to MP4, resizing video, extracting audio, compressing files, or any media transformation task.
24threejs-materials
Three.js materials - PBR, basic, phong, shader materials, material properties. Use when styling meshes, working with textures, creating custom shaders, or optimizing material performance.
23threejs-postprocessing
Three.js post-processing - EffectComposer, bloom, DOF, screen effects. Use when adding visual effects, color grading, blur, glow, or creating custom screen-space shaders.
23remotion-best-practices
Best practices for Remotion - Video creation in React
23