Video Understanding (Gemini)

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.

Requirements

  • yt-dlp: brew install yt-dlp / pip install yt-dlp
  • ffmpeg: brew install ffmpeg (for merging video+audio streams)
  • GEMINI_API_KEY environment variable
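
The three requirements above can be checked before a run. The following is a minimal sketch, not part of the skill itself: the function name missing_requirements is hypothetical, and it only assumes that yt-dlp and ffmpeg are expected on PATH and that GEMINI_API_KEY is read from the environment.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """Return the names of requirements that are not satisfied.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment (hypothetical helper).
    """
    env = os.environ if env is None else env
    missing = []
    # Both command-line tools must be discoverable on PATH.
    for tool in ("yt-dlp", "ffmpeg"):
        if which(tool) is None:
            missing.append(tool)
    # The API key must be set and non-empty.
    if not env.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All requirements found.")
```

An empty list means all three requirements were found; the check does not verify tool versions or that the key is valid.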

Default Output

Returns structured JSON:

  • transcript — Verbatim transcript with [MM:SS] timestamps
  • description — Visual description (people, setting, UI, text on screen, flow)
  • summary — 2-3 sentence summary
  • duration_seconds — Estimated duration
  • speakers — Identified speakers
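
Consuming this output might look like the sketch below. The field names come from the list above. The value types (transcript as one string, speakers as a list), the sample payload, and the helper names parse_result and transcript_segments are assumptions for illustration only.

```python
import json
import re

# Field names documented in the default output.
EXPECTED_KEYS = ("transcript", "description", "summary",
                 "duration_seconds", "speakers")

# Matches a [MM:SS] marker and the text that follows it, up to the next marker.
TIMESTAMP = re.compile(r"\[(\d{1,3}):(\d{2})\]\s*([^\[]*)")


def parse_result(raw):
    """Parse the JSON text and verify that every documented field is present."""
    data = json.loads(raw)
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return data


def transcript_segments(transcript):
    """Split a timestamped transcript into (seconds, text) pairs."""
    return [(int(minutes) * 60 + int(seconds), text.strip())
            for minutes, seconds, text in TIMESTAMP.findall(transcript)]


# Hypothetical sample payload; real output will differ.
SAMPLE = json.dumps({
    "transcript": "[00:00] Hello there. [01:05] Let's begin.",
    "description": "One person at a desk, slides visible on a monitor.",
    "summary": "A short introduction to the topic.",
    "duration_seconds": 70,
    "speakers": ["Speaker 1"],
})

if __name__ == "__main__":
    result = parse_result(SAMPLE)
    for offset, text in transcript_segments(result["transcript"]):
        print(offset, text)
```

Converting each [MM:SS] marker to a second offset makes it easy to seek to a quoted line in the original video.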

First seen: Mar 12, 2026