watch-video
Watch Video
Analyze a public video URL and produce a vivid, detailed report of what was seen on screen AND said aloud — precise enough for an engineer, PM, or designer to act on without watching it.
Workflow
Step 1: Determine video platform
| Platform | Strategy |
|---|---|
| Loom | Extract metadata + thumbnail from page; use Gemini on thumbnail + description |
| YouTube | Use yt-dlp to download audio → Whisper transcription; or Gemini with thumbnail |
| Other | Try yt-dlp first; fall back to page scrape + thumbnail |
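The platform check in the table above could be sketched as follows. The function name `detect_platform`, the `STRATEGIES` mapping, and the host patterns are illustrative assumptions; none of them come from the skill's own scripts.

```python
from urllib.parse import urlparse

# Strategy strings paraphrase the table above. The names and host
# patterns here are hypothetical, chosen only for illustration.
STRATEGIES = {
    "loom": "metadata + thumbnail from page, then Gemini",
    "youtube": "yt-dlp audio + Whisper, or Gemini with thumbnail",
    "other": "yt-dlp first, then page scrape + thumbnail",
}


def detect_platform(url: str) -> str:
    """Return 'loom', 'youtube', or 'other' based on the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    if host == "loom.com" or host.endswith(".loom.com"):
        return "loom"
    if host == "youtu.be" or host == "youtube.com" or host.endswith(".youtube.com"):
        return "youtube"
    # Anything unrecognized (including malformed URLs) takes the generic path.
    return "other"
```

Matching on the parsed host rather than a substring avoids misclassifying URLs that merely mention "loom" or "youtube" in a path or query string.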
Step 2: Extract what you can without downloading
Always try to get free data first (no disk needed):
# Get Loom metadata (title, description, duration, transcript status)
python3 scripts/loom_meta.py "https://www.loom.com/share/VIDEO_ID"
# Get YouTube metadata
yt-dlp --dump-json --no-download "YOUTUBE_URL" | python3 -c "
import sys, json
d = json.load(sys.stdin)
print('Title:', d['title'])
print('Description:', d['description'][:500])
print('Duration:', d['duration'])
"
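The inline snippet above indexes the JSON directly, which is fine when every field is present. For the "Other" platforms, a more tolerant parse may be useful. This is a minimal sketch; `summarize_metadata` is a hypothetical helper, and the assumption is only that the input is the JSON text printed by `yt-dlp --dump-json`.

```python
import json


def summarize_metadata(raw_json: str, max_description: int = 500) -> dict:
    """Pull title, description, and duration out of a yt-dlp JSON dump.

    Missing or null fields are replaced with safe defaults instead of
    raising KeyError or TypeError.
    """
    d = json.loads(raw_json)
    description = d.get("description") or ""
    return {
        "title": d.get("title") or "(untitled)",
        "description": description[:max_description],
        "duration_seconds": d.get("duration"),  # may be None if unknown
    }
```

Usage: pipe the dump into a small script that calls `summarize_metadata(sys.stdin.read())` and prints the result.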
Step 3: Get the thumbnail
Thumbnails are usually publicly accessible and often reveal UI state, on-screen content, and numbers.
# Loom: thumbnail URL is in oembed response
curl -s "https://www.loom.com/v1/oembed?url=LOOM_URL" | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(d['thumbnail_url']) # animated GIF — download this
"
# YouTube: predictable URL (maxresdefault.jpg is not generated for every video;
# if it returns 404, try hqdefault.jpg)
# https://img.youtube.com/vi/VIDEO_ID/maxresdefault.jpg
Download the thumbnail:
curl -sfL "THUMBNAIL_URL" -o /tmp/video_thumb.gif # or .jpg; -f fails on HTTP errors instead of saving an error page, -L follows redirects
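Building the YouTube thumbnail URL requires the video ID. A minimal sketch of extracting it from common URL shapes is shown below. The helper names are hypothetical, the list of path prefixes (`embed`, `shorts`, `live`) is an assumption that may not cover every YouTube URL form, and the `hqdefault.jpg` fallback is an addition not mentioned in the steps above.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Best-effort extraction of the video ID from a YouTube URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
            return parts[1]
    return None


def thumbnail_candidates(video_id: str) -> List[str]:
    """Thumbnail URLs to try in order, highest resolution first."""
    base = f"https://img.youtube.com/vi/{video_id}"
    return [f"{base}/{name}.jpg" for name in ("maxresdefault", "hqdefault")]
```

Try each candidate in order and keep the first one that downloads successfully.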
Step 4: Analyze with Gemini
Use scripts/analyze_video.py — pass the thumbnail + all metadata.
python3 scripts/analyze_video.py \
--thumb /tmp/video_thumb.gif \
--title "Video title" \
--description "Auto-generated description text" \
--duration 143
Or call the Gemini API directly (see gemini-api.md).
Step 5: If disk space allows — download + full analysis
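# Check free space on the download target first (the file is written to /tmp, which may be a different filesystem from ~)
df -h /tmp | awk 'NR==2 {print $4}'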
# Download with yt-dlp (works for Loom + YouTube + 1000+ sites)
yt-dlp "VIDEO_URL" -o /tmp/video.mp4
# Option A: Gemini video analysis (best for screen recordings)
python3 scripts/analyze_video.py --video /tmp/video.mp4
# Option B: Whisper transcription only (fast, audio-only; the API rejects uploads over 25 MB, so longer videos need audio extraction or splitting first)
curl -X POST https://api.openai.com/v1/audio/transcriptions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F "file=@/tmp/video.mp4" \
-F "model=whisper-1"
Output Format
Produce this structure every time:
## Video Analysis: [Title]
**URL:** [link] | **Duration:** [X:XX]
### What's on screen
[Vivid description of every visible UI element, number, text, state shown in the video/thumbnail. Be exhaustive — mention exact values, UI panel names, error messages, data shown.]
### What was said
[Transcript or paraphrase of the narration, with key quotes verbatim]
### Problem / Topic Summary
[1-2 sentences: what this video is about or demonstrating]
### Key Data Points
| Item | Value |
|------|-------|
| [exact numbers, names, states observed] | [value] |
### Action Items / Root Cause (if bug report)
[Specific things to investigate or do, based on the video content]
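To keep the structure identical across reports, the template above could be rendered from a small data object. This is a sketch under assumptions: the `VideoReport` class, its field names, and `format_duration` are hypothetical and not part of `scripts/analyze_video.py`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def format_duration(seconds: int) -> str:
    """Format a duration in seconds as M:SS, matching the [X:XX] slot."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


@dataclass
class VideoReport:
    title: str
    url: str
    duration_seconds: int
    on_screen: str
    said: str
    summary: str
    data_points: List[Tuple[str, str]] = field(default_factory=list)
    action_items: str = ""

    def render(self) -> str:
        """Return the report as text following the template's section order."""
        rows = "\n".join(f"| {item} | {value} |" for item, value in self.data_points)
        return "\n".join([
            f"## Video Analysis: {self.title}",
            f"**URL:** {self.url} | **Duration:** {format_duration(self.duration_seconds)}",
            "### What's on screen",
            self.on_screen,
            "### What was said",
            self.said,
            "### Problem / Topic Summary",
            self.summary,
            "### Key Data Points",
            "| Item | Value |",
            "|------|-------|",
            rows,
            "### Action Items / Root Cause (if bug report)",
            self.action_items,
        ])
```

Filling every field, even with a note such as "no audio available", keeps the report shape stable when only a thumbnail and metadata were analyzed.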
Platform Notes
See platforms.md for platform-specific quirks (Loom CDN auth, YouTube age-gate, etc.)
Disk Full Fallback
If disk is nearly full (df -h shows less than 500 MB free):
- Use thumbnail + metadata only (Steps 1–4 above)
- Loom auto-descriptions are highly detailed — combine with thumbnail for good coverage
- Never fail silently — always produce a report from whatever data is available
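The fallback decision above can be made programmatically rather than by reading `df` output. A minimal sketch, assuming downloads go to `/tmp` as in Step 5; the names `free_bytes` and `choose_mode` and the mode strings are hypothetical.

```python
import shutil

# 500 MB threshold taken from the fallback rule above.
MIN_FREE_BYTES = 500 * 1024 * 1024


def free_bytes(path: str = "/tmp") -> int:
    """Free space, in bytes, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def choose_mode(free: int, min_free: int = MIN_FREE_BYTES) -> str:
    """Pick the analysis mode: full download, or thumbnail + metadata only."""
    return "full-download" if free >= min_free else "thumbnail-and-metadata"
```

Usage: `choose_mode(free_bytes("/tmp"))` returns the mode to run. Either mode should still end in a report, in line with the "never fail silently" rule.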