youtube-captions
Extract timestamped captions from YouTube videos. Output is in WebVTT format, with timestamps preserved.
Prerequisites
- `yt-dlp`: REQUIRED (`brew install yt-dlp`)
- `openai-whisper`: REQUIRED only if the video has no subtitles (`pip install openai-whisper`)
  - First run downloads the `small` model (~500MB)
  - Transcription is significantly slower than subtitle download
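As a sketch of what these prerequisites mean at runtime, a script could check for both tools up front. This assumes only that the commands are looked up on `PATH`; the function name `check_deps` is hypothetical and not taken from `get-captions.sh`.

```bash
# Hypothetical prerequisite check (not the actual get-captions.sh code).
# Prints install hints to stderr; returns 1 only if a REQUIRED tool is missing.
check_deps() {
  if ! command -v yt-dlp >/dev/null 2>&1; then
    echo "error: yt-dlp not found (install with: brew install yt-dlp)" >&2
    return 1
  fi
  if ! command -v whisper >/dev/null 2>&1; then
    # Whisper is only needed when the video has no subtitles, so only warn.
    echo "warning: whisper not found; videos without subtitles will fail" \
         "(install with: pip install openai-whisper)" >&2
  fi
  return 0
}
```

Treating a missing `whisper` as a warning rather than an error mirrors the prerequisites above: it only matters for videos without subtitles.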
Usage
```bash
bash scripts/get-captions.sh <youtube-url> [language]
```
- `youtube-url`: any valid YouTube video URL
- `language`: subtitle language code (default: `en`)
Output goes to stdout. Status messages go to stderr.
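A minimal sketch of the argument handling and stream separation described above, assuming a POSIX-style shell script. The `parse_args` and `log` helpers are hypothetical names used for illustration only.

```bash
# Hypothetical helpers (not the actual get-captions.sh code).

# Status messages go to stderr so stdout carries only the captions.
log() {
  echo "$*" >&2
}

# $1: YouTube URL (required); $2: subtitle language code (defaults to "en").
parse_args() {
  if [ -z "${1:-}" ]; then
    log "usage: get-captions.sh <youtube-url> [language]"
    return 1
  fi
  url="$1"
  lang="${2:-en}"
}
```

Called as `parse_args "$@" || exit 1`, this leaves `$url` and `$lang` set for later steps, and redirecting stdout (for example `> captions.vtt`) captures only the captions while progress still shows in the terminal.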
Fallback Chain
- Manual subtitles: human-uploaded captions (fastest, most accurate)
- Auto-generated subtitles: YouTube's speech recognition
- Whisper transcription: downloads the audio and transcribes it locally with `whisper --model small`
The script tries each step in order and exits on the first success.
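The fallback chain can be sketched as a loop over three step functions that stops at the first one producing a `.vtt` file. This is an illustration under assumptions, not the actual script: the function names are hypothetical, `$url`, `$lang`, and a scratch directory `$tmp` are assumed to be set already, and the flags shown are standard yt-dlp and Whisper CLI options rather than ones confirmed from `get-captions.sh`. yt-dlp's `-x` audio extraction and Whisper both need `ffmpeg` installed.

```bash
# Hypothetical sketch of the three-step fallback chain.
# Assumes $url, $lang and a scratch directory $tmp are already set.

# Print the first .vtt file found in $tmp to stdout; fail if there is none.
emit_vtt() {
  for f in "$tmp"/*.vtt; do
    [ -f "$f" ] || continue   # unmatched glob stays literal; skip it
    cat "$f"
    return 0
  done
  return 1
}

# Step 1: human-uploaded subtitles only.
manual_subs() {
  yt-dlp --skip-download --write-subs --sub-langs "$lang" --sub-format vtt \
    -o "$tmp/subs.%(ext)s" "$url" >&2 && emit_vtt
}

# Step 2: YouTube's automatically generated captions.
auto_subs() {
  yt-dlp --skip-download --write-auto-subs --sub-langs "$lang" --sub-format vtt \
    -o "$tmp/subs.%(ext)s" "$url" >&2 && emit_vtt
}

# Step 3: download audio and transcribe locally (Whisper auto-detects language).
whisper_transcribe() {
  command -v whisper >/dev/null 2>&1 || return 1
  yt-dlp -x --audio-format mp3 -o "$tmp/audio.%(ext)s" "$url" >&2 &&
    whisper "$tmp/audio.mp3" --model small --output_format vtt \
      --output_dir "$tmp" >&2 &&
    emit_vtt
}

# Try each step in order; stop at the first one that produces captions.
get_captions() {
  for step in manual_subs auto_subs whisper_transcribe; do
    echo "trying: $step" >&2
    if "$step"; then
      return 0
    fi
  done
  echo "error: no captions could be obtained" >&2
  return 1
}
```

yt-dlp can exit successfully even when the requested subtitles do not exist, so each step checks for an actual `.vtt` file instead of trusting the exit status alone. Tool output is sent to stderr so stdout holds nothing but the captions.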
Example
```bash
# Get English captions
bash scripts/get-captions.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Get Spanish captions
bash scripts/get-captions.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ" es
```
Output Format
VTT (Web Video Text Tracks) with timestamps:
```text
WEBVTT

00:00:01.000 --> 00:00:04.000
First line of dialogue

00:00:04.000 --> 00:00:08.000
Second line of dialogue
```
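Because the output is plain WebVTT, it is easy to post-process. For example, a hypothetical `vtt_to_text` helper could reduce it to bare caption text by skipping the header block (everything up to the first blank line), cue timing lines, blank lines, and inline tags such as `<c>` or `<i>`. It assumes simple cues like the sample above, with no cue identifiers and no `NOTE` or `STYLE` blocks.

```bash
# Hypothetical helper: read WebVTT on stdin and print only the caption text.
# Assumes cues have no identifiers and there are no NOTE/STYLE blocks.
vtt_to_text() {
  awk '
    BEGIN { in_header = 1 }
    # Skip the header block: "WEBVTT" plus any lines up to the first blank line.
    in_header { if ($0 ~ /^[ \t\r]*$/) in_header = 0; next }
    # Skip cue timing lines such as "00:00:01.000 --> 00:00:04.000".
    /-->/ { next }
    {
      gsub(/<[^>]*>/, "")               # drop inline tags like <i> or <c>
      if ($0 !~ /^[ \t\r]*$/) print     # drop blank separator lines
    }
  '
}
```

Usage might look like `bash scripts/get-captions.sh "<youtube-url>" | vtt_to_text`, which keeps the timestamped output as the script's contract while letting callers discard timing when they only need the words.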
Notes
- All temporary files (audio, intermediate subtitle files) are cleaned up automatically
- If the video has no subtitles and Whisper is not installed, the script exits with an error and clear instructions
- Long videos with no subtitles take a while to transcribe: Whisper processes audio at roughly 1x realtime on CPU, so a one-hour video needs about an hour
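Automatic cleanup of temporary files is commonly done with a single scratch directory and an `EXIT` trap. A minimal sketch, with `make_scratch_dir` as a hypothetical helper name rather than anything confirmed from `get-captions.sh`:

```bash
# Hypothetical helper: create one scratch directory for all intermediate files
# and delete it automatically when the shell exits.
make_scratch_dir() {
  tmp=$(mktemp -d) || return 1
  # Runs on any exit, including an `exit 1` from an error path.
  trap 'rm -rf "$tmp"' EXIT
  # Turn Ctrl-C / kill into a normal exit so the EXIT trap also fires.
  trap 'exit 1' INT TERM
}
```

If every intermediate file (downloaded subtitles, extracted audio, Whisper output) lives under `$tmp`, one `rm -rf` covers them all, and no step needs cleanup logic of its own.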
Repository: third774/dotfiles