youtube-captions

Extract timestamped captions from YouTube videos. Output is VTT format (timestamps preserved).

Prerequisites

yt-dlp — REQUIRED (brew install yt-dlp)
openai-whisper — REQUIRED only if video has no subtitles (pip install openai-whisper)
- First run downloads the small model (~500MB)
- Transcription is significantly slower than subtitle download
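
A minimal sketch of how a script could check these prerequisites before doing any work. The helper name `require_cmd` and the message wording are assumptions for illustration, not taken from get-captions.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is on PATH.
# Prints an install hint to stderr and returns 1 if it is missing.
require_cmd() {
  local cmd=$1 hint=$2
  if command -v "$cmd" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing dependency: $cmd (install with: $hint)" >&2
  return 1
}

# Example checks mirroring the list above:
#   require_cmd yt-dlp  "brew install yt-dlp"        || exit 1
#   require_cmd whisper "pip install openai-whisper"  # only needed for the fallback
```

Checking for `whisper` lazily, only when the subtitle steps have failed, would keep the common path free of the optional dependency.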

Usage

bash scripts/get-captions.sh <youtube-url> [language]

youtube-url — any valid YouTube video URL
language — subtitle language code (default: en)

Output goes to stdout. Status messages go to stderr.
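
Because captions and status messages use different streams, they can be redirected independently. The sketch below only illustrates that convention and the default language; `log` and `demo_captions` are hypothetical stand-ins, and nothing here contacts YouTube.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the stream convention: data on stdout, status on stderr.
log() { echo "[get-captions] $*" >&2; }

# Stand-in for the real script body. It demonstrates the two streams and
# the default language (en) when the second argument is omitted.
demo_captions() {
  local url=$1 lang=${2:-en}
  log "fetching captions for $url (language: $lang)"
  printf 'WEBVTT\n'
}
```

With the real script, the same separation lets captions be saved while status messages stay visible or go to a log:

bash scripts/get-captions.sh "<youtube-url>" > captions.vtt 2> status.log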

Fallback Chain

1. Manual subtitles: human-uploaded captions (fastest, most accurate)
2. Auto-generated subtitles: YouTube's speech recognition
3. Whisper transcription: downloads audio, transcribes locally with whisper --model small

The script tries each step in order and exits on the first success.
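
The chain described above could be sketched roughly as follows. This is not the actual get-captions.sh: the function names, temp-file names, and the exact combination of yt-dlp and whisper options are assumptions chosen for illustration, and audio extraction to mp3 additionally assumes ffmpeg is installed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the fallback chain; not the actual get-captions.sh.
# Each try_* step prints VTT to stdout and returns 0 only on success.

# Print the first non-empty .vtt file whose path starts with the given prefix.
emit_vtt() {
  local f
  for f in "$1"*.vtt; do
    if [ -s "$f" ]; then cat "$f"; return 0; fi
  done
  return 1
}

try_manual_subs() {  # args: url lang workdir
  yt-dlp --quiet --skip-download --write-subs --sub-langs "$2" \
    --sub-format vtt -o "$3/manual.%(ext)s" "$1" >&2 || return 1
  emit_vtt "$3/manual"   # yt-dlp can exit 0 without writing a file
}

try_auto_subs() {  # args: url lang workdir
  yt-dlp --quiet --skip-download --write-auto-subs --sub-langs "$2" \
    --sub-format vtt -o "$3/auto.%(ext)s" "$1" >&2 || return 1
  emit_vtt "$3/auto"
}

try_whisper() {  # args: url lang workdir
  yt-dlp --quiet -x --audio-format mp3 -o "$3/audio.%(ext)s" "$1" >&2 || return 1
  # whisper prints its transcript to stdout, so send that to stderr
  # and read the generated .vtt file instead.
  whisper "$3/audio.mp3" --model small --language "$2" \
    --output_format vtt --output_dir "$3" >&2 || return 1
  emit_vtt "$3/audio"
}

# Run the steps in order and stop at the first one that succeeds.
get_captions() {
  local url=$1 lang=${2:-en} workdir=$3 step
  for step in try_manual_subs try_auto_subs try_whisper; do
    echo "Trying $step..." >&2
    if "$step" "$url" "$lang" "$workdir"; then
      return 0
    fi
  done
  echo "No captions could be produced for $url" >&2
  return 1
}
```

Keeping each step as a function with the same arguments makes the order explicit in one place and lets a step be skipped or replaced without touching the others.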

Example

# Get English captions
bash scripts/get-captions.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Get Spanish captions
bash scripts/get-captions.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ" es

Output Format

VTT (Web Video Text Tracks) with timestamps:

WEBVTT

00:00:01.000 --> 00:00:04.000
First line of dialogue

00:00:04.000 --> 00:00:08.000
Second line of dialogue
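
When only the spoken text is needed, the timestamps can be filtered out downstream. The helper below is a hypothetical sketch, not part of the skill: it drops the WEBVTT header block, cue timing lines, and blank lines, and it does not handle optional cue identifiers or inline tags that some VTT files contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce VTT on stdin to its cue text.
vtt_to_text() {
  awk '
    NR == 1 && /^WEBVTT/ { in_header = 1; next }  # file signature
    in_header && NF == 0 { in_header = 0; next }  # blank line ends the header
    in_header            { next }                 # skip header metadata
    /-->/                { next }                 # skip cue timing lines
    NF == 0              { next }                 # skip blank separators
                         { print }                # everything else is cue text
  '
}
```

Since the script writes captions to stdout, the filter can be applied in a pipeline:

bash scripts/get-captions.sh "<youtube-url>" | vtt_to_text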

Notes

All temporary files (audio, intermediate subtitle files) are cleaned up automatically
If yt-dlp finds no subtitles and Whisper is not installed, the script exits with an error and clear instructions
Long videos with no subtitles take time to transcribe: Whisper runs at roughly 1x real time on CPU, so a one-hour video takes about an hour
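
Automatic cleanup of temporary files is commonly done with a temp directory and an EXIT trap. The sketch below shows that general pattern under the assumption that all intermediate files live in one directory; `with_tempdir` is a hypothetical name, and the real script may organize its cleanup differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a command with a private temp directory as its
# last argument, and remove that directory when the command finishes.
with_tempdir() {
  (
    workdir=$(mktemp -d) || exit 1
    # The EXIT trap fires when this subshell ends, on success or failure.
    trap 'rm -rf "$workdir"' EXIT
    "$@" "$workdir"
    exit "$?"   # pass the command's status through to the caller
  )
}
```

Running the work in a subshell keeps the trap local, so the caller's own traps are left untouched.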

Repository

third774/dotfiles

First Seen

Mar 13, 2026

Security Audits

Gen Agent Trust Hub: Pass

Socket: Pass

Snyk: Warn
