YouTube Transcript Skill

Production-grade YouTube transcript extraction with comprehensive format support, intelligent caching, and resilient networking.

When to Use

✅ USE this skill when:

Extracting transcripts from YouTube videos
Converting YouTube captions to SRT/VTT subtitle files
Analyzing video content via transcripts
Creating subtitles for downloaded videos
Batch processing multiple video transcripts
Needing transcripts in specific languages
Processing auto-generated captions

❌ DON'T use this skill when:

Transcript not available (disabled by creator)
Video is private or age-restricted
Livestream that hasn't ended
Need speech-to-text from audio → Use transcribe
Need video frames → Use video-frames

Prerequisites

# Requires Node.js (already available)
node --version

# No additional dependencies required

Commands

Basic Usage

# Extract transcript with video ID
{baseDir}/youtube-transcript.js VIDEO_ID

# Extract with full URL
{baseDir}/youtube-transcript.js "https://www.youtube.com/watch?v=VIDEO_ID"

# Extract with short URL
{baseDir}/youtube-transcript.js "https://youtu.be/VIDEO_ID"

Output Formats

# Plain text with timestamps (default)
{baseDir}/youtube-transcript.js VIDEO_ID --format text
[0:00:00.00] Here is the transcript text
[0:00:05.32] More transcript content

# Plain text without timestamps
{baseDir}/youtube-transcript.js VIDEO_ID --format plain
Here is the transcript text More transcript content

# JSON with metadata
{baseDir}/youtube-transcript.js VIDEO_ID --format json
{
  "title": "Video Title",
  "author": "Channel Name",
  "language": "en",
  "isAutoGenerated": false,
  "transcript": [...]
}

# SRT subtitle format
{baseDir}/youtube-transcript.js VIDEO_ID --format srt > video.srt
1
00:00:00,000 --> 00:00:05,320
Here is the transcript text

2
00:00:05,320 --> 00:00:08,150
More transcript content

# VTT subtitle format
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > video.vtt
WEBVTT

1
00:00.000 --> 00:05.320
Here is the transcript text

# TSV tab-separated values
{baseDir}/youtube-transcript.js VIDEO_ID --format tsv
start\tduration\ttext
0.000\t5.320\tHere is the transcript text

# CSV comma-separated values
{baseDir}/youtube-transcript.js VIDEO_ID --format csv
start,duration,text
0.000,5.320,"Here is the transcript text"

Language Selection

# Auto-select best available (default)
{baseDir}/youtube-transcript.js VIDEO_ID

# Specific language by code
{baseDir}/youtube-transcript.js VIDEO_ID --language en
{baseDir}/youtube-transcript.js VIDEO_ID --language es
{baseDir}/youtube-transcript.js VIDEO_ID --language fr

# Partial matches work too
{baseDir}/youtube-transcript.js VIDEO_ID --language zh   # Matches zh-CN, zh-TW, etc.

# Combine language selection with an output format
{baseDir}/youtube-transcript.js VIDEO_ID --language ja --format srt

Common Language Codes:

Code	Language
en	English
es	Spanish
fr	French
de	German
ja	Japanese
ko	Korean
zh	Chinese
pt	Portuguese
ru	Russian
hi	Hindi
ar	Arabic
it	Italian
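
The matching rules above (exact code first, then partial matches such as zh for zh-CN, with manual captions preferred per the Notes) could be sketched roughly as follows. The track shape ({ languageCode, kind }, with kind "asr" marking auto-generated captions) and the function name selectTrack are assumptions for illustration, not the script's actual internals. Treating "best available" as "first manual track" is also an assumption.

```javascript
// Hypothetical sketch: pick a caption track for a requested language code.
// Assumed track shape: { languageCode: "en", kind: "asr" | undefined }.
function selectTrack(tracks, requested) {
  const isManual = (track) => track.kind !== "asr";
  // Stable sort: manual captions first, original order otherwise preserved.
  const manualFirst = [...tracks].sort(
    (a, b) => Number(isManual(b)) - Number(isManual(a))
  );
  if (!requested) return manualFirst[0] ?? null; // assumed "best available"

  const want = requested.toLowerCase();
  const exact = manualFirst.find(
    (track) => track.languageCode.toLowerCase() === want
  );
  if (exact) return exact;

  // Partial match: "zh" matches "zh-CN", "zh-TW", etc.
  return (
    manualFirst.find((track) =>
      track.languageCode.toLowerCase().startsWith(want + "-")
    ) ?? null
  );
}
```

Returning null when nothing matches leaves it to the caller to decide whether to fall back to another language or report an error.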

Save to File

# Save transcript directly to file
{baseDir}/youtube-transcript.js VIDEO_ID --output transcript.txt
{baseDir}/youtube-transcript.js VIDEO_ID --format srt --output subtitles.srt
{baseDir}/youtube-transcript.js VIDEO_ID --format json --output data.json

# Shell redirection (equivalent)
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > captions.vtt

Advanced Options

# Skip cache (force fresh fetch)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

# Verbose debugging output
DEBUG=1 {baseDir}/youtube-transcript.js VIDEO_ID

# Combine options
{baseDir}/youtube-transcript.js VIDEO_ID --language es --format srt --output spanish.srt --no-cache

Features

Format Comparison

Format	Use Case	Human Readable	Machine Readable
`text`	Default viewing	✅	⚠️
`plain`	Content only	✅	⚠️
`json`	API integration	⚠️	✅
`srt`	Subtitle files	✅	✅
`vtt`	Web captions	✅	✅
`tsv`	Spreadsheet import	⚠️	✅
`csv`	Database import	⚠️	✅

Supported Video URL Formats

# Plain video ID (11 characters)
EBw7gsDPAYQ

# Standard YouTube URL
https://www.youtube.com/watch?v=EBw7gsDPAYQ

# Short youtu.be URL
https://youtu.be/EBw7gsDPAYQ

# Embed URL
https://www.youtube.com/embed/EBw7gsDPAYQ

# YouTube Live URL
https://www.youtube.com/live/EBw7gsDPAYQ

# URLs with additional parameters (automatically handled)
https://www.youtube.com/watch?v=EBw7gsDPAYQ&t=120s
https://www.youtube.com/watch?v=EBw7gsDPAYQ&index=2

# Playlist URLs (extracts first video)
https://www.youtube.com/watch?v=EBw7gsDPAYQ&list=...
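
As a rough illustration of how the URL shapes above can be reduced to a video ID, here is a minimal sketch. The function name extractVideoId and the exact validation pattern are assumptions; the script's real parser may differ.

```javascript
// Hypothetical sketch: returns the 11-character video ID, or null if none is found.
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input) {
  const value = String(input).trim();
  if (ID_PATTERN.test(value)) return value; // plain video ID

  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // neither an ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate = null;

  if (host === "youtu.be") {
    candidate = url.pathname.split("/")[1]; // https://youtu.be/<id>
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // Extra parameters (t, index, list) are simply ignored.
      candidate = url.searchParams.get("v");
    } else {
      const match = url.pathname.match(/^\/(embed|live)\/([^/?]+)/);
      if (match) candidate = match[2];
    }
  }

  return candidate && ID_PATTERN.test(candidate) ? candidate : null;
}
```

Parsing with the built-in URL class rather than one large regular expression keeps query parameters such as t, index, and list from interfering with the ID.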

Intelligent Caching

The skill implements intelligent caching to improve performance:

Cache Location: /tmp/youtube-transcript-cache/
TTL: 24 hours per entry
Max Entries: 100 videos
Benefits:
- Instant retrieval of previously fetched transcripts
- Reduced load on YouTube servers
- Better performance for repeated operations

Cache Bypass:

# Force fresh fetch
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache
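
The TTL and entry limit described above can be illustrated with a small sketch. To stay short it keeps entries in memory rather than under /tmp/youtube-transcript-cache/, and both the class name TranscriptCache and the oldest-first eviction policy are assumptions, not a description of the script's actual cache.

```javascript
// Hypothetical in-memory sketch of a 24-hour, 100-entry cache.
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours per entry
const MAX_ENTRIES = 100;

class TranscriptCache {
  constructor(now = () => Date.now()) {
    this.now = now;           // injectable clock, handy for testing
    this.entries = new Map(); // a Map iterates in insertion order
  }

  get(videoId) {
    const entry = this.entries.get(videoId);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= TTL_MS) {
      this.entries.delete(videoId); // expired
      return undefined;
    }
    return entry.value;
  }

  set(videoId, value) {
    this.entries.delete(videoId); // re-inserting moves the key to the end
    this.entries.set(videoId, { value, storedAt: this.now() });
    while (this.entries.size > MAX_ENTRIES) {
      // Assumed policy: evict the oldest entry first.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}
```

In this sketch, passing --no-cache would correspond to skipping the get call and going straight to the network.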

Rate Limiting

To avoid being blocked by YouTube:

Max 60 requests per minute
Minimum 1 second delay between requests
Backoff with increasing delays between retries
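
One way to enforce the minimum spacing above is to track when the next request is allowed and report how long the caller should wait. This is only a sketch: createThrottle and reserveSlot are hypothetical names, and the script may implement its limits differently. A 1-second spacing also keeps the rate at or below 60 requests per minute.

```javascript
// Hypothetical sketch: enforce a minimum interval between requests.
const MIN_INTERVAL_MS = 1000; // minimum 1 second between requests

function createThrottle(now = () => Date.now()) {
  let nextAllowedAt = 0;
  // Returns how many milliseconds the caller should wait before sending.
  return function reserveSlot() {
    const current = now();
    const waitMs = Math.max(0, nextAllowedAt - current);
    // Reserve the following slot one interval after this request goes out.
    nextAllowedAt = Math.max(current, nextAllowedAt) + MIN_INTERVAL_MS;
    return waitMs;
  };
}
```

A caller would typically sleep for the returned value before each fetch, for example with a setTimeout wrapped in a Promise.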

Retry Logic

When requests fail:

First attempt
Wait 2 seconds, retry
Wait 4 seconds, retry
Wait 6 seconds, retry
Final error reported
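
The sequence above could be sketched as a small retry wrapper. The name withRetries and the injectable sleep parameter are assumptions for illustration; the delays are taken directly from the list above.

```javascript
// Hypothetical sketch of the retry sequence: one initial attempt, then
// retries after 2, 4, and 6 seconds, then the last error is reported.
const RETRY_DELAYS_MS = [2000, 4000, 6000];

async function withRetries(
  operation,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  let lastError;
  for (let attempt = 0; attempt <= RETRY_DELAYS_MS.length; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < RETRY_DELAYS_MS.length) {
        await sleep(RETRY_DELAYS_MS[attempt]); // wait before the next attempt
      }
    }
  }
  throw lastError; // all attempts failed
}
```

Passing sleep as a parameter lets the schedule be exercised without actually waiting.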

Error Handling

Error Codes

Code	Name	Description	Resolution
0	SUCCESS	Transcript fetched	None needed
1	INVALID_VIDEO_ID	Bad URL/ID format	Double-check the video ID
2	VIDEO_NOT_FOUND	Video doesn't exist	Verify video exists
3	TRANSCRIPT_DISABLED	Creator disabled captions	Contact creator
4	NO_TRANSCRIPT	No captions available	Wait for transcript
5	VIDEO_UNAVAILABLE	Can't access	Check restrictions
6	PRIVATE_VIDEO	Video is private	Get access/permission
7	RATE_LIMITED	Too many requests	Wait before retry
8	NETWORK_ERROR	Connection issue	Check internet
9	PARSE_ERROR	Data extraction failed	Try again
99	UNKNOWN	Unexpected error	Report issue
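
If the codes in the table are surfaced as the process exit status (an assumption; the table does not say so explicitly), a wrapper could map them back to names along these lines. The classifyExit helper and the choice of which codes count as retryable are illustrative guesses based on the Resolution column.

```javascript
// Hypothetical sketch: map a numeric code from the table to its name.
const EXIT_CODES = new Map([
  [0, "SUCCESS"],
  [1, "INVALID_VIDEO_ID"],
  [2, "VIDEO_NOT_FOUND"],
  [3, "TRANSCRIPT_DISABLED"],
  [4, "NO_TRANSCRIPT"],
  [5, "VIDEO_UNAVAILABLE"],
  [6, "PRIVATE_VIDEO"],
  [7, "RATE_LIMITED"],
  [8, "NETWORK_ERROR"],
  [9, "PARSE_ERROR"],
  [99, "UNKNOWN"],
]);

// Assumption: only transient failures are worth retrying.
const RETRYABLE = new Set(["RATE_LIMITED", "NETWORK_ERROR", "PARSE_ERROR"]);

function classifyExit(code) {
  const name = EXIT_CODES.get(code) ?? "UNKNOWN";
  return { name, ok: name === "SUCCESS", retryable: RETRYABLE.has(name) };
}
```

A batch script could use the retryable flag to decide between waiting and skipping a video outright.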

Common Errors and Solutions

"Could not extract player data"

YouTube may have changed their page structure
The video may be age-restricted
The video may require login
Solution: Try again later or check if video is publicly accessible

"No captions available for this video"

Creator hasn't added captions
Auto-generated captions aren't ready (may take a few hours after upload)
Video is too new
Solution: Wait for YouTube to generate captions, or check if manual captions exist

"Rate limited by YouTube"

Too many requests in short period
Solution: Wait 1-2 minutes before retrying

"Transcript too long"

Transcript exceeds 500K characters
Solution: Use --format json which handles large transcripts better

"Video unavailable or not found"

Video removed or never existed
Region-restricted
Solution: Verify video ID/URL is correct

Technical Architecture

Data Flow

Video ID/URL
    ↓
Extract Video ID ← URL parser (7+ formats)
    ↓
Check Cache ← 24hr TTL store
    ↓[cache miss]
Fetch YouTube Page ← HTTP with retry logic
    ↓
Extract Player Data ← ytInitialPlayerResponse
    ↓
Parse Caption Tracks ← Language selection
    ↓
Fetch Transcript ← Select appropriate URL
    ↓
Parse Entries ← XML/JSON parsing
    ↓
Format Output ← 7 output formats
    ↓
Cache & Return ← Store for 24hr

Player Data Extraction

Extracts player data from multiple potential sources:

ytInitialPlayerResponse JavaScript variable
playerResponse JSON in script tags
Caption tracks from various locations
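
Pulling an embedded JSON object such as ytInitialPlayerResponse out of page HTML generally means finding the marker and then scanning for the matching closing brace while skipping braces inside strings. The sketch below shows that idea. The function name extractJsonAfter is hypothetical, and the real script may rely on different markers or patterns.

```javascript
// Hypothetical sketch: parse the first JSON object that follows a marker.
function extractJsonAfter(html, marker) {
  const markerIndex = html.indexOf(marker);
  if (markerIndex === -1) return null;
  const start = html.indexOf("{", markerIndex + marker.length);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      // Inside a JSON string: only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(html.slice(start, i + 1));
        } catch {
          return null; // balanced braces but not valid JSON
        }
      }
    }
  }
  return null; // unbalanced braces
}
```

Brace counting is more tolerant than a single non-greedy regular expression, which tends to stop at the first closing brace of a nested object.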

Transcript Parsing

Supports multiple formats:

JSON API Response: Modern format
Timed Text XML: Legacy format
Alternative XML: Older structure
Special handling for: Auto-generated vs manual captions

Data Unescaping

Properly handles:

&amp; → &
&lt; → <
&gt; → >
&quot; → "
&#39; / &apos; / &#x27; → '
Whitespace normalization
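
A single-pass unescape along the lines above could be sketched like this. The helper name unescapeText is hypothetical. Decoding in one pass avoids double-unescaping sequences such as &amp;lt;.

```javascript
// Hypothetical sketch: decode common HTML entities and normalize whitespace.
const NAMED_ENTITIES = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

function unescapeText(text) {
  return text
    .replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (whole, body) => {
      if (body[0] === "#") {
        const isHex = body[1] === "x";
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched instead of throwing.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : whole;
      }
      return NAMED_ENTITIES.get(body) ?? whole; // unknown names stay as-is
    })
    .replace(/\s+/g, " ") // collapse runs of whitespace, including newlines
    .trim();
}
```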

Sample Output

JSON Format (Full)

{
  "title": "How Artificial Intelligence Works",
  "author": "Example Channel",
  "duration": "PT10M32S",
  "language": "en",
  "isAutoGenerated": true,
  "transcript": [
    {
      "start": 0.000,
      "duration": 5.320,
      "text": "In this video, we'll explore how AI systems learn and adapt"
    },
    {
      "start": 5.320,
      "duration": 4.180,
      "text": "to perform tasks that traditionally required human intelligence"
    }
  ],
  "word_count": 2847,
  "total_entries": 156
}

SRT Format (SubRip)

1
00:00:00,000 --> 00:00:05,320
In this video, we'll explore how AI systems
learn and adapt

2
00:00:05,320 --> 00:00:09,500
to perform tasks that traditionally
required human intelligence

3
00:00:09,500 --> 00:00:13,240
This process is called
machine learning

...
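
The SRT layout above can be produced from { start, duration, text } entries with a small formatter. This is a sketch with hypothetical helper names (srtTimestamp, toSrt). It does not wrap long lines the way the sample does, and it computes each end time as start plus duration, which matches the sample values.

```javascript
// Hypothetical sketch: format seconds as an SRT timestamp (HH:MM:SS,mmm).
function srtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000); // round away float noise
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build numbered cues separated by blank lines.
function toSrt(entries) {
  return entries
    .map((entry, index) => {
      const from = srtTimestamp(entry.start);
      const to = srtTimestamp(entry.start + entry.duration);
      return `${index + 1}\n${from} --> ${to}\n${entry.text}\n`;
    })
    .join("\n");
}
```

A WebVTT variant would differ mainly in the WEBVTT header line and in using a period instead of a comma before the milliseconds.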

VTT Format (WebVTT)

WEBVTT

1
00:00.000 --> 00:05.320
In this video, we'll explore how AI systems
learn and adapt

2
00:05.320 --> 00:09.500
to perform tasks that traditionally
required human intelligence

...

Examples

Download Transcripts for Playlist

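#!/bin/bash
# Process multiple videos listed one ID per line in video_ids.txt

# Make sure the output directory exists
mkdir -p transcripts

while IFS= read -r video_id; do
  # Skip blank lines
  [ -z "$video_id" ] && continue
  echo "Processing: $video_id"

  # Test the command directly; </dev/null keeps it from consuming the ID list on stdin
  if {baseDir}/youtube-transcript.js "$video_id" --format srt --output "transcripts/${video_id}.srt" </dev/null 2>/dev/null; then
    echo "  ✓ Success"
  else
    echo "  ✗ Failed"
  fi

  # Sleep to respect rate limits
  sleep 2
done < video_ids.txt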

Convert to PDF for Reading

#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get transcript
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format plain > transcript.txt

# Convert to PDF (requires pandoc)
pandoc transcript.txt -o transcript.pdf
echo "PDF created: transcript.pdf"

Analyze Word Counts

#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get JSON format
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format json | jq -r '
  "Title: \(.title)",
  "Author: \(.author)",
  "Words: \(.word_count)",
  "Entries: \(.total_entries)",
  "Language: \(.language)\(if .isAutoGenerated then " (auto)" else "" end)"
'

Batch Download with Progress

#!/bin/bash
VIDEOS=("VIDEO1" "VIDEO2" "VIDEO3")
TOTAL=${#VIDEOS[@]}

# Make sure the output directory exists
mkdir -p data

for i in "${!VIDEOS[@]}"; do
  id="${VIDEOS[$i]}"
  echo "[$((i+1))/$TOTAL] Processing $id..."
  
  {baseDir}/youtube-transcript.js "$id" --format json --output "data/${id}.json" 2>/dev/null
  
  sleep 1  # Rate limit protection
done

Create Bilingual Subtitles

#!/bin/bash
VIDEO_ID="your-video-id"

# Get English and Spanish
{baseDir}/youtube-transcript.js "$VIDEO_ID" --language en --format srt > english.srt
echo "English ✓"

{baseDir}/youtube-transcript.js "$VIDEO_ID" --language es --format srt > spanish.srt
echo "Spanish ✓"

# Combine (requires ffmpeg and a local copy of the video saved as video.mp4)
ffmpeg -i video.mp4 -i english.srt -i spanish.srt \
  -map 0:v -map 0:a -map 1:s:0 -map 2:s:0 \
  -c:v copy -c:a copy -c:s mov_text \
  "${VIDEO_ID}_bilingual.mp4"

echo "Bilingual video created ✓"

Performance Tips

1. Use Caching

First fetch: ~2-5 seconds
Cached fetch: ~100ms

# First time (slow)
{baseDir}/youtube-transcript.js VIDEO_ID

# Second time (fast - from cache)
{baseDir}/youtube-transcript.js VIDEO_ID

# Force refresh (slow)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

2. Batch Processing with Delays

# Bad - might hit rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
done

# Good - respects rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
  sleep 2
done

3. Parallel Processing (Limited)

# Process 2-3 at a time (don't exceed rate limit)
{baseDir}/youtube-transcript.js VIDEO1 &
{baseDir}/youtube-transcript.js VIDEO2 &
{baseDir}/youtube-transcript.js VIDEO3 &
wait

4. Output Format Selection

Fastest: plain (smallest output, fastest write)
Recommended: text or json (balanced)
For subtitles: srt or vtt (industry standard)

Limitations

No Private Videos: Requires public access
No Age-Restricted: Some videos unavailable
No Members-Only: Requires YouTube membership
Livestream Lag: Captions may be delayed
New Videos: Auto-generated captions take time
Rate Limits: Max 60 requests/minute
Large Transcripts: Limited to 500K characters

Notes

Cached transcripts expire after 24 hours
Auto-generated captions may have errors
Manual captions are preferred when available
Language codes follow YouTube's internal format
SRT format uses comma for milliseconds (WebVTT uses period)
TSV and CSV formats are UTF-8 encoded
JSON output includes metadata for programmatic use
Script is network-resilient with automatic retries
Use --output to save directly to file (handles special characters)
STDERR contains progress messages and metadata
STDOUT contains the actual transcript data
