youtube-transcript
SKILL.md
YouTube Transcript Downloader
Extract transcripts from YouTube videos using yt-dlp. Supports manual subtitles, auto-generated captions, and Whisper transcription as a last resort.
Prerequisites
This skill requires UV for dependency management. Run from the tapestry-skills project root.
Workflow
URL → Validate → Check yt-dlp → List Subtitles → Download → Convert to Text → Save
Priority order:
- Manual subtitles (highest quality)
- Auto-generated captions (usually available)
- Whisper transcription (requires user consent)
Security Requirements
All security utilities are available via UV from the project root.
URL Validation
# Validate YouTube URL (must also be a valid YouTube domain)
if [[ ! "$URL" =~ ^https?://(www\.)?(youtube\.com|youtu\.be)/ ]]; then
echo "Error: Not a valid YouTube URL"
exit 1
fi
# Run general security validation (SSRF protection, etc.)
uv run tapestry-validate-url "$URL" || exit 1
Filename Sanitization
# Use tapestry sanitization utility
SAFE_TITLE=$(uv run tapestry-sanitize-filename "$VIDEO_TITLE")
Step 1: Check yt-dlp Installation
yt-dlp is included in the project dependencies. Use via UV:
# yt-dlp is available through uv run
uv run yt-dlp --version
Step 2: Check Available Subtitles
Always check what's available first:
uv run yt-dlp --list-subs "$URL"
Look for:
Available subtitles(manual, higher quality)Available automatic captions(auto-generated)
Step 3: Download Subtitles
Try Manual First
TEMP_DIR=$(mktemp -d)
trap "rm -rf '$TEMP_DIR'" EXIT
if uv run yt-dlp --write-sub --skip-download --sub-langs en -o "$TEMP_DIR/transcript" "$URL" 2>/dev/null; then
echo "Downloaded manual subtitles"
else
# Fall back to auto-generated
uv run yt-dlp --write-auto-sub --skip-download --sub-langs en -o "$TEMP_DIR/transcript" "$URL"
echo "Downloaded auto-generated captions"
fi
Step 4: Convert VTT to Clean Text
Use the tapestry VTT converter which handles deduplication automatically:
VIDEO_TITLE=$(uv run yt-dlp --print "%(title)s" "$URL" 2>/dev/null)
SAFE_TITLE=$(uv run tapestry-sanitize-filename "$VIDEO_TITLE")
VTT_FILE=$(find "$TEMP_DIR" -name "*.vtt" | head -n 1)
# Convert VTT to clean text (handles deduplication, HTML tags, entities)
uv run tapestry-vtt-to-text "$VTT_FILE" --output "${SAFE_TITLE}.txt"
echo "Saved: ${SAFE_TITLE}.txt"
Whisper Transcription (Last Resort)
Only offer when no subtitles are available. Requires explicit user consent.
Check File Size First
DURATION=$(uv run yt-dlp --print "%(duration)s" "$URL" 2>/dev/null)
TITLE=$(uv run yt-dlp --print "%(title)s" "$URL" 2>/dev/null)
echo "No subtitles available for: $TITLE"
echo "Duration: $((DURATION / 60)) minutes"
echo ""
echo "Download audio and transcribe with Whisper? (requires ~1-3GB for model)"
echo "Type 'yes' to proceed:"
Wait for explicit "yes" before proceeding.
Download and Transcribe
TEMP_DIR=$(mktemp -d)
trap "rm -rf '$TEMP_DIR'" EXIT
# Download audio only
uv run yt-dlp -x --audio-format mp3 -o "$TEMP_DIR/audio.%(ext)s" "$URL"
# Transcribe with Whisper (installs on first use)
uv run --with openai-whisper whisper "$TEMP_DIR/audio.mp3" --model base --output_format vtt --output_dir "$TEMP_DIR"
# Convert VTT to text
VTT_FILE=$(find "$TEMP_DIR" -name "*.vtt" | head -n 1)
uv run tapestry-vtt-to-text "$VTT_FILE" --output "${SAFE_TITLE}.txt"
Complete Workflow Script
#!/bin/bash
set -e
URL="$1"
# Validate YouTube URL
if [[ ! "$URL" =~ ^https?://(www\.)?(youtube\.com|youtu\.be)/ ]]; then
echo "Error: Not a valid YouTube URL"
exit 1
fi
# Run security validation
uv run tapestry-validate-url "$URL" || exit 1
# Get video info
VIDEO_TITLE=$(uv run yt-dlp --print "%(title)s" "$URL" 2>/dev/null)
SAFE_TITLE=$(uv run tapestry-sanitize-filename "$VIDEO_TITLE")
echo "Video: $VIDEO_TITLE"
echo ""
# Create temp directory
TEMP_DIR=$(mktemp -d)
trap "rm -rf '$TEMP_DIR'" EXIT
# Try to download subtitles
echo "Checking for subtitles..."
if uv run yt-dlp --write-sub --skip-download --sub-langs en -o "$TEMP_DIR/transcript" "$URL" 2>/dev/null; then
echo "Found manual subtitles"
elif uv run yt-dlp --write-auto-sub --skip-download --sub-langs en -o "$TEMP_DIR/transcript" "$URL" 2>/dev/null; then
echo "Found auto-generated captions"
else
echo "No subtitles available"
# Offer Whisper option here (with user consent)
exit 1
fi
# Find VTT file
VTT_FILE=$(find "$TEMP_DIR" -name "*.vtt" | head -n 1)
if [ -z "$VTT_FILE" ]; then
echo "Error: No VTT file found"
exit 1
fi
# Convert to clean text
uv run tapestry-vtt-to-text "$VTT_FILE" --output "${SAFE_TITLE}.txt"
WORD_COUNT=$(wc -w < "${SAFE_TITLE}.txt" | tr -d ' ')
echo ""
echo "Transcript saved: ${SAFE_TITLE}.txt"
echo "Words: $WORD_COUNT"
Error Handling
| Issue | Solution |
|---|---|
| UV not installed | Install with curl -LsSf https://astral.sh/uv/install.sh | sh |
| Invalid URL | Reject non-YouTube URLs |
| No subtitles | Offer Whisper with user consent |
| Private/restricted video | Inform user, cannot access |
| SSL errors | Try --no-check-certificate (with warning) |
| Download timeout | Retry or inform user |
Dependencies
All dependencies are managed via UV and pyproject.toml:
- yt-dlp: YouTube downloads (pinned version)
- openai-whisper (optional): For videos without subtitles
Security Reference
For complete security guidelines: ../shared/references/security-guidelines.md
Weekly Installs
5
Repository
ryanhudson/tape…ude-codeGitHub Stars
1
First Seen
Jan 23, 2026
Security Audits
Installed on
claude-code4
trae3
gemini-cli3
antigravity3
opencode3
windsurf2