YouTube Knowledge Learner

Use this skill to turn a YouTube video into task-ready knowledge:

1. Acquire a transcript or captions.
2. Preserve source metadata and timestamped references.
3. Distill the video into a reusable Markdown reference.
4. Apply the learned knowledge to the user's current task.

The goal is not a generic summary. The output should help a future agent quickly reuse the video as reliable context without watching it again.

Inputs

Accept any of these:

- A YouTube URL.
- A local audio/video file.
- A transcript file supplied by the user.
- A YouTube URL plus a task, such as "watch this and implement the approach".

If the user gives both a video and a task, produce the reference note first, then use that note as context for the task.

Default Output

Create a Markdown note under a sensible project-local folder, usually:

knowledge/youtube/<slug>.md

If the project already has a docs, notes, research, or knowledge folder, follow that convention instead.
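
A minimal sketch of how the note path could be chosen. The function names, the folder search order, and the choice to nest a `youtube` subfolder inside an existing convention folder are illustrative assumptions, not part of this skill's helper script.

```python
import re
import unicodedata
from pathlib import Path

# Assumption: these are the convention folders to look for, checked in this order.
CANDIDATE_DIRS = ("knowledge", "docs", "notes", "research")


def slugify(title: str, max_len: int = 60) -> str:
    """Turn a video title into a lowercase ASCII slug usable as a file name."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def note_path(project_root: Path, title: str) -> Path:
    """Reuse an existing convention folder if one exists, else use knowledge/youtube."""
    for name in CANDIDATE_DIRS:
        if (project_root / name).is_dir():
            return project_root / name / "youtube" / f"{slugify(title)}.md"
    return project_root / "knowledge" / "youtube" / f"{slugify(title)}.md"
```

For example, `note_path(Path("."), "Intro to RAG (2024)!")` resolves to a file named `intro-to-rag-2024.md` under whichever folder is selected.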

Use this structure:

# <Video title>

Source: <URL or file path>
Channel/author: <name if known>
Captured: <YYYY-MM-DD>
Transcript source: <official captions | auto captions | local transcription | user-provided transcript>
Task context: <what the user wanted to do with this video, or "reference only">

## Executive Summary
<5-10 bullets with the video's main claims and practical conclusions.>

## Core Concepts
<Concept-by-concept explanation, preserving definitions, assumptions, and constraints.>

## Procedures And Workflows
<Step-by-step methods taught in the video. Include timestamps when available.>

## Decisions, Tradeoffs, And Warnings
<What to do, what not to do, when advice changes, limitations, caveats.>

## Task-Relevant Notes
<Only include this when the user gave a task. Explain how the knowledge applies to that task.>

## Timestamp Index
- <00:00> <topic>
- <03:12> <topic>

## Reusable Prompt Context
<A compact block a future agent can paste into a prompt to reuse the knowledge.>

## Open Questions
<Unclear, missing, or unverifiable points.>

Do not paste a full copyrighted transcript into the final response. For local project files, store raw transcripts only when needed for processing or when the user explicitly asked for a transcript artifact. The reference note should mostly be paraphrased synthesis with short timestamped excerpts only where they are essential.

Transcript Acquisition

Prefer sources in this order:

1. Official/manual captions from YouTube.
2. Auto-generated captions from YouTube.
3. User-provided transcript.
4. Local transcription from downloaded audio using the project's existing transcription tool, Whisper, OpenAI audio transcription, or another already-approved tool.

Do not add new dependencies unless the user explicitly asks. If a tool is missing, name the missing tool and fall back to the next available source in the order above.
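
The preference order above can be expressed as a small ranking. This is a sketch only; the enum and function names are hypothetical, and the labels reuse the "Transcript source" values from the note template.

```python
from enum import IntEnum
from typing import Iterable, Optional


class TranscriptSource(IntEnum):
    """Lower value means higher preference, mirroring the order listed above."""
    OFFICIAL_CAPTIONS = 1
    AUTO_CAPTIONS = 2
    USER_TRANSCRIPT = 3
    LOCAL_TRANSCRIPTION = 4


# Labels matching the "Transcript source" field of the note template.
LABELS = {
    TranscriptSource.OFFICIAL_CAPTIONS: "official captions",
    TranscriptSource.AUTO_CAPTIONS: "auto captions",
    TranscriptSource.USER_TRANSCRIPT: "user-provided transcript",
    TranscriptSource.LOCAL_TRANSCRIPTION: "local transcription",
}


def choose_source(available: Iterable[TranscriptSource]) -> Optional[TranscriptSource]:
    """Return the most preferred available source, or None if nothing is available."""
    return min(available, default=None)
```

Returning `None` rather than raising leaves room for the caller to write a blocked note when no source is usable.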

Helper Script

This skill includes scripts/fetch_youtube_knowledge.py, which can fetch captions through yt-dlp and create a starter Markdown note plus a transcript text file.

Example:

python <skill-dir>/scripts/fetch_youtube_knowledge.py "https://www.youtube.com/watch?v=VIDEO_ID" --out knowledge/youtube --task "Use this to implement the feature"

The helper is intentionally conservative:

- It uses yt-dlp --dump-single-json for metadata and caption URLs.
- It prefers manual captions over auto captions.
- It writes a transcript .txt for agent processing.
- It writes a Markdown scaffold that the agent must complete with real synthesis.

After running it, read the transcript and replace scaffold placeholders with actual learned content.
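
The helper's source is not reproduced here, so the following is only a sketch of how the "manual over auto" preference might look when reading yt-dlp's JSON dump. It assumes the dump exposes `subtitles` and `automatic_captions` mappings from language code to a list of tracks with `ext` and `url` fields; the function names and the preference for `vtt` are illustrative.

```python
import json
import subprocess
from typing import Optional, Tuple


def fetch_info(url: str) -> dict:
    """Run yt-dlp and parse its single-JSON metadata dump (needs yt-dlp on PATH)."""
    result = subprocess.run(
        ["yt-dlp", "--dump-single-json", "--skip-download", url],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def pick_caption_track(info: dict, lang: str = "en") -> Optional[Tuple[str, dict]]:
    """Prefer manual captions over auto captions; prefer a vtt track within each kind."""
    # Assumption: these two keys hold {language: [track, ...]} in the yt-dlp dump.
    for label, key in (
        ("official captions", "subtitles"),
        ("auto captions", "automatic_captions"),
    ):
        tracks = (info.get(key) or {}).get(lang) or []
        for track in tracks:
            if track.get("ext") == "vtt":
                return label, track
        if tracks:
            # No vtt track for this kind: fall back to the first one listed.
            return label, tracks[0]
    return None
```

The returned label can be written straight into the note's "Transcript source" field.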

Learning Workflow

1. Identify the user's goal:
   - Reference note only.
   - Apply the video to a coding/design/research task.
   - Produce both transcript and knowledge note.
2. Acquire transcript and metadata.
3. Segment the transcript by timestamps and topic shifts.
4. Extract durable knowledge:
   - Definitions and terms.
   - Claims and evidence.
   - Procedures, commands, formulas, code patterns, settings.
   - Constraints, prerequisites, warnings, and failure modes.
   - Examples and edge cases.
5. Write the Markdown reference note.
6. Use the note to work on the task if the user asked for task execution.
7. Report the created file path and any unresolved gaps.
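
Before the transcript can be segmented by timestamp, the caption file has to become (start time, text) pairs. The sketch below assumes the captions were saved as WebVTT; other caption formats would need a different parser, and the function name is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches the start time of a cue line such as "00:01:05.500 --> 00:01:08.000".
# The hours part is optional in WebVTT.
CUE_RE = re.compile(r"^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->")
TAG_RE = re.compile(r"<[^>]+>")  # inline markup such as <c> or word-level timestamps


def parse_vtt(text: str) -> List[Tuple[float, str]]:
    """Return (start_seconds, cue_text) pairs from WebVTT content."""
    cues: List[Tuple[float, str]] = []
    start: Optional[float] = None
    lines: List[str] = []

    def flush() -> None:
        if start is not None and lines:
            cues.append((start, " ".join(lines)))

    for raw in text.splitlines():
        line = raw.strip()
        match = CUE_RE.match(line)
        if match:
            flush()
            hours, minutes, seconds, millis = match.groups()
            start = (
                int(hours or 0) * 3600
                + int(minutes) * 60
                + int(seconds)
                + int(millis) / 1000
            )
            lines = []
        elif not line:
            # A blank line ends the current cue.
            flush()
            start, lines = None, []
        elif start is not None:
            cleaned = TAG_RE.sub("", line).strip()
            if cleaned:
                lines.append(cleaned)
    flush()
    return cues
```

Topic shifts still need to be judged while reading; the parser only supplies the timestamps that each segment boundary will cite.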

Quality Bar

The note is complete when a future agent can answer:

- What did the video teach?
- Which steps should I follow?
- Which assumptions and caveats matter?
- Where in the video did each important point appear?
- How does this knowledge affect the user's task?

Use timestamped citations for important claims whenever timestamps are available. If a transcript has no timestamps, say so in the note.
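
A small sketch of rendering timestamps in the `MM:SS` style used by the Timestamp Index. The function names are hypothetical, and the `H:MM:SS` form for videos over an hour is an assumption about what reads best.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS once the video passes one hour."""
    total = int(seconds)  # drop fractional seconds; captions rarely need them
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def index_line(seconds: float, topic: str) -> str:
    """Build one bullet for the Timestamp Index section."""
    return f"- {format_timestamp(seconds)} {topic}"


print(index_line(192, "Environment setup"))  # - 03:12 Environment setup
```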

Applying The Knowledge

When the user gives a downstream task, treat the note as a source artifact:

- Read the relevant sections before editing code or producing a plan.
- Cite the note path and timestamped source points in your reasoning when useful.
- Prefer concrete procedures from the video over a vague summary.
- If the video's advice conflicts with repository conventions, follow the repository unless the user explicitly wants the video's approach.

Failure Handling

If captions cannot be fetched:

1. Check whether yt-dlp is installed and whether the URL is accessible.
2. Try another caption language if the user allows it.
3. Ask the user for a transcript only after tool-based paths are exhausted.
4. If the video has no captions and no transcription tool is available, create a short blocked note documenting the URL, attempted commands, missing tools, and the next required input.
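
The tool check and the blocked note could be sketched as follows. The note layout and function names are assumptions; only the four documented items (URL, attempted commands, missing tools, next required input) come from the instructions above.

```python
import shutil
from datetime import date
from typing import List, Optional, Sequence


def missing_tools(required: Sequence[str] = ("yt-dlp",)) -> List[str]:
    """Return the required executables that are not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


def blocked_note(
    url: str,
    attempted: Sequence[str],
    missing: Sequence[str],
    next_input: str,
    captured: Optional[date] = None,
) -> str:
    """Render a short Markdown note explaining why no transcript was produced."""
    captured = captured or date.today()
    lines = [
        "# Blocked: transcript unavailable",  # hypothetical heading
        "",
        f"Source: {url}",
        f"Captured: {captured.isoformat()}",
        "",
        "## Attempted Commands",
        *([f"- {command}" for command in attempted] or ["- none"]),
        "",
        "## Missing Tools",
        *([f"- {tool}" for tool in missing] or ["- none"]),
        "",
        "## Next Required Input",
        next_input,
    ]
    return "\n".join(lines) + "\n"
```

Passing the output of `missing_tools()` as `missing` keeps the note consistent with what was actually checked.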

If the transcript is long:

- Work in chunks.
- Maintain a running topic map with timestamps.
- Synthesize after reading all chunks so the note reflects the whole video, not only the beginning.
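
One way to chunk a long transcript while keeping a running topic map, sketched under the assumption that the transcript is already a list of (start_seconds, text) cues. The character budget and the `label_chunk` callback are hypothetical; in practice the agent supplies the topic label after reading each chunk.

```python
from typing import Callable, List, Tuple

Cue = Tuple[float, str]


def chunk_cues(cues: List[Cue], max_chars: int = 6000) -> List[List[Cue]]:
    """Group consecutive cues into chunks of roughly max_chars characters."""
    chunks: List[List[Cue]] = []
    current: List[Cue] = []
    size = 0
    for start, text in cues:
        # Start a new chunk when adding this cue would exceed the budget.
        if current and size + len(text) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append((start, text))
        size += len(text)
    if current:
        chunks.append(current)
    return chunks


def build_topic_map(
    chunks: List[List[Cue]],
    label_chunk: Callable[[List[Cue]], str],
) -> List[Tuple[float, str]]:
    """Record (chunk start time, topic label) for every chunk, in order."""
    return [(chunk[0][0], label_chunk(chunk)) for chunk in chunks]
```

The topic map is built across all chunks before any synthesis, so the final note can weigh late sections as heavily as the opening.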
