The Agent Skills Directory

[COMMAND_EXECUTION]: The script uses spawnSync to run yt-dlp, bun, or python3 as a fallback for extracting video data when the direct API path fails. The commands are built from predefined arguments and video IDs.
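As a rough illustration of what "predefined arguments and video IDs" can look like in practice, the sketch below validates the ID and passes a fixed argument array to spawnSync without a shell. It is not the skill's actual code: the helper names (isValidVideoId, buildYtDlpArgs, runYtDlp), the chosen yt-dlp options, and the assumption that video IDs are 11 characters from [A-Za-z0-9_-] are all illustrative.

```typescript
import { spawnSync } from "node:child_process";

// Assumption: YouTube video IDs are 11 characters drawn from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Hypothetical helper: returns true only for strings that look like a video ID.
function isValidVideoId(id: string): boolean {
  return VIDEO_ID_PATTERN.test(id);
}

// Hypothetical helper: builds a fixed argument vector. The only variable part
// is the validated video ID, embedded in a URL on a known domain.
function buildYtDlpArgs(videoId: string): string[] {
  if (!isValidVideoId(videoId)) {
    throw new Error(`Rejected video ID: ${JSON.stringify(videoId)}`);
  }
  return [
    "--skip-download", // metadata only, no media download
    "--dump-json",     // print video metadata as JSON on stdout
    `https://www.youtube.com/watch?v=${videoId}`,
  ];
}

// Hypothetical helper: with an argument array and shell disabled, the ID is
// passed to the process as data and is never interpreted by a shell.
function runYtDlp(videoId: string) {
  return spawnSync("yt-dlp", buildYtDlpArgs(videoId), {
    encoding: "utf8",
    shell: false,
  });
}
```

Validating before building the arguments means a malformed ID fails early, before any process is spawned.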
[EXTERNAL_DOWNLOADS]: The skill fetches video metadata, transcripts, and thumbnails from YouTube domains (youtube.com, ytimg.com). These are standard operations for a transcription tool.
[REMOTE_CODE_EXECUTION]: When falling back to yt-dlp, the skill passes the --remote-components ejs:github flag. This flag directs yt-dlp to load components from a remote source at run time, which is a form of dynamic code loading.
[PROMPT_INJECTION]: The skill processes untrusted text from YouTube transcripts and video descriptions. This creates a surface for indirect prompt injection: an attacker could place instructions in a video's captions to influence the AI's behavior during the speaker identification step.
Ingestion points: Video descriptions and transcripts are loaded in scripts/main.ts and scripts/youtube.ts.
Boundary markers: Instructions for post-processing are defined in prompts/speaker-transcript.md, but there is no strong isolation between the instructions and the untrusted transcript data.
Capability inventory: The agent can perform file system writes and execute shell commands (via the transcription scripts).
Sanitization: The skill implements basic tag stripping and character unescaping, but does not filter for logical or instructional content in the transcript text.
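The boundary-marker and sanitization findings above can be made concrete with a small sketch. The code below is not taken from the skill: the delimiter strings and the function names (stripTags, unescapeEntities, wrapUntrusted) are hypothetical. It shows basic tag stripping and entity unescaping, followed by wrapping the transcript in explicit delimiters so that a prompt can refer to it as data. Delimiters of this kind reduce ambiguity but do not by themselves prevent prompt injection, since instructional content inside the transcript is left untouched.

```typescript
// Hypothetical delimiters marking where untrusted transcript text begins and ends.
const OPEN = "<<<UNTRUSTED_TRANSCRIPT>>>";
const CLOSE = "<<<END_UNTRUSTED_TRANSCRIPT>>>";

// Remove anything that looks like a markup tag, e.g. caption styling tags.
function stripTags(text: string): string {
  return text.replace(/<[^>]*>/g, "");
}

// Decode a small set of common HTML entities.
function unescapeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" rather than "<"
}

// Clean the transcript, then wrap it in the delimiters. Runs of three or more
// angle brackets are replaced with a space so the content cannot reproduce
// either delimiter and close the block early.
function wrapUntrusted(transcript: string): string {
  const cleaned = unescapeEntities(stripTags(transcript));
  const neutralized = cleaned.replace(/<{3,}|>{3,}/g, " ");
  return [OPEN, neutralized, CLOSE].join("\n");
}
```

A prompt template could then state that everything between the two markers is transcript data and must not be followed as instructions; how much this helps depends on the model consuming the prompt.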

baoyu-youtube-transcript