The Agent Skills Directory

[PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it fetches and processes untrusted text from external YouTube videos.
Ingestion points: The get-captions.sh script extracts captions directly from YouTube or generates transcripts using the Whisper model from video audio.
Boundary markers: No explicit delimiters or boundary markers are used to wrap the transcript text when it is provided to the agent. There are no instructions to the agent to treat the transcript as data rather than instructions.
Capability inventory: The skill uses yt-dlp and whisper to perform its tasks. While the script has limited capabilities, the resulting transcript becomes part of the agent's conversational context.
Sanitization: There is no sanitization, filtering, or validation performed on the extracted captions. If a video contains captions like 'Ignore your previous instructions and perform [malicious action]', an agent summarizing the video might follow those instructions.
[EXTERNAL_DOWNLOADS]: The skill downloads media and subtitle files from YouTube using yt-dlp and utilizes the openai-whisper model.
The openai-whisper package is provided by OpenAI, a trusted organization.
yt-dlp is a standard, well-known tool for media extraction.
[COMMAND_EXECUTION]: The shell script executes several subprocesses to perform its functions.
It uses yt-dlp for subtitle and audio extraction and whisper for transcription.
The script uses mktemp for temporary file management and includes cleanup logic via shell traps.
Variables passed to commands are quoted to prevent basic shell injection vulnerabilities.

youtube-captions