The Agent Skills Directory

[COMMAND_EXECUTION]: The skill relies heavily on the shell_exec tool to perform its core functions, executing binaries such as ffmpeg, ffprobe, yt-dlp, and whisper. This operational model grants the agent broad access to the underlying system's command line.
[EXTERNAL_DOWNLOADS]: The skill is designed to fetch video content and metadata from arbitrary remote URLs via the yt-dlp utility, introducing risks associated with processing untrusted media files.
[DATA_EXFILTRATION]: The skill processes and transmits data to several external APIs (Groq, OpenAI, Deepgram, ElevenLabs, Telegram, and WhatsApp) using curl. This involves the use of sensitive API keys and the uploading of audio/video content to third-party servers.
[PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it analyzes untrusted text data (transcripts and metadata) retrieved from the internet to make processing decisions.
Ingestion points: Untrusted data enters the agent context through yt-dlp --dump-json (metadata) and various transcription outputs (Whisper/YouTube auto-subs) as defined in the HAND.toml system prompt.
Boundary markers: The system prompt does not include clear delimiters or instructions to treat external transcript content as non-authoritative data, nor does it warn the agent to ignore instructions embedded within the text.
Capability inventory: The skill has high-privilege capabilities including arbitrary binary execution via shell_exec, as well as file_write and file_read permissions.
Sanitization: There is no logic provided to sanitize or filter the content of transcripts or video metadata before the agent uses them to determine clip segments or labels.

clip-hand-skill