clip-hand-skill
Pass
Audited by Gen Agent Trust Hub on Mar 12, 2026
Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill relies heavily on the
shell_exectool to perform its core functions, executing binaries such asffmpeg,ffprobe,yt-dlp, andwhisper. This operational model grants the agent broad access to the underlying system's command line. - [EXTERNAL_DOWNLOADS]: The skill is designed to fetch video content and metadata from arbitrary remote URLs via the
yt-dlputility, introducing risks associated with processing untrusted media files. - [DATA_EXFILTRATION]: The skill processes and transmits data to several external APIs (Groq, OpenAI, Deepgram, ElevenLabs, Telegram, and WhatsApp) using
curl. This involves the use of sensitive API keys and the uploading of audio/video content to third-party servers. - [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it analyzes untrusted text data (transcripts and metadata) retrieved from the internet to make processing decisions.
- Ingestion points: Untrusted data enters the agent context through
yt-dlp --dump-json(metadata) and various transcription outputs (Whisper/YouTube auto-subs) as defined in theHAND.tomlsystem prompt. - Boundary markers: The system prompt does not include clear delimiters or instructions to treat external transcript content as non-authoritative data, nor does it warn the agent to ignore instructions embedded within the text.
- Capability inventory: The skill has high-privilege capabilities including arbitrary binary execution via
shell_exec, as well asfile_writeandfile_readpermissions. - Sanitization: There is no logic provided to sanitize or filter the content of transcripts or video metadata before the agent uses them to determine clip segments or labels.
Audit Metadata