text-to-speech

Pass

Audited by Gen Agent Trust Hub on Mar 6, 2026

Risk Level: SAFE
Categories flagged: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The references/streaming.md file contains a Python code example that uses subprocess.Popen to invoke the ffplay utility. This is a legitimate use case for real-time audio playback. The command is passed as a list of arguments (['ffplay', '-nodisp', '-autoexit', '-']) rather than as a shell string, so no shell parses the arguments, which prevents shell injection.
  • [EXTERNAL_DOWNLOADS]: The skill directs users to download and install official packages from trusted sources, specifically the elevenlabs Python package and the @elevenlabs/elevenlabs-js Node.js package. It also establishes connections to well-known ElevenLabs service domains (api.elevenlabs.io) for API requests and WebSocket streaming.
  • [PROMPT_INJECTION]: An indirect prompt injection surface (Category 8) exists because the skill processes arbitrary text input for speech synthesis.
    • Ingestion points: The text parameter in SKILL.md and references/streaming.md examples.
    • Boundary markers: No delimiters or 'ignore' instructions are used to separate user-provided text from agent instructions.
    • Capability inventory: Network requests to ElevenLabs API, file system writes (MP3 output), and subprocess execution (ffplay).
    • Sanitization: No input sanitization or validation is applied to the text content before it is sent to the synthesis API.
    Despite these factors, the risk is considered low as this behavior is inherent to the functional purpose of a text-to-speech tool.
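For illustration, the two patterns discussed above can be sketched in Python. This is not the code from references/streaming.md; it is a minimal sketch under stated assumptions. Only the ffplay argument list is taken from the audit. The names FFPLAY_COMMAND, MAX_TEXT_CHARS, prepare_text, and play_chunks are hypothetical, and the 5000-character cap is an arbitrary placeholder rather than a documented ElevenLabs limit.

```python
import subprocess
from typing import Iterable, List, Optional

# Argument list quoted in the audit. Passing a list means no shell parses it.
FFPLAY_COMMAND: List[str] = ["ffplay", "-nodisp", "-autoexit", "-"]

# Hypothetical cap chosen for this sketch; not an ElevenLabs limit.
MAX_TEXT_CHARS = 5000


def prepare_text(text: str, max_chars: int = MAX_TEXT_CHARS) -> str:
    """Hypothetical pre-flight check before text is sent for synthesis.

    Drops non-printable characters (keeping newlines and tabs), trims
    surrounding whitespace, and enforces a length cap.
    """
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    cleaned = cleaned.strip()
    if not cleaned:
        raise ValueError("text is empty after cleaning")
    if len(cleaned) > max_chars:
        raise ValueError(f"text exceeds {max_chars} characters")
    return cleaned


def play_chunks(chunks: Iterable[bytes],
                command: Optional[List[str]] = None) -> int:
    """Pipe audio bytes to a player process started from an argument list.

    Returns the player's exit code. The command defaults to the ffplay
    invocation above; it is a parameter so another player can be swapped in.
    """
    argv = list(command) if command is not None else list(FFPLAY_COMMAND)
    # shell=False is already the default; it is spelled out to make the point.
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, shell=False)
    assert proc.stdin is not None
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
        proc.stdin.close()  # signals end of input to the player
    except BrokenPipeError:
        pass  # the player exited before all audio was written
    return proc.wait()
```

A check such as prepare_text does not remove the indirect prompt injection surface, since any printable text is still passed through. It only bounds what reaches the synthesis API. play_chunks would typically be fed the byte chunks of a streaming response.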
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 6, 2026, 09:38 AM