audio-to-text

Pass

Audited by Gen Agent Trust Hub on Feb 19, 2026

Risk Level: SAFEEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [EXTERNAL_DOWNLOADS] (LOW): The skill downloads Whisper model weights from Hugging Face (mlx-community) at runtime. Although Hugging Face is a trusted organization, downloading remote binary assets is a necessary but noteworthy behavior.\n- [COMMAND_EXECUTION] (SAFE): System calls to ffmpeg and ffprobe are performed using secure subprocess.run implementations with list-based arguments, which prevents shell injection vulnerabilities.\n- [PROMPT_INJECTION] (LOW): The skill is susceptible to Indirect Prompt Injection (Category 8) via processed audio content. Malicious spoken commands in an audio file could be transcribed and subsequently interpreted by the agent as instructions.\n
  • Ingestion points: scripts/transcribe.py and scripts/benchmark.py (input audio files)\n
  • Boundary markers: Absent. Transcripts are generated without delimiters or warnings to the LLM to ignore embedded content.\n
  • Capability inventory: File system writes for transcripts and progress tracking, and subprocess execution for media processing.\n
  • Sanitization: None. The transcription process literally converts all detected speech to text.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 19, 2026, 02:46 AM