speech-to-text

Fail

Audited by Gen Agent Trust Hub on May 1, 2026

Risk Level: HIGHCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The script scripts/transcribe.js uses child_process.execSync to run ffmpeg for audio extraction from video files. The file path argument is interpolated into the shell command string using ${filePath} inside double quotes. This construct is vulnerable to command injection because shells typically evaluate command substitutions (like $(command)) and backticks even within double quotes. An attacker providing a malicious filename could execute arbitrary shell commands with the same privileges as the agent. \n- [DATA_EXFILTRATION]: The skill's core functionality involves reading local files (node:fs/promises.readFile) and sending their contents to an external API (Soniox). While expected for a speech-to-text tool, this pattern can be abused to exfiltrate sensitive files if the agent is tricked into transcribing non-media files such as .ssh/id_rsa or .env. \n- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection. \n
  • Ingestion points: Data from external audio/video files is processed and returned as text in scripts/transcribe.js. \n
  • Boundary markers: Absent. The transcription result is displayed or saved without delimiters or warnings to ignore embedded instructions. \n
  • Capability inventory: Subprocess execution (execSync in scripts/transcribe.js), file system access (readFile, writeFile, unlink), and network communication with the Soniox API. \n
  • Sanitization: Absent. The text received from the transcription API is used directly without filtering or escaping.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
May 1, 2026, 03:03 PM