speech-to-text
Fail
Audited by Gen Agent Trust Hub on May 1, 2026
Risk Level: HIGHCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The script
scripts/transcribe.jsuseschild_process.execSyncto runffmpegfor audio extraction from video files. The file path argument is interpolated into the shell command string using${filePath}inside double quotes. This construct is vulnerable to command injection because shells typically evaluate command substitutions (like$(command)) and backticks even within double quotes. An attacker providing a malicious filename could execute arbitrary shell commands with the same privileges as the agent. \n- [DATA_EXFILTRATION]: The skill's core functionality involves reading local files (node:fs/promises.readFile) and sending their contents to an external API (Soniox). While expected for a speech-to-text tool, this pattern can be abused to exfiltrate sensitive files if the agent is tricked into transcribing non-media files such as.ssh/id_rsaor.env. \n- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection. \n - Ingestion points: Data from external audio/video files is processed and returned as text in
scripts/transcribe.js. \n - Boundary markers: Absent. The transcription result is displayed or saved without delimiters or warnings to ignore embedded instructions. \n
- Capability inventory: Subprocess execution (
execSyncinscripts/transcribe.js), file system access (readFile,writeFile,unlink), and network communication with the Soniox API. \n - Sanitization: Absent. The text received from the transcription API is used directly without filtering or escaping.
Recommendations
- AI detected serious security threats
Audit Metadata