byted-voice-to-text
Pass
Audited by Gen Agent Trust Hub on Mar 27, 2026
Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection. In SKILL.md, it instructs the AI agent to treat the output of transcription scripts as direct user messages ('将脚本输出的文字当作用户发送的文本消息'). This allows spoken instructions within an audio file (e.g., 'ignore previous instructions') to potentially influence agent behavior. The instructions lack boundary markers or explicit sanitization requirements for this untrusted content.
- [Capability Inventory]:
- Ingestion points: Audio files from local paths, public URLs, or Feishu (processed in asr_flash.py and asr_standard.py).
- Boundary markers: Absent. Transcription text is interpolated directly into the agent context.
- Capability inventory: Shell command execution (subprocess in ensure_ffmpeg.py, inspect_audio.py), network requests (requests in asr_*.py), and file write access (.env storage in api_key.py).
- Sanitization: Absent.
- [COMMAND_EXECUTION]: The script
ensure_ffmpeg.pyautomates the installation of theffmpegandffprobedependencies. It constructs and executes system package manager commands (such asapt-get,dnf,yum,zypper,brew,winget, andchoco). On Linux platforms, it programmatically attempts to usesudoto obtain elevated privileges for these installations. - [COMMAND_EXECUTION]: The script
inspect_audio.pyuses thesubprocessmodule to executeffprobe(andafinfoon macOS) on file paths or URLs provided by the user to extract media metadata. While arguments are passed as a list, the utility is executed on arbitrary external input.
Audit Metadata