byted-voice-to-text

Pass

Audited by Gen Agent Trust Hub on Mar 27, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection. In SKILL.md, it instructs the AI agent to treat the output of transcription scripts as direct user messages ('将脚本输出的文字当作用户发送的文本消息'). This allows spoken instructions within an audio file (e.g., 'ignore previous instructions') to potentially influence agent behavior. The instructions lack boundary markers or explicit sanitization requirements for this untrusted content.
  • [Capability Inventory]:
  • Ingestion points: Audio files from local paths, public URLs, or Feishu (processed in asr_flash.py and asr_standard.py).
  • Boundary markers: Absent. Transcription text is interpolated directly into the agent context.
  • Capability inventory: Shell command execution (subprocess in ensure_ffmpeg.py, inspect_audio.py), network requests (requests in asr_*.py), and file write access (.env storage in api_key.py).
  • Sanitization: Absent.
  • [COMMAND_EXECUTION]: The script ensure_ffmpeg.py automates the installation of the ffmpeg and ffprobe dependencies. It constructs and executes system package manager commands (such as apt-get, dnf, yum, zypper, brew, winget, and choco). On Linux platforms, it programmatically attempts to use sudo to obtain elevated privileges for these installations.
  • [COMMAND_EXECUTION]: The script inspect_audio.py uses the subprocess module to execute ffprobe (and afinfo on macOS) on file paths or URLs provided by the user to extract media metadata. While arguments are passed as a list, the utility is executed on arbitrary external input.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 27, 2026, 07:55 AM