transcribe

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (DATA_EXFILTRATION, PROMPT_INJECTION)
Full Analysis
  • [Indirect Prompt Injection] (HIGH): The script scripts/transcribe_diarize.py accepts arbitrary file paths via the audio and --known-speaker arguments without path validation or sandboxing.
    • Ingestion points: The positional audio argument and the --known-speaker flag in scripts/transcribe_diarize.py pass user-supplied strings directly to file system operations.
    • Boundary markers: Absent. No delimiters or instructions keep the agent from being coerced into reading non-audio files.
    • Capability inventory: The script reads files (Path.read_bytes, open() in "rb" mode), sends data over the network (an OpenAI API call), and writes files (Path.write_text).
    • Sanitization: Absent. The script relies only on mimetypes.guess_type and existence checks. These do not prevent reading sensitive text files: mimetypes.guess_type infers a type from the file name alone, so a text file given an audio-like extension passes, and nothing stops its raw bytes from being read and sent if the API accepts them.
  • [Data Exfiltration] (HIGH): Sensitive local file content can be exfiltrated to the OpenAI API service if an attacker influences the agent's input parameters.
    • Evidence: The function _encode_data_url in scripts/transcribe_diarize.py reads the entire content of a file provided via the --known-speaker argument and base64-encodes it into the API request payload.
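The following Python sketch makes the Sanitization and Data Exfiltration findings concrete. naive_audio_check and encode_data_url are hypothetical stand-ins written from the audit's description, not code taken from scripts/transcribe_diarize.py. No API call is made. The sketch only shows that an extension-based type guess accepts a disguised text file and that base64 encoding hides nothing.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def naive_audio_check(path: Path) -> bool:
    """Hypothetical reconstruction of the described checks: an existence
    check plus mimetypes.guess_type, which looks only at the file name."""
    mime, _ = mimetypes.guess_type(path.name)
    return path.exists() and mime is not None and mime.startswith("audio/")


def encode_data_url(path: Path) -> str:
    """Hypothetical stand-in for _encode_data_url: read the whole file
    and base64-encode it into a data URL."""
    mime, _ = mimetypes.guess_type(path.name)
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{payload}"


with tempfile.TemporaryDirectory() as tmp:
    # A plain-text "secret" file given an audio-like extension.
    disguised = Path(tmp) / "notes.mp3"
    disguised.write_text("API_KEY=example-not-a-real-key\n")

    passed = naive_audio_check(disguised)
    url = encode_data_url(disguised)

# Base64 is an encoding, not encryption: decoding recovers the text verbatim.
recovered = base64.b64decode(url.split(",", 1)[1]).decode()
print(passed, recovered.strip())
```

Only the ".mp3" suffix decided the outcome. The file's bytes were never inspected before they would have been placed in a request payload.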
Recommendations
  • AI detected serious security threats
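The Recommendations entry does not spell out a fix. One possible hardening step, sketched here in Python under stated assumptions, is to validate each path from the audio argument and --known-speaker before any read. The sketch resolves the path inside an allowed directory, requires a regular file under a size cap, and checks the file's leading bytes. validate_audio_path, looks_like_audio, MAX_BYTES, and the signature list are all hypothetical and not part of scripts/transcribe_diarize.py. The 25 MiB cap is a placeholder.

```python
import tempfile
from pathlib import Path

MAX_BYTES = 25 * 1024 * 1024  # hypothetical size cap; adjust to the real API limit


def looks_like_audio(head: bytes) -> bool:
    """Heuristic check of leading bytes against a few common audio containers."""
    if head.startswith(b"RIFF"):
        return head[8:12] == b"WAVE"  # WAV is a RIFF container tagged "WAVE"
    if head[4:8] == b"ftyp":
        return True  # MP4/M4A family (ISO base media file format)
    # ID3-tagged MP3, FLAC, Ogg, and bare MPEG audio frame-sync bytes.
    return head.startswith((b"ID3", b"fLaC", b"OggS", b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"))


def validate_audio_path(raw: str, allowed_root: Path, max_bytes: int = MAX_BYTES) -> Path:
    """Resolve raw inside allowed_root and reject anything that escapes it,
    is not a regular file, is too large, or lacks an audio header."""
    root = allowed_root.resolve()
    # Joining an absolute path replaces root, and resolve() collapses ".."
    # and follows symlinks, so one containment check covers all three routes.
    path = (root / raw).resolve()
    if not path.is_relative_to(root):
        raise ValueError(f"{raw!r} is outside the allowed directory")
    if not path.is_file():
        raise ValueError(f"{raw!r} is not a regular file")
    if path.stat().st_size > max_bytes:
        raise ValueError(f"{raw!r} exceeds the size limit")
    with path.open("rb") as fh:
        head = fh.read(12)
    if not looks_like_audio(head):
        raise ValueError(f"{raw!r} does not start with a recognized audio header")
    return path


# Demo: one plausible WAV header, one disguised text file, two escape attempts.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "call.wav").write_bytes(b"RIFF\x24\x00\x00\x00WAVEfmt ")
    (root / "notes.mp3").write_text("API_KEY=example-not-a-real-key\n")
    results = {}
    for candidate in ("call.wav", "notes.mp3", "../etc/passwd", "/etc/passwd"):
        try:
            validate_audio_path(candidate, root)
            results[candidate] = "accepted"
        except ValueError as exc:
            results[candidate] = f"rejected: {exc}"

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

The directory allow-list does most of the work. The header check is only a heuristic, because any file can be prefixed with valid audio bytes. Neither check removes the need to keep untrusted instructions from steering the agent's choice of arguments.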
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 16, 2026, 08:05 AM