qwen-tts-voice-cloning

Pass

Audited by Gen Agent Trust Hub on Mar 10, 2026

Risk Level: SAFEEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: Fetches pre-trained machine learning model weights from HuggingFace repositories (e.g., mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16), which is an established and well-known service.
  • [COMMAND_EXECUTION]: Utilizes shell commands for ffmpeg and yt-dlp to perform audio format conversion and training data acquisition as part of the core pipeline functionality.
  • [DATA_EXFILTRATION]: Sends generated and reference audio samples to Google's Gemini API for speaker similarity and naturalness assessment during the evaluation process. This operation is documented and requires a user-provided API key.
  • [PROMPT_INJECTION]: The skill's evaluation component processes untrusted audio files and transcripts, creating an attack surface for indirect prompt injection.
  • Ingestion points: Untrusted audio files and transcriptions enter the agent context via split.py and prepare_data.py.
  • Boundary markers: None identified; external content is interpolated directly into prompts for the Gemini judge.
  • Capability inventory: Includes shell command execution (ffmpeg, yt-dlp) and network communication (Gemini API, HuggingFace downloads).
  • Sanitization: No escaping, validation, or filtering of audio content or transcript text is performed before evaluation.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 10, 2026, 12:31 PM