qwen-tts-voice-cloning
Pass
Audited by Gen Agent Trust Hub on Mar 10, 2026
Risk Level: SAFEEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: Fetches pre-trained machine learning model weights from HuggingFace repositories (e.g.,
mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16), which is an established and well-known service. - [COMMAND_EXECUTION]: Utilizes shell commands for
ffmpegandyt-dlpto perform audio format conversion and training data acquisition as part of the core pipeline functionality. - [DATA_EXFILTRATION]: Sends generated and reference audio samples to Google's Gemini API for speaker similarity and naturalness assessment during the evaluation process. This operation is documented and requires a user-provided API key.
- [PROMPT_INJECTION]: The skill's evaluation component processes untrusted audio files and transcripts, creating an attack surface for indirect prompt injection.
- Ingestion points: Untrusted audio files and transcriptions enter the agent context via
split.pyandprepare_data.py. - Boundary markers: None identified; external content is interpolated directly into prompts for the Gemini judge.
- Capability inventory: Includes shell command execution (
ffmpeg,yt-dlp) and network communication (Gemini API, HuggingFace downloads). - Sanitization: No escaping, validation, or filtering of audio content or transcript text is performed before evaluation.
Audit Metadata