The Agent Skills Directory

[EXTERNAL_DOWNLOADS]: Fetches pre-trained machine learning model weights from HuggingFace repositories (e.g., mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16), which is an established and well-known service.
[COMMAND_EXECUTION]: Utilizes shell commands for ffmpeg and yt-dlp to perform audio format conversion and training data acquisition as part of the core pipeline functionality.
[DATA_EXFILTRATION]: Sends generated and reference audio samples to Google's Gemini API for speaker similarity and naturalness assessment during the evaluation process. This operation is documented and requires a user-provided API key.
[PROMPT_INJECTION]: The skill's evaluation component processes untrusted audio files and transcripts, creating an attack surface for indirect prompt injection.
Ingestion points: Untrusted audio files and transcriptions enter the agent context via split.py and prepare_data.py.
Boundary markers: None identified; external content is interpolated directly into prompts for the Gemini judge.
Capability inventory: Includes shell command execution (ffmpeg, yt-dlp) and network communication (Gemini API, HuggingFace downloads).
Sanitization: No escaping, validation, or filtering of audio content or transcript text is performed before evaluation.

qwen-tts-voice-cloning