Voice Clone Skill
A self-initializing, zero-configuration voice cloning skill. It manages a background TTS daemon that keeps the heavy model weights in memory for fast inference. It supports multiple engines and input text of any length.
Quick reference
| Item | Value |
|---|---|
| Entry script | `bash scripts/run_tts.sh --text "..." --ref_audio "..." [--speed 1.0] [--output_dir "..."]` |
| Output | Single line: absolute path to the generated `.ogg` file |
| Attachment format | `MEDIA:<output_path>` |
| Default engine | F5-TTS (env `TTS_BACKEND=f5`) |
| Host/Port config | `.env` (`TTS_SERVER_HOST`, `TTS_SERVER_PORT`) |
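The entry script and attachment format in the table can be combined roughly as follows. This is a minimal sketch, not part of the skill itself: the helper names `media_line` and `speak` are hypothetical, and it assumes the script prints nothing but the absolute `.ogg` path on stdout, as the Output row states.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the entry script and formats its output as an attachment line.
# Assumption: scripts/run_tts.sh prints a single line, the absolute .ogg path.

# Hypothetical helper: build the attachment line from an output path.
media_line() {
  printf 'MEDIA:%s\n' "$1"
}

# Hypothetical helper: synthesize speech, then print the attachment line.
# $1 = text to speak, $2 = path to the reference audio.
speak() {
  local text="$1" ref_audio="$2" out
  out="$(bash scripts/run_tts.sh --text "$text" --ref_audio "$ref_audio" --speed 1.0)" || return 1
  # Accept only something that looks like an absolute .ogg path.
  case "$out" in
    /*.ogg) media_line "$out" ;;
    *) echo "unexpected output: $out" >&2; return 1 ;;
  esac
}
```

After sourcing these functions, `speak "Hello there" "<ref_audio_path>"` would print one `MEDIA:` line on success. Checking the output shape before building the attachment keeps a failed run from producing a broken `MEDIA:` reference.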
When to use this skill
- The user sends a voice memo or audio file and you need to reply with audio.
- The user says "read this aloud", "speak to me", "use my voice", "voice mode".
- The conversation context implies a spoken reply is expected.