Voice Clone Skill

A self-initializing, zero-configuration voice cloning skill. It manages a background TTS daemon that keeps the heavy model weights loaded in memory, so requests do not reload the model each time and inference stays fast. It supports multiple TTS engines and text of arbitrary length.

Quick reference

| Item | Value |
|---|---|
| Entry script | `bash scripts/run_tts.sh --text "..." --ref_audio "..." [--speed 1.0] [--output_dir "..."]` |
| Output | Single line: absolute path to generated `.ogg` file |
| Attachment format | `MEDIA:<output_path>` |
| Default engine | F5-TTS (env `TTS_BACKEND=f5`) |
| Host/port config | `.env` (`TTS_SERVER_HOST`, `TTS_SERVER_PORT`) |
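The quick-reference contract (the script prints one line, which is then attached as `MEDIA:<output_path>`) can be wired together as below. This is a minimal sketch that assumes `run_tts.sh` prints nothing else on stdout; `emit_media_line` is a hypothetical helper, not part of the skill, and the reference-audio path is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the skill): reads the single line that
# run_tts.sh prints and turns it into the MEDIA:<output_path> attachment line.
emit_media_line() {
  local path=""
  # Read one line; tolerate a missing trailing newline.
  IFS= read -r path || true
  # The documented output is an absolute path to an .ogg file; reject anything else.
  if [[ "$path" != /* || "$path" != *.ogg ]]; then
    echo "unexpected run_tts.sh output: '$path'" >&2
    return 1
  fi
  printf 'MEDIA:%s\n' "$path"
}

# Typical use (commented out so the sketch runs without the skill installed;
# the --ref_audio path is a placeholder):
# bash scripts/run_tts.sh --text "Hello there" --ref_audio "/path/to/voice_memo.ogg" \
#   | emit_media_line
```

Rejecting anything that is not an absolute `.ogg` path keeps a stray log line or error message from being sent to the user as an attachment.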

When to use this skill

- The user sends a voice memo or audio file and you need to reply with audio.
- The user explicitly asks for a spoken reply, e.g. "read this aloud", "speak to me", "use my voice", or "voice mode".
- The conversation context implies that a spoken reply is expected.
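The explicit-request trigger can be approximated with a case-insensitive phrase check. This is only a sketch: `wants_voice_reply` is a hypothetical helper, and it covers just the listed phrases, not incoming audio files or requests implied by context.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (returns 0) if the message contains one of
# the explicit trigger phrases listed above, ignoring case.
wants_voice_reply() {
  local msg
  # Lowercase the message so "Read This Aloud" also matches.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"read this aloud"*|*"speak to me"*|*"use my voice"*|*"voice mode"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A phrase check like this only catches the unambiguous cases; the other two triggers still depend on the message type and the conversation.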
