Voice Clone Skill
Use this skill to clone a speaker's voice and generate text-to-speech audio.
Two-Step Process
Step 1: Clone Voice (one-time)
python skills/voice-clone/clone.py <audio_sample.wav> [--transcript "text"]
Creates a speaker embedding file that can be reused.
Step 2: Generate Speech
python skills/voice-clone/speak.py <embedding.safetensors> "Text to speak"
Generates audio using the cloned voice.
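The two steps can also be driven from one Python script. The following is a minimal sketch, assuming the scripts accept exactly the arguments shown above. The helper names `build_clone_command`, `build_speak_command`, and `run_step` are hypothetical and not part of the skill.

```python
import subprocess
import sys
from typing import List, Optional

# Script locations as given in the two steps above.
CLONE_SCRIPT = "skills/voice-clone/clone.py"
SPEAK_SCRIPT = "skills/voice-clone/speak.py"


def build_clone_command(sample: str, transcript: Optional[str] = None) -> List[str]:
    """Build the Step 1 argument list (hypothetical helper)."""
    cmd = [sys.executable, CLONE_SCRIPT, sample]
    if transcript:
        # The transcript is optional, so the flag is added only when provided.
        cmd += ["--transcript", transcript]
    return cmd


def build_speak_command(embedding: str, text: str) -> List[str]:
    """Build the Step 2 argument list (hypothetical helper)."""
    return [sys.executable, SPEAK_SCRIPT, embedding, text]


def run_step(cmd: List[str]) -> None:
    """Run one step; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the transcript or the text to speak contains quotes or spaces. A typical flow would call `run_step(build_clone_command(...))` once, then `run_step(build_speak_command(...))` for each new sentence, reusing the embedding file that Step 1 produced.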
Requirements
- FAL_KEY in .env (fal.ai API key)
- Voice sample: 10-30 seconds of clear speech (WAV/MP3)
- Optional: Transcript of the sample for better quality
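These requirements can be checked before calling the API. The sketch below is illustrative only: it assumes a plain KEY=VALUE `.env` file, and it measures duration with the standard-library `wave` module, so it covers WAV samples but not MP3. The function names are hypothetical, and the skill itself may load the key differently.

```python
import os
import wave
from pathlib import Path
from typing import List, Optional


def read_env_value(env_path: str, key: str) -> Optional[str]:
    """Return the value of `key` from a simple KEY=VALUE .env file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip().strip("\"'") or None
    return None


def wav_duration_seconds(sample_path: str) -> float:
    """Duration of a WAV file, computed as frame count / frame rate."""
    with wave.open(sample_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_requirements(sample_path: str, env_path: str = ".env") -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # Accept the key from the process environment or from the .env file.
    if not (os.environ.get("FAL_KEY") or read_env_value(env_path, "FAL_KEY")):
        problems.append("FAL_KEY is not set")
    duration = wav_duration_seconds(sample_path)
    if not 10.0 <= duration <= 30.0:
        problems.append(f"sample is {duration:.1f}s; expected 10-30s")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once. "Clear speech" cannot be verified this way and still needs a listening check.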
Output
- assets/outputs/voice_embeddings/<name>_embedding.safetensors: Reusable voice model
- assets/outputs/audio/<name>_speech.wav: Generated audio
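The output locations above follow a fixed naming pattern, which can be reproduced when locating files from other scripts. This is a small sketch: the source does not say how `<name>` is chosen, so `name_from_sample` assumes it is the sample's file name without the extension, and all three helpers are hypothetical.

```python
from pathlib import Path

# Output directories as listed in the Output section.
EMBEDDING_DIR = Path("assets/outputs/voice_embeddings")
AUDIO_DIR = Path("assets/outputs/audio")


def embedding_path(name: str) -> Path:
    """Path of the reusable embedding for a given <name>."""
    return EMBEDDING_DIR / f"{name}_embedding.safetensors"


def speech_path(name: str) -> Path:
    """Path of the generated audio for a given <name>."""
    return AUDIO_DIR / f"{name}_speech.wav"


def name_from_sample(sample_path: str) -> str:
    """ASSUMPTION: <name> is the sample's file name without its extension."""
    return Path(sample_path).stem
```

If the skill derives `<name>` some other way, only `name_from_sample` needs to change.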
Notes
- qwen3-tts works best with Chinese speech samples
- Cross-lingual cloning (Chinese voice → English speech) may vary in quality
- For best quality, pass a reference transcript of the sample with --transcript in Step 1