Voice Cloning

Generate speech and clone voices locally without API costs.

When to Use

Need text-to-speech without paying for ElevenLabs/OpenAI
Want to clone a voice from a sample
Creating podcasts, voiceovers, or audio content
Privacy-sensitive applications (no data leaves your machine)

Quick Start

Option 1: Coqui TTS (Best Quality)

# Install
pip install TTS

# List available models
tts --list_models

# Generate speech
tts --text "Hello, this is a test." --out_path output.wav

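# Use a specific model (recommended: XTTS v2).
# XTTS v2 is multi-speaker and multilingual, so it also needs a
# reference voice and a language code.
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "Hello world" \
    --speaker_wav voice_sample.wav \
    --language_idx en \
    --out_path output.wav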

Option 2: Bark (Most Natural)

# Install
pip install git+https://github.com/suno-ai/bark.git

# Use via Python
python skills/voice-cloning/scripts/bark-generate.py "Your text here" output.wav

Option 3: Piper (Fastest)

# Install
pip install piper-tts

# Generate (very fast, good for bulk)
echo "Hello world" | piper --model en_US-lessac-medium --output_file output.wav
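
For bulk jobs it can be handy to drive the same command from Python. The sketch below only wraps the CLI call shown above with the standard library; the helper names piper_command and piper_say are hypothetical, and it assumes the piper executable is on your PATH.

```python
import subprocess


def piper_command(model, output_path):
    """Build the argument list for the piper CLI call shown above."""
    return ["piper", "--model", model, "--output_file", output_path]


def piper_say(text, output_path, model="en_US-lessac-medium"):
    """Send `text` to piper on stdin, much like `echo ... | piper`."""
    subprocess.run(
        piper_command(model, output_path),
        input=(text + "\n").encode("utf-8"),
        check=True,  # raise CalledProcessError if piper exits with an error
    )
    return output_path
```

Calling piper_say("Hello world", "output.wav") should behave like the shell one-liner, provided the model name resolves on your install.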

Voice Cloning (XTTS v2)

Clone a voice from a clean audio sample that is at least 6 seconds long:

python skills/voice-cloning/scripts/clone-voice.py \
    --sample voice_sample.wav \
    --text "Text to speak in cloned voice" \
    --output cloned_output.wav
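
The contents of clone-voice.py are not shown here, so the following is only a sketch of what such a script might do. It assumes Coqui's Python interface (TTS.api.TTS and its tts_to_file method); the helper names wav_duration_seconds and clone_voice are hypothetical. It checks the sample length with the standard library before loading the model, so a sample that is too short fails fast.

```python
import wave

MIN_SAMPLE_SECONDS = 6.0  # minimum sample length stated in this guide


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clone_voice(sample_path, text, output_path, language="en"):
    """Hypothetical helper: speak `text` in the voice from `sample_path`."""
    duration = wav_duration_seconds(sample_path)
    if duration < MIN_SAMPLE_SECONDS:
        raise ValueError(
            f"sample is {duration:.1f}s long; "
            f"need at least {MIN_SAMPLE_SECONDS:.0f}s"
        )
    # Imported lazily so the length check works without Coqui installed.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=sample_path,
        language=language,
        file_path=output_path,
    )
    return output_path
```

The length check only reads the WAV header; it does not judge how clean the recording is, so background noise still has to be checked by ear.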

Available Scripts

`scripts/coqui-generate.py`

Basic TTS generation with Coqui.

`scripts/bark-generate.py`

Natural-sounding speech with Bark (slower but more expressive).

`scripts/clone-voice.py`

Clone a voice from an audio sample using XTTS v2.

`scripts/batch-tts.py`

Generate multiple audio files from a text file (one line = one file).
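
The "one line = one file" rule can be sketched as follows. This is not the actual batch-tts.py; the function names, the numbered file naming, and the choice to skip blank lines are all assumptions made for illustration. The synthesize argument is any callable that takes (text, output_path), so the sketch works with whichever engine you use.

```python
from pathlib import Path


def plan_batch(text, out_dir="out", prefix="line"):
    """Map each non-empty input line to a numbered WAV path."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    width = max(3, len(str(len(lines))))  # zero-pad so files sort correctly
    return [
        (Path(out_dir) / f"{prefix}_{i:0{width}d}.wav", ln)
        for i, ln in enumerate(lines, start=1)
    ]


def run_batch(text, synthesize, out_dir="out"):
    """Call synthesize(line, path) once per planned output file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    jobs = plan_batch(text, out_dir)
    for path, line in jobs:
        synthesize(line, str(path))
    return [path for path, _ in jobs]
```

Separating the plan from the run makes it easy to preview which files a text file would produce before spending time on synthesis.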

Model Comparison

Model	Quality	Speed	Voice Clone	Languages
XTTS v2	★★★★★	Slow	✅ Yes	17
Bark	★★★★★	Very Slow	❌ No	13 (best in EN)
Piper	★★★☆☆	Very Fast	❌ No	30+

Tips

For quality: Use XTTS v2 or Bark
For speed: Use Piper
For cloning: XTTS v2 is the only model in this guide that supports it
GPU recommended: Bark and XTTS are slow on CPU

Limitations

The first run downloads model weights (1-4 GB)
CPU-only runs of Bark and XTTS v2 are slow; Piper is the practical choice without a GPU
Voice cloning needs a clean sample of at least 6 seconds
Bark can hallucinate on long texts
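
One common workaround for the long-text problem is to synthesize short chunks and join the audio afterwards. The sketch below shows only the text-splitting step; split_into_chunks is a hypothetical helper, and the 200-character limit is an illustrative value, not a documented Bark threshold.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the speech generator separately and the resulting audio concatenated in order.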
