skills/smithery.ai/zedit42-voice-cloning

zedit42-voice-cloning

SKILL.md

Voice Cloning

Generate speech and clone voices locally without API costs.

When to Use

  • Need text-to-speech without paying for ElevenLabs/OpenAI
  • Want to clone a voice from a sample
  • Creating podcasts, voiceovers, or audio content
  • Privacy-sensitive applications (no data leaves your machine)

Quick Start

Option 1: Coqui TTS (Best Quality)

# Install
pip install TTS

# List available models
tts --list_models

# Generate speech
tts --text "Hello, this is a test." --out_path output.wav

# Use specific model (recommended: XTTS v2)
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "Hello world" \
    --out_path output.wav

Option 2: Bark (Most Natural)

# Install
pip install git+https://github.com/suno-ai/bark.git

# Use via Python
python skills/voice-cloning/scripts/bark-generate.py "Your text here" output.wav

Option 3: Piper (Fastest)

# Install
pip install piper-tts

# Generate (very fast, good for bulk)
echo "Hello world" | piper --model en_US-lessac-medium --output_file output.wav

Voice Cloning (XTTS v2)

Clone any voice from a 6+ second audio sample:

python skills/voice-cloning/scripts/clone-voice.py \
    --sample voice_sample.wav \
    --text "Text to speak in cloned voice" \
    --output cloned_output.wav

Available Scripts

scripts/coqui-generate.py

Basic TTS generation with Coqui.

scripts/bark-generate.py

Natural-sounding speech with Bark (slower but more expressive).

scripts/clone-voice.py

Clone a voice from an audio sample using XTTS v2.

scripts/batch-tts.py

Generate multiple audio files from a text file (one line = one file).

Model Comparison

Model Quality Speed Voice Clone Languages
XTTS v2 ★★★★★ Slow ✅ Yes 16
Bark ★★★★★ Very Slow ❌ No EN mainly
Piper ★★★☆☆ Very Fast ❌ No 30+

Tips

  1. For quality: Use XTTS v2 or Bark
  2. For speed: Use Piper
  3. For cloning: XTTS v2 is your only free option
  4. GPU recommended: Bark and XTTS are slow on CPU

Limitations

  • First run downloads models (1-4 GB)
  • GPU recommended for reasonable speed
  • Voice cloning needs clean 6+ second sample
  • Bark can hallucinate on long texts
Weekly Installs
1
First Seen
13 days ago
Installed on
amp1
cline1
opencode1
cursor1
kimi-cli1
codex1