
voice-clone

Voice Clone Skill

Use this skill to clone a speaker's voice and generate text-to-speech audio.

Two-Step Process

Step 1: Clone Voice (one-time)

python skills/voice-clone/clone.py <audio_sample.wav> [--transcript "text"]

Creates a speaker embedding file (.safetensors) that can be reused for every later speech request, so this step only needs to run once per voice.
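
If you drive the skill from another Python script, the cloning step can be wrapped as below. This is a minimal sketch that only shells out to clone.py using the command shape shown above. The helper names `build_clone_cmd` and `clone_voice` are hypothetical, and the sketch assumes you run it from the directory that contains `skills/`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Script path as given in the usage line above; assumes you run from the repo root.
CLONE_SCRIPT = "skills/voice-clone/clone.py"


def build_clone_cmd(sample: str, transcript: Optional[str] = None) -> List[str]:
    """Build the argument list for the one-time cloning step (hypothetical helper)."""
    cmd = [sys.executable, CLONE_SCRIPT, sample]
    if transcript:
        # Optional transcript of the sample, which the skill says improves quality.
        cmd += ["--transcript", transcript]
    return cmd


def clone_voice(sample: str, transcript: Optional[str] = None) -> None:
    """Run clone.py once; raises CalledProcessError if the script exits non-zero."""
    if not Path(sample).is_file():
        raise FileNotFoundError(sample)
    subprocess.run(build_clone_cmd(sample, transcript), check=True)
```

For example, `clone_voice("samples/me.wav", transcript="Hello, this is my voice.")` runs the cloning step once; the sample path here is only illustrative. Using `sys.executable` runs clone.py with the same interpreter as the calling script.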

Step 2: Generate Speech

python skills/voice-clone/speak.py <embedding.safetensors> "Text to speak"

Synthesizes the given text as a WAV file in the cloned voice.
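
A similar hypothetical wrapper for the speech step is sketched below. Passing arguments as a list (no shell) means text containing quotes or apostrophes reaches speak.py unchanged. The suffix and empty-text checks are assumptions based on the usage line above, not behavior that speak.py is documented to enforce.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script path as given in the usage line above; assumes you run from the repo root.
SPEAK_SCRIPT = "skills/voice-clone/speak.py"


def build_speak_cmd(embedding: str, text: str) -> List[str]:
    """Build the argument list for speak.py (hypothetical helper)."""
    # Assumed sanity checks; speak.py's own validation is not documented.
    if Path(embedding).suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors embedding, got {embedding!r}")
    if not text.strip():
        raise ValueError("text to speak must not be empty")
    # The text is a single list element, so no shell quoting is involved.
    return [sys.executable, SPEAK_SCRIPT, embedding, text]


def speak(embedding: str, text: str) -> None:
    """Run speak.py; raises CalledProcessError if the script exits non-zero."""
    subprocess.run(build_speak_cmd(embedding, text), check=True)
```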

Requirements

  • FAL_KEY set in .env (your fal.ai API key)
  • Voice sample: 10-30 seconds of clear speech (WAV or MP3)
  • Optional: a transcript of the sample, passed with --transcript, for better clone quality
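
Before cloning, a quick pre-flight check can catch a missing key or a sample outside the recommended length. This sketch uses only the standard library. `read_env_key` is a hypothetical, deliberately simple KEY=VALUE parser (not python-dotenv), and `wav_duration` handles WAV only, since measuring MP3 length needs a third-party decoder.

```python
import os
import wave
from pathlib import Path
from typing import List, Optional

# Recommended sample length from the Requirements list above.
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0


def read_env_key(name: str, env_file: str = ".env") -> Optional[str]:
    """Return `name` from the environment, else from a simple KEY=VALUE .env file."""
    if os.environ.get(name):
        return os.environ[name]
    path = Path(env_file)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == name:
            # Strip optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip('"').strip("'") or None
    return None


def wav_duration(path: str) -> float:
    """Length of a WAV file in seconds (frames divided by frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_sample(path: str) -> List[str]:
    """Return human-readable warnings; an empty list means the sample looks usable."""
    warnings = []
    seconds = wav_duration(path)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        warnings.append(f"sample is {seconds:.1f}s; 10-30s of clear speech is recommended")
    return warnings
```

Calling `read_env_key("FAL_KEY")` and `check_sample("me.wav")` before running clone.py surfaces both problems without spending an API call.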

Output

  • assets/outputs/voice_embeddings/<name>_embedding.safetensors - reusable speaker embedding
  • assets/outputs/audio/<name>_speech.wav - generated speech
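
Because both outputs follow a fixed naming pattern, callers can predict where files land. The skill does not document how `<name>` is chosen, so these hypothetical helpers simply assume the caller knows it and that the same name links an embedding to its speech output.

```python
from pathlib import Path

# Output root and filename patterns taken from the Output list above.
OUTPUT_ROOT = Path("assets/outputs")


def embedding_path(name: str) -> Path:
    """Where the reusable speaker embedding for <name> is written."""
    return OUTPUT_ROOT / "voice_embeddings" / f"{name}_embedding.safetensors"


def speech_path(name: str) -> Path:
    """Where generated speech for <name> is written."""
    return OUTPUT_ROOT / "audio" / f"{name}_speech.wav"


def name_from_embedding(path: str) -> str:
    """Recover <name> from an embedding filename (hypothetical helper)."""
    stem = Path(path).stem  # drops ".safetensors"
    suffix = "_embedding"
    return stem[: -len(suffix)] if stem.endswith(suffix) else stem
```

For example, `speech_path(name_from_embedding(str(embedding_path("alice"))))` points at `assets/outputs/audio/alice_speech.wav`, assuming speak.py reuses the embedding's name.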

Notes

  • qwen3-tts works best with Chinese speech samples.
  • Cross-lingual cloning (e.g., a Chinese voice speaking English) may produce uneven quality.
  • Provide a reference transcript of the sample for the best results.

Weekly Installs: 2
First Seen: Mar 8, 2026
Installed on: claude-code (2), cursor (1)