ElevenLabs — Text-to-Speech, Voice Cloning & Sound Effects

Audio generation powered by ElevenLabs: convert text to natural-sounding speech, clone voices, and generate sound effects.


Setup

Required env var: ELEVENLABS_API_KEY

The CLI script at scripts/tts.py uses only Python stdlib (urllib, json, argparse) — no pip dependencies needed.


Quick Reference

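All commands go through the CLI script. The full path is shown here; the examples below shorten it to scripts/tts.py, so run them from the skill directory or adjust the path: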

python skills/KORTIX-tts/scripts/tts.py <command> [args]

Speak — Convert text to speech

# Basic — default voice (George), multilingual v2 model
python scripts/tts.py speak "Hello, this is Kortix speaking."

# Named voice
python scripts/tts.py speak "Welcome to the presentation." --voice Rachel

# Custom output file
python scripts/tts.py speak "Chapter one." --voice George -o chapter1.mp3

# From a file (prefix with @)
python scripts/tts.py speak @article.txt -o narration.mp3

# From stdin
echo "Dynamic text" | python scripts/tts.py speak -

# With voice tuning
python scripts/tts.py speak "Dramatic reading." --voice Rachel --stability 0.3 --similarity 0.9 --style 0.7

# High quality output
python scripts/tts.py speak "Studio quality." --format mp3_44100_192

# Different model (faster, lower latency)
python scripts/tts.py speak "Quick response." --model eleven_turbo_v2_5

# Speed control
python scripts/tts.py speak "Slowly now." --speed 0.7
python scripts/tts.py speak "Fast paced!" --speed 1.5
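
The speak command accepts its text in three forms: a literal string, a file reference prefixed with @, or - for stdin. The following is a minimal sketch of how such an argument could be resolved. It is not taken from scripts/tts.py; the function name resolve_text and the whitespace stripping are assumptions made for illustration.

```python
import io
import sys
from pathlib import Path


def resolve_text(arg: str, stdin=None) -> str:
    """Return the text to speak for a literal, "@path", or "-" argument.

    Hypothetical helper: the real script may resolve input differently.
    """
    if arg == "-":
        # "-" means: read everything from standard input.
        stream = stdin if stdin is not None else sys.stdin
        return stream.read().strip()
    if arg.startswith("@"):
        # "@path" means: read the text from the named file.
        return Path(arg[1:]).read_text(encoding="utf-8").strip()
    # Anything else is treated as the literal text.
    return arg


if __name__ == "__main__":
    print(resolve_text("Hello, this is Kortix speaking."))
    print(resolve_text("-", stdin=io.StringIO("Dynamic text\n")))
```

Passing the stream as a parameter keeps the helper easy to test without touching the real stdin.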

Voices — List and search

# List all available voices
python scripts/tts.py voices

# Search by name, gender, accent, or use case
python scripts/tts.py voices --search "female"
python scripts/tts.py voices --search "british"
python scripts/tts.py voices --search "narration"
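
To illustrate what a search across name, gender, accent, and use case might look like, here is a small sketch that filters a list of voice records by case-insensitive substring. The record layout (name plus a labels mapping) and the sample entries are hypothetical; the script's actual matching rules are not documented here.

```python
def search_voices(voices: list, query: str) -> list:
    """Return voices whose name or any label value contains the query."""
    needle = query.lower()
    matches = []
    for voice in voices:
        labels = voice.get("labels") or {}
        # Search the name and every label value (gender, accent, use case...).
        fields = [voice.get("name", "")] + list(labels.values())
        if any(needle in str(field).lower() for field in fields):
            matches.append(voice)
    return matches


# Hypothetical sample records, not real ElevenLabs voices.
SAMPLE_VOICES = [
    {"voice_id": "id-1", "name": "Sample Narrator",
     "labels": {"gender": "female", "accent": "british", "use_case": "narration"}},
    {"voice_id": "id-2", "name": "Sample Host",
     "labels": {"gender": "male", "accent": "american", "use_case": "conversational"}},
]

if __name__ == "__main__":
    for voice in search_voices(SAMPLE_VOICES, "british"):
        print(voice["voice_id"], voice["name"])
```

Because this sketch uses substring matching, a query of "male" also matches records labeled "female"; a stricter implementation would compare whole label values.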

Models — List available TTS models

python scripts/tts.py models

Clone — Create a custom voice from audio samples

# Clone from audio files (1-25 samples, each 1-10 minutes)
python scripts/tts.py clone "ClientVoice" sample1.mp3 sample2.mp3

# With description
python scripts/tts.py clone "CEO" ceo_speech.mp3 --description "Confident male voice, American accent"

# Use the cloned voice
python scripts/tts.py speak "Hello from my cloned voice." --voice-id <returned_voice_id>

Batch — Convert entire documents

# Convert a text file to a single audio file
python scripts/tts.py batch article.txt -o article_audio/

# Split by paragraphs — one audio file per paragraph
python scripts/tts.py batch book_chapter.txt --split-paragraphs -o chapter_audio/

# With specific voice
python scripts/tts.py batch script.txt --voice Rachel --split-paragraphs
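
The --split-paragraphs option produces one audio file per paragraph. A rough sketch of that splitting step is shown below, assuming paragraphs are separated by blank lines and that output files are numbered in order. Both the separator rule and the 01.mp3 naming scheme are assumptions; the script may use different conventions.

```python
import re
from pathlib import Path


def split_paragraphs(text: str) -> list:
    """Split text on blank lines and collapse whitespace inside each paragraph."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(part.split()) for part in parts if part.strip()]


def plan_outputs(text: str, out_dir: str) -> list:
    """Pair each paragraph with a numbered output path (hypothetical naming)."""
    paragraphs = split_paragraphs(text)
    # Zero-pad so files sort correctly: 01.mp3, 02.mp3, ..., 10.mp3.
    width = max(2, len(str(len(paragraphs))))
    return [
        (Path(out_dir) / f"{index:0{width}d}.mp3", paragraph)
        for index, paragraph in enumerate(paragraphs, start=1)
    ]


if __name__ == "__main__":
    notes = "Slide one.\n\nSlide two\ncontinues.\n\n\n\nSlide three."
    for path, paragraph in plan_outputs(notes, "slide_audio"):
        print(path, "<-", paragraph)
```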

Sound Effects — Generate from text prompts

# Generate a sound effect
python scripts/tts.py sound "ocean waves crashing on a beach"

# With specific output and duration
python scripts/tts.py sound "thunderstorm with heavy rain" -o thunder.mp3 --duration 10.0
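
For orientation, a sound-effect request boils down to a text prompt plus an optional duration. The sketch below builds a JSON body of that shape. The field names text and duration_seconds follow the ElevenLabs sound-generation API as I understand it, and the helper name sound_payload is hypothetical; treat both as assumptions and check the current API reference before relying on them.

```python
import json
from typing import Optional


def sound_payload(prompt: str, duration: Optional[float] = None) -> bytes:
    """Build a JSON request body for a sound-effect prompt (illustrative only)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    body = {"text": prompt}
    if duration is not None:
        if duration <= 0:
            raise ValueError("duration must be positive")
        # Assumed field name; omitted entirely when no duration is requested.
        body["duration_seconds"] = float(duration)
    return json.dumps(body).encode("utf-8")


if __name__ == "__main__":
    print(sound_payload("thunderstorm with heavy rain", 10.0).decode("utf-8"))
```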

Voice Settings Guide

Fine-tune voice output with these parameters:

| Parameter | Range | Default | Effect |
| --- | --- | --- | --- |
| --stability | 0.0 - 1.0 | 0.5 | Higher = more consistent, lower = more expressive/varied |
| --similarity | 0.0 - 1.0 | 0.75 | Higher = closer to original voice, lower = more creative |
| --style | 0.0 - 1.0 | 0.0 | Higher = more expressive style, can reduce stability |
| --speed | 0.5 - 2.0 | 1.0 | Playback speed multiplier |

Recommended presets:

  • Narration/Audiobook: --stability 0.5 --similarity 0.75 (balanced, natural)
  • News/Formal: --stability 0.8 --similarity 0.8 (consistent, clear)
  • Character/Dramatic: --stability 0.3 --similarity 0.8 --style 0.7 (expressive, varied)
  • Conversational: --stability 0.4 --similarity 0.6 (natural variation)
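
The presets and ranges above can be captured as data. The sketch below merges a preset with explicit overrides, checks each value against the documented range, and returns a settings mapping. The preset keys and the function name are made up for illustration, and the mapping of --similarity to the API field similarity_boost is an assumption about how the script forwards these values.

```python
from typing import Optional

# Defaults and ranges as listed in the Voice Settings Guide.
DEFAULTS = {"stability": 0.5, "similarity": 0.75, "style": 0.0, "speed": 1.0}
RANGES = {
    "stability": (0.0, 1.0),
    "similarity": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.5, 2.0),
}
# Hypothetical preset names for the recommended presets.
PRESETS = {
    "narration": {"stability": 0.5, "similarity": 0.75},
    "news": {"stability": 0.8, "similarity": 0.8},
    "dramatic": {"stability": 0.3, "similarity": 0.8, "style": 0.7},
    "conversational": {"stability": 0.4, "similarity": 0.6},
}


def voice_settings(preset: Optional[str] = None, **overrides: float) -> dict:
    """Merge defaults, a preset, and overrides, then validate the ranges."""
    values = dict(DEFAULTS)
    if preset is not None:
        values.update(PRESETS[preset])  # KeyError for an unknown preset
    values.update(overrides)
    for name, value in values.items():
        if name not in RANGES:
            raise ValueError(f"unknown setting: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
    return {
        "stability": values["stability"],
        "similarity_boost": values["similarity"],  # assumed API field name
        "style": values["style"],
        "speed": values["speed"],
    }


if __name__ == "__main__":
    print(voice_settings("dramatic"))
    print(voice_settings("news", speed=0.7))
```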

Output Formats

| Format | Quality | Size | Use Case |
| --- | --- | --- | --- |
| mp3_44100_128 | High (default) | Medium | General purpose, good quality |
| mp3_44100_192 | Very high | Large | Studio quality, archival |
| mp3_22050_32 | Low | Small | Voice messages, previews |
| pcm_44100 | Lossless | Very large | Post-processing, editing |
| pcm_16000 | Lossless, low sample rate | Large | Speech recognition input |
| opus_48000_128 | High | Small | Web streaming, efficient |
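
The format names appear to follow a codec_samplerate_bitrate pattern, with the bitrate omitted for PCM. Under that reading, the sketch below parses a format name and gives a rough byte rate, which helps when comparing the sizes in the table. The 16-bit mono assumption for PCM is mine, and the estimate ignores container overhead.

```python
from typing import NamedTuple, Optional


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None for uncompressed PCM


def parse_format(name: str) -> OutputFormat:
    """Split a name such as "mp3_44100_128" or "pcm_16000" into its parts."""
    parts = name.split("_")
    if len(parts) == 2:
        return OutputFormat(parts[0], int(parts[1]), None)
    if len(parts) == 3:
        return OutputFormat(parts[0], int(parts[1]), int(parts[2]))
    raise ValueError(f"unrecognized format name: {name!r}")


def bytes_per_second(fmt: OutputFormat) -> int:
    """Approximate data rate of the audio stream."""
    if fmt.bitrate_kbps is not None:
        # Compressed formats: kilobits per second to bytes per second.
        return fmt.bitrate_kbps * 1000 // 8
    # Assumption: PCM output is 16-bit (2 bytes per sample), single channel.
    return fmt.sample_rate_hz * 2


if __name__ == "__main__":
    for name in ("mp3_44100_128", "mp3_22050_32", "pcm_44100", "pcm_16000"):
        fmt = parse_format(name)
        print(name, fmt, bytes_per_second(fmt) * 60, "bytes per minute")
```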

Models

| Model | Speed | Quality | Languages | Best For |
| --- | --- | --- | --- | --- |
| eleven_multilingual_v2 | Normal | Highest | 29 languages | Default: best quality, multilingual |
| eleven_turbo_v2_5 | Fast | High | 32 languages | Low-latency, near-instant generation |
| eleven_monolingual_v1 | Normal | Good | English only | Legacy English-only workloads |

Always use eleven_multilingual_v2 unless speed is critical (then use eleven_turbo_v2_5).


Common Workflows

Narrate a document

# Read the document, generate speech
python scripts/tts.py speak @workspace/report.md --voice Rachel -o report_narration.mp3

Create a podcast intro

python scripts/tts.py speak "Welcome to the Kortix Weekly. I'm your host, and today we're diving into autonomous AI agents." \
  --voice George --stability 0.4 --similarity 0.8 --style 0.5 \
  -o podcast_intro.mp3

Narrate a presentation (per-slide)

For each slide, generate a separate audio file:

python scripts/tts.py speak "Slide 1: Introduction to our company" --voice Rachel -o slides/01.mp3
python scripts/tts.py speak "Slide 2: Our key metrics this quarter" --voice Rachel -o slides/02.mp3

Or write all narration to a text file (one paragraph per slide) and batch it:

python scripts/tts.py batch slide_notes.txt --split-paragraphs --voice Rachel -o slide_audio/

Voice clone for personalization

# Clone the user's voice from samples they provide
python scripts/tts.py clone "UserVoice" sample1.mp3 sample2.mp3 sample3.mp3 \
  --description "The user's natural speaking voice"

# Use it for all future TTS
python scripts/tts.py speak "Personalized message." --voice-id <voice_id> -o message.mp3

Generate ambient audio

python scripts/tts.py sound "coffee shop ambiance with gentle chatter" -o ambient.mp3 --duration 15
python scripts/tts.py sound "gentle rain on a window" -o rain.mp3 --duration 30

Integration Notes

  • No pip dependencies. The script uses only Python stdlib (urllib.request, json, argparse). Works on any Python 3.10+ installation.
  • Output files are saved relative to the current working directory. Use -o to specify exact paths.
  • Long text is handled automatically by the API. For very long documents (>5000 chars), consider using batch with --split-paragraphs for better quality and to avoid timeouts.
  • Rate limits apply per your ElevenLabs plan. The script will return API errors if limits are hit.
  • Character usage counts against your ElevenLabs monthly quota. Check your plan's limits.
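
As a point of reference for the stdlib-only design, the sketch below shows how a text-to-speech request could be assembled with urllib alone. It only builds the request object and does not send it. The endpoint path, the xi-api-key header, and the output_format query parameter reflect the public ElevenLabs API as I understand it; the function name and defaults are hypothetical, and none of this is copied from scripts/tts.py.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(
    api_key: str,
    voice_id: str,
    text: str,
    model_id: str = "eleven_multilingual_v2",
    output_format: str = "mp3_44100_128",
) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for speech synthesis."""
    query = urllib.parse.urlencode({"output_format": output_format})
    voice = urllib.parse.quote(voice_id, safe="")
    url = f"{API_BASE}/text-to-speech/{voice}?{query}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    request = build_tts_request("your_key_here", "VOICE_ID", "Chapter one.")
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call; that step is left out so the sketch runs without a key or network access. With urllib, a non-success response such as a rate-limit error is raised as urllib.error.HTTPError, which a caller can catch to report the failure or retry later.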

Env Vars

| Variable | Required | Description |
| --- | --- | --- |
| ELEVENLABS_API_KEY | Yes | Your ElevenLabs API key (also accepts ELEVEN_API_KEY) |

Add to sandbox/.env and sandbox/opencode/.env:

ELEVENLABS_API_KEY=your_key_here
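
Since the key may arrive under either ELEVENLABS_API_KEY or ELEVEN_API_KEY, a lookup with a fallback is a natural fit. Here is a minimal sketch; the precedence order (primary name first) and the error message are assumptions, and resolve_api_key is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty API key found in the environment."""
    env = os.environ if env is None else env
    # Assumed precedence: the documented primary name wins over the alias.
    for name in ("ELEVENLABS_API_KEY", "ELEVEN_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        "Set ELEVENLABS_API_KEY (or ELEVEN_API_KEY) before running the script."
    )


if __name__ == "__main__":
    print(resolve_api_key({"ELEVEN_API_KEY": "your_key_here"}))
```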