text-to-speech

Summary

Natural speech synthesis from text in 70+ languages, with multiple models that trade quality against latency.

  • Six models are available, ranging from the highest-quality eleven_v3 to the ultra-low-latency eleven_flash_v2_5 (~75ms), with language and speed tradeoffs documented
  • Supports 13+ output formats, including MP3, PCM, WAV, Opus, and telephony codecs (μ-law, A-law), for web, streaming, and real-time applications
  • Fine-tune voice characteristics via stability, similarity boost, style, speaker boost, and speed controls; enforce language pronunciation with ISO 639-1 codes
  • Request stitching eliminates audio artifacts when long content is generated across multiple API calls; streaming mode supports real-time playback
  • Requires an ElevenLabs API key; character usage is reported in response headers for cost monitoring
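
The controls listed above map onto fields of a single HTTP request. The following is a minimal sketch that only builds such a request and does not send it. It assumes the public REST endpoint `POST /v1/text-to-speech/{voice_id}`, the `xi-api-key` header, the `output_format` query parameter, and the body fields `voice_settings`, `language_code`, `previous_text`, and `next_text`; check these against the current API reference before relying on them. The helper names `build_tts_request` and `stitched_requests` are hypothetical and are not part of any SDK.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.elevenlabs.io/v1"  # assumed public REST base URL


def build_tts_request(text, voice_id, api_key,
                      model_id="eleven_multilingual_v2",
                      output_format="mp3_44100_128",
                      language_code=None, voice_settings=None,
                      previous_text=None, next_text=None):
    """Return (url, headers, body) for a text-to-speech call without sending it."""
    query = urlencode({"output_format": output_format})
    url = f"{API_BASE}/text-to-speech/{voice_id}?{query}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    payload = {"text": text, "model_id": model_id}
    optional = {
        "language_code": language_code,    # ISO 639-1 code, e.g. "es"
        "voice_settings": voice_settings,  # e.g. {"stability": 0.5, "speed": 1.1}
        "previous_text": previous_text,    # context taken from the preceding segment
        "next_text": next_text,            # context taken from the following segment
    }
    # Only include the optional fields that the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return url, headers, json.dumps(payload)


def stitched_requests(segments, **kwargs):
    """Yield one request per segment, passing the neighbouring text as context."""
    for i, segment in enumerate(segments):
        yield build_tts_request(
            segment,
            previous_text=segments[i - 1] if i > 0 else None,
            next_text=segments[i + 1] if i + 1 < len(segments) else None,
            **kwargs,
        )
```

Each tuple can be handed to any HTTP client. Stitching by request ID instead of neighbouring text would need the request ID returned by each earlier response, which this sketch leaves out.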
SKILL.md

ElevenLabs Text-to-Speech

Generate natural speech from text. Supports 70+ languages and multiple models that trade quality against latency.

Setup: See Installation Guide. For JavaScript, use @elevenlabs/* packages only.

Quick Start

Python

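from elevenlabs import ElevenLabs

# With no arguments, the client is expected to read the API key from the
# ELEVENLABS_API_KEY environment variable.
client = ElevenLabs()

# Request speech for the given text, voice, and model.
audio = client.text_to_speech.convert(
    text="Hello, welcome to ElevenLabs!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2",
)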
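
The Quick Start stops at the convert call, so here is one hedged way to persist its result. The sketch assumes that `convert` yields the audio as an iterable of byte chunks, which is how recent versions of the Python SDK appear to behave; confirm this for the version you have installed. `save_audio` is a hypothetical helper and not part of the SDK.

```python
from pathlib import Path
from typing import Iterable


def save_audio(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to `path` and return the number of bytes written."""
    total = 0
    with Path(path).open("wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                total += f.write(chunk)
    return total
```

Under that assumption, `save_audio(audio, "output.mp3")` would write the generated speech to disk. Writing chunk by chunk avoids holding the whole clip in memory.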

Installs: 4.8K
GitHub Stars: 236
First Seen: Jan 27, 2026