Text-to-Speech: Generate audio from text with Gemini

Overview

Convert text to natural-sounding speech using Google Gemini's TTS models. Supports:

  • 30 prebuilt voices with distinct characteristics
  • 24 languages with automatic detection
  • Single-speaker and multi-speaker audio
  • Natural intonation and expression

Reference: https://ai.google.dev/gemini-api/docs/speech-generation

How to use

bash ${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh --model=gemini-2.5-flash-preview-tts "TEXT TO SPEAK"

Arguments:

  • --model - Required: Use a TTS model (see Models below)
  • --voice - Optional: Voice name (default: Kore)

Examples:

# Generate speech with default voice
npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts "Hello, welcome to our application."

# Use a specific voice
npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts --voice=Puck "The quick brown fox jumps over the lazy dog."

# Generate longer narration
npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts --voice=Charon "In today's tutorial, we'll explore the fundamentals of machine learning."

# Use higher quality model for professional content
npx -y superconductor-gemini-skills --model=gemini-2.5-pro-preview-tts --voice=Kore "This is a premium quality voice synthesis."

Available voices

| Voice Name | Description |
|------------|-------------|
| Kore | Default voice, clear and professional |
| Puck | Friendly and warm |
| Charon | Deep and authoritative |
| Fenrir | Energetic and dynamic |
| Leda | Soft and gentle |
| Orus | Neutral and balanced |
| Zephyr | Light and airy |
| Aoede | Melodic and expressive |

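Additional voices: Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, and Sulafat. See the reference link above for the current list.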

Supported languages

English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Japanese, Korean, Chinese (Simplified/Traditional), Arabic, Hindi, Turkish, Polish, Vietnamese, Thai, Indonesian, and more.

Languages are automatically detected from the input text.

Output

Generated audio is saved to the current directory as gemini-speech-{timestamp}.wav.

  • Format: WAV (PCM)
  • Sample rate: 24000 Hz
  • Channels: Mono
  • Bit depth: 16-bit
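
Because the output format is fixed (24000 Hz, mono, 16-bit), each second of audio takes 24000 × 2 × 1 = 48000 bytes, so a clip's length can be estimated from its file size. The following is a minimal sketch, not part of the skill itself: the helper name wav_duration_ms is hypothetical, and it assumes a canonical 44-byte WAV header, which may not match the header the tool actually writes.

```shell
# Hypothetical helper: estimate the duration (in milliseconds) of a
# 24 kHz, mono, 16-bit PCM WAV file from its size on disk.
# Assumption: a canonical 44-byte RIFF/WAV header; files with extra
# chunks will come out slightly long.
wav_duration_ms() {
  size_bytes=$(wc -c < "$1")
  data_bytes=$(( size_bytes - 44 ))
  # 24000 samples/s * 2 bytes/sample * 1 channel = 48000 bytes/s = 48 bytes/ms
  echo $(( data_bytes / 48 ))
}

# Usage (the file name is illustrative):
#   wav_duration_ms gemini-speech-1234567890.wav
```

For an exact figure, read the header fields with an audio tool instead of relying on the file size.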

API Key

The GEMINI_API_KEY environment variable must be set. Get your key at: https://ai.google.dev/gemini-api/docs/api-key
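
A script that wraps the command can check for the key up front and fail with a clear message. This is a minimal sketch; the function name require_gemini_key is hypothetical and not part of the skill.

```shell
# Hypothetical guard: succeed only when GEMINI_API_KEY is set and non-empty.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; see https://ai.google.dev/gemini-api/docs/api-key" >&2
    return 1
  fi
}

# Usage: run the generator only when the key is present.
#   require_gemini_key && npx -y superconductor-gemini-skills \
#     --model=gemini-2.5-flash-preview-tts "Hello, welcome to our application."
```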

Models

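| Model ID | Context Window (Input / Output) | Pricing (Input / Output) |
|----------|---------------------------------|--------------------------|
| gemini-2.5-flash-preview-tts | 8k / 16k tokens | $0.50 / $10 per 1M tokens |
| gemini-2.5-pro-preview-tts | 8k / 16k tokens | $1.00 / $20 per 1M tokens |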
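
To compare the two models on cost, multiply each token count by its per-million price and add the results. The sketch below uses the prices listed above. The helper name tts_cost_usd is hypothetical, and the token counts are illustrative, not measured from a real request.

```shell
# Hypothetical helper: estimate the cost of one request in USD.
#   $1 = input price per 1M tokens   $2 = output price per 1M tokens
#   $3 = input tokens                $4 = output tokens
tts_cost_usd() {
  awk -v pin="$1" -v pout="$2" -v tin="$3" -v tout="$4" \
    'BEGIN { printf "%.4f\n", (tin * pin + tout * pout) / 1000000 }'
}

# Illustrative request: 1,000 input tokens and 8,000 output tokens.
tts_cost_usd 0.50 10 1000 8000   # flash: prints 0.0805
tts_cost_usd 1.00 20 1000 8000   # pro:   prints 0.1610
```

In this example the output tokens account for nearly all of the cost, so audio length matters far more than prompt length.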