text-to-speech
Summary
Multiple text-to-speech models via inference.sh CLI for voiceovers, podcasts, and accessibility.
- Six models available: ElevenLabs (premium, 22+ voices, 32 languages), DIA TTS (conversational), Kokoro TTS (fast), Chatterbox, Higgs Audio (emotional control), and VibeVoice (long-form podcasts)
- Core capabilities include basic speech synthesis, expressive speech with emotion control, and conversational dialogue generation
- Combine with video tools such as OmniHuman to create talking-head avatars from generated audio
- Requires inference.sh CLI (infsh) installed and authenticated; run apps via simple JSON input configuration
SKILL.md
Install the belt CLI skill:
npx skills add belt-sh/cli
Text-to-Speech
Convert text to natural speech via inference.sh CLI.

Quick Start
Requires inference.sh CLI (belt). Install instructions
belt login
# Generate speech
belt app run infsh/kokoro-tts --input '{"text": "Hello, welcome to our product demo."}'
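
When the text contains double quotes or backslashes, writing the JSON for --input by hand is error-prone. The sketch below shows one possible way to build the payload in plain shell. The helper name tts_input is hypothetical (it is not part of the CLI), it escapes only backslashes and double quotes (not newlines or other control characters), and it assumes the app accepts a single "text" field as in the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CLI): wrap arbitrary text in the
# {"text": "..."} payload used by the Quick Start command.
# Only backslashes and double quotes are escaped; newlines are not handled.
tts_input() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s"}' "$escaped"
}

payload=$(tts_input 'She said "hello" to the team.')
printf '%s\n' "$payload"

# Run the app only if the CLI is installed; the command shape is the same
# as in the Quick Start above.
if command -v belt >/dev/null 2>&1; then
  belt app run infsh/kokoro-tts --input "$payload"
fi
```

Passing the payload as a double-quoted variable keeps the shell from splitting or re-interpreting the text before it reaches the CLI.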