text-to-speech

Summary

Run multiple text-to-speech models through the inference.sh CLI for voiceovers, podcasts, and accessibility.

  • Six models available: ElevenLabs (premium, 22+ voices, 32 languages), DIA TTS (conversational), Kokoro TTS (fast), Chatterbox, Higgs Audio (emotional control), and VibeVoice (long-form podcasts)
  • Core capabilities include basic speech synthesis, expressive speech with emotion control, and conversational dialogue generation
  • Pairs with video tools such as OmniHuman to turn generated audio into talking-head avatars
  • Requires the inference.sh CLI (belt) installed and authenticated; apps run from a JSON input passed with --input
SKILL.md

Install the belt CLI skill: npx skills add belt-sh/cli

Text-to-Speech

Convert text to natural speech via the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (belt); see the install instructions.

# Authenticate with inference.sh
belt login

# Generate speech
belt app run infsh/kokoro-tts --input '{"text": "Hello, welcome to our product demo."}'
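
The Quick Start command passes the text inside a hand-written JSON string, which breaks as soon as the text itself contains a double quote or a backslash. A minimal sketch of one way to build the --input payload from a shell variable follows. The helper names json_escape and build_tts_input are hypothetical and not part of the CLI, and the escaping assumes single-line text with no control characters; for arbitrary text a dedicated JSON tool is the safer choice.

```shell
#!/bin/sh
# Hypothetical helper: escape backslashes and double quotes so the text
# can sit inside a JSON string. Assumes single-line text with no
# control characters (newlines and tabs are NOT handled here).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the {"text": "..."} payload that the
# Quick Start command passes to --input.
build_tts_input() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

text='She said "hello" and left.'
input=$(build_tts_input "$text")

# Show the payload that would be sent.
printf '%s\n' "$input"

# Run the app only when the CLI is installed; the command itself is the
# one from the Quick Start, with the payload taken from the variable.
if command -v belt >/dev/null 2>&1; then
  belt app run infsh/kokoro-tts --input "$input"
fi
```

Keeping the payload in a variable also makes it easy to print and inspect before sending it to the app.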