Inworld AI
Text-to-Speech platform with voice cloning, audio markups, and timestamp alignment.
Quick Navigation
| Topic | Reference |
|---|---|
| Installation | installation.md |
| Voice Cloning | cloning.md |
| Voice Control | voice-control.md |
| API Reference | api.md |
When to Use
- Text-to-speech audio generation
- Voice cloning from 5-15 seconds of audio
- Emotion-controlled speech (`[happy]`, `[sad]`, etc.)
- Word/phoneme timestamps for lip sync
- Custom pronunciation with IPA
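Voice cloning expects a reference clip of roughly 5-15 seconds. The sketch below checks that range locally before a clip is uploaded. It assumes the sample is an uncompressed PCM WAV file, since it uses Python's standard `wave` module. The helper names (`clip_seconds`, `suitable_for_instant_cloning`, `make_silent_wav`) are hypothetical. Passing this check only confirms the length; it says nothing about voice quality.

```python
import io
import wave


def clip_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV clip in seconds (hypothetical helper)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def suitable_for_instant_cloning(wav_bytes: bytes) -> bool:
    """True if the clip length falls in the 5-15 second range given above."""
    return 5.0 <= clip_seconds(wav_bytes) <= 15.0


def make_silent_wav(seconds: float, rate: int = 16_000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(suitable_for_instant_cloning(make_silent_wav(8)))  # True
print(suitable_for_instant_cloning(make_silent_wav(3)))  # False
```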
Models
| Model | ID | Latency | Price |
|---|---|---|---|
| TTS 1.5 Max | `inworld-tts-1.5-max` | ~200ms | $10/1M chars |
| TTS 1.5 Mini | `inworld-tts-1.5-mini` | ~120ms | $5/1M chars |
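The per-character prices above can be turned into a quick cost estimate. This sketch assumes billing is linear in the number of input characters, counted here as Python `len(text)`. Whether markup tags or whitespace are billed is not stated here, so treat the result as approximate. `estimate_cost_usd` is a hypothetical helper.

```python
# Prices from the Models table, in USD per 1M characters.
PRICE_PER_MILLION_CHARS = {
    "inworld-tts-1.5-max": 10.0,
    "inworld-tts-1.5-mini": 5.0,
}


def estimate_cost_usd(text: str, model_id: str) -> float:
    """Approximate cost, assuming billing is linear in character count."""
    rate = PRICE_PER_MILLION_CHARS[model_id]
    return len(text) * rate / 1_000_000


script = "Hello!" * 50_000  # 300,000 characters
print(estimate_cost_usd(script, "inworld-tts-1.5-max"))   # 3.0
print(estimate_cost_usd(script, "inworld-tts-1.5-mini"))  # 1.5
```

At these rates, Mini costs half as much per character as Max and has lower listed latency.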
Minimal Example
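```python
import base64
import os

import requests

# INWORLD_API_KEY is expected to hold the Base64-encoded key from the Inworld portal
response = requests.post(
    "https://api.inworld.ai/tts/v1/voice",
    headers={"Authorization": f"Basic {os.environ['INWORLD_API_KEY']}"},
    json={"text": "Hello!", "voiceId": "Ashley", "modelId": "inworld-tts-1.5-max"},
)
response.raise_for_status()  # fail fast on auth or request errors
audio = base64.b64decode(response.json()["audioContent"])  # raw audio bytes
```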
Key Features
- 15 languages: en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar
- Instant cloning: 5-15 seconds of audio, no training
- Audio markups: `[happy]`, `[laughing]`, `[sigh]` (English only)
- Timestamps: word, phoneme, and viseme timing for lip sync
- Streaming: `/voice:stream` endpoint
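The sketch below covers only the response side of the `/voice:stream` endpoint. It assumes the endpoint at `https://api.inworld.ai/tts/v1/voice:stream` returns newline-delimited JSON, with each line carrying one Base64 audio chunk at `result.audioContent`. That shape is not confirmed here, so check it against api.md. `iter_audio_chunks` is a hypothetical helper, and the demo runs offline on fake lines. With `requests`, you would post with `stream=True` and pass `response.iter_lines(decode_unicode=True)` to it.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from newline-delimited JSON stream lines.

    ASSUMPTION: each non-empty line looks like
    {"result": {"audioContent": "<base64>"}}.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        message = json.loads(line)
        b64 = message.get("result", {}).get("audioContent")
        if b64:
            yield base64.b64decode(b64)


# Offline demo with fake stream lines.
fake_stream = [
    json.dumps({"result": {"audioContent": base64.b64encode(b"abc").decode()}}),
    "",
    json.dumps({"result": {"audioContent": base64.b64encode(b"def").decode()}}),
]
print(b"".join(iter_audio_chunks(fake_stream)))  # b'abcdef'
```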
Prohibitions
- Audio markups work only in English
- Use only one emotion markup, placed at the beginning of the text
- Match voice language to text language
- Instant cloning may not work for children's voices or unique accents
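The markup rules above are simple enough to check before sending a request. The sketch below is a hypothetical local validator. Its `EMOTION_MARKUPS` set contains only the emotion tags named in this document (`[happy]`, `[sad]`) and is deliberately incomplete. It also assumes any bracketed lowercase word is an audio markup, which the English-only rule then applies to.

```python
import re

# HYPOTHETICAL and incomplete: only the emotion tags named in this document.
EMOTION_MARKUPS = {"happy", "sad"}
MARKUP_RE = re.compile(r"\[([a-z_]+)\]")


def check_markups(text: str, language: str) -> list[str]:
    """Return violations of the markup rules above (empty list means OK)."""
    problems = []
    if MARKUP_RE.search(text) and language != "en":
        problems.append("audio markups are English-only")
    emotions = [m for m in MARKUP_RE.finditer(text) if m.group(1) in EMOTION_MARKUPS]
    if len(emotions) > 1:
        problems.append("use only one emotion markup")
    leading_ws = len(text) - len(text.lstrip())  # allow leading whitespace
    if emotions and emotions[0].start() != leading_ws:
        problems.append("emotion markup must start the text")
    return problems


print(check_markups("[happy] Great to see you!", "en"))  # []
print(check_markups("Hi there. [sad] Bye.", "en"))  # ['emotion markup must start the text']
```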
Links
Repository: itechmeat/llm-code