venice-audio-speech
Venice TTS (/audio/speech)
`POST /api/v1/audio/speech` converts text to an audio stream or file. The endpoint is OpenAI-compatible: the OpenAI SDK's `audio.speech.create()` works as a drop-in client.
Use when
- You want narration, voice replies, or UI audio from text.
- You need a specific voice family (ElevenLabs, Kokoro, xAI, Qwen 3, Orpheus, Chatterbox, MiniMax, Inworld, Gemini Flash).
- You want streaming audio returned sentence-by-sentence.
- You need style/emotion control on supported models.
For music generation (lyrics + instrumental), see venice-audio-music. For transcription (audio → text), see venice-audio-transcription.
Minimal request
curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-xai-v1",
    "voice": "eve",
    "input": "Hello, welcome to Venice Voice.",
    "response_format": "mp3",
    "speed": 1.0,
    "streaming": false
  }' --output hello.mp3
Response is the raw audio (Content-Type matches response_format).
Request schema
| Field | Type | Default | Notes |
|---|---|---|---|
| `input` | string | none | Required. Up to 4096 characters. |
| `model` | enum | `tts-kokoro` (OpenAPI schema default) | See model list below. `tts-xai-v1` is the recommended frontier default; pick the model that fits your voice + language needs. |
| `voice` | enum | model-specific (e.g. `eve` for `tts-xai-v1`) | Voice is model-specific; a wrong combo returns 400. See voice families. |
| `response_format` | `mp3` / `opus` / `aac` / `flac` / `wav` / `pcm` | `mp3` | `pcm` returns 24 kHz signed 16-bit little-endian audio for pipelines. |
| `speed` | number | `1.0` | Range 0.25–4.0. |
| `streaming` | bool | `false` | `true` → streamed sentence-by-sentence as audio continues to generate. |
| `language` | string | none | Optional hint. Accepted form depends on model (Qwen 3 = full names like `English`; xAI / ElevenLabs = ISO 639-1 like `en`; MiniMax = full names). Unsupported values silently ignored. |
| `prompt` | string, ≤ 500 | none | Emotion / style cue. Only for models with `supportsPromptParam` (Qwen 3 currently). Examples: `"Very happy."`, `"Sad and slow."`. |
| `temperature` | 0–2 | none | Sampling temperature. Only for models with `supportsTemperatureParam` (Qwen 3, Orpheus, Chatterbox HD). |
| `top_p` | 0–1 | none | Only Qwen 3 currently. |
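The limits in the schema can be checked client-side before a request is sent. The following is a minimal sketch, not part of any Venice SDK: `SpeechRequest`, `validateSpeechRequest`, and `outOfRange` are hypothetical names, and it assumes that JavaScript's `.length` is a close enough stand-in for the server's character count. Passing these checks does not guarantee acceptance, since voice/model compatibility is still enforced server-side.

```typescript
// Hypothetical helper: names are illustrative, not part of any Venice SDK.
// The numeric limits are copied from the request schema table.
interface SpeechRequest {
  input: string
  model?: string
  voice?: string
  response_format?: string
  speed?: number
  streaming?: boolean
  language?: string
  prompt?: string
  temperature?: number
  top_p?: number
}

const RESPONSE_FORMATS = ['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm']

// True when a value is present and falls outside [min, max].
function outOfRange(value: number | undefined, min: number, max: number): boolean {
  return value !== undefined && (value < min || value > max)
}

// Returns a list of problems; an empty list means the local checks passed.
function validateSpeechRequest(req: SpeechRequest): string[] {
  const problems: string[] = []
  // Assumption: .length (UTF-16 code units) approximates the server's character count.
  if (req.input.length === 0) problems.push('input is required')
  if (req.input.length > 4096) problems.push('input exceeds 4096 characters')
  if (req.response_format !== undefined && RESPONSE_FORMATS.indexOf(req.response_format) === -1) {
    problems.push('unknown response_format: ' + req.response_format)
  }
  if (outOfRange(req.speed, 0.25, 4.0)) problems.push('speed must be within 0.25-4.0')
  if (req.prompt !== undefined && req.prompt.length > 500) problems.push('prompt exceeds 500 characters')
  if (outOfRange(req.temperature, 0, 2)) problems.push('temperature must be within 0-2')
  if (outOfRange(req.top_p, 0, 1)) problems.push('top_p must be within 0-1')
  return problems
}
```

Calling `validateSpeechRequest(body)` before the HTTP call lets an application surface length and range problems without spending a round trip on a 400.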
Models
| Model ID | Family | Highlights |
|---|---|---|
| `tts-xai-v1` | xAI | Recommended default. Conversational style, ISO 639-1 language hints. |
| `tts-kokoro` | Kokoro | OpenAPI schema default. Multilingual, many voices across languages. |
| `tts-qwen3-0-6b` / `tts-qwen3-1-7b` | Qwen 3 | Emotion control via `prompt`, `temperature`, `top_p`. |
| `tts-inworld-1-5-max` | Inworld | Character-driven voices (`Craig`, `Ashley`, …). |
| `tts-chatterbox-hd` | Chatterbox | HD voices (`Aurora`, `Blade`, …), `temperature`. |
| `tts-orpheus` | Orpheus | Conversational (`tara`, `leah`, `jess`, `leo`, …), `temperature`. |
| `tts-elevenlabs-turbo-v2-5` | ElevenLabs Turbo | `Rachel`, `Aria`, `Charlotte`, `Roger`, … |
| `tts-minimax-speech-02-hd` | MiniMax | `WiseWoman`, `DeepVoiceMan`, … |
| `tts-gemini-3-1-flash` | Gemini Flash | Star-named voices (`Achernar`, `Achird`, `Zephyr`, …). |
Always inspect the entry for your model in `GET /models?type=tts`: `model_spec.voices` is the authoritative voice list. Per-model toggles such as `supportsPromptParam`, `supportsTemperatureParam`, and `supportsTopPParam` live on the internal model definitions but are not currently exposed on `/models`, so treat the request schema above (`prompt`, `temperature`, `top_p`) as the support matrix.
Voice families (by prefix)
- Kokoro: lowercase + language/gender prefix:
  - `af_*`, `am_*`: American female / male
  - `bf_*`, `bm_*`: British female / male
  - `zf_*`, `zm_*`: Chinese
  - `ff_*`, `hf_*`, `hm_*`, `if_*`, `im_*`, `jf_*`, `jm_*`, `pf_*`, `pm_*`, `ef_*`, `em_*`: French, Hindi, Italian, Japanese, Portuguese, Spanish
  - Examples: `af_sky`, `af_bella`, `am_adam`, `bm_george`, `zf_xiaoxiao`
- Qwen 3: `Vivian`, `Serena`, `Ono_Anna`, `Sohee`, `Uncle_Fu`, `Dylan`, `Eric`, `Ryan`, `Aiden`
- xAI: `eve`, `ara`, `rex`, `sal`, `leo`
- Orpheus: `tara`, `leah`, `jess`, `mia`, `zoe`, `dan`, `zac`
- Inworld: `Craig`, `Ashley`, `Olivia`, `Sarah`, `Elizabeth`, `Priya`, `Alex`, `Edward`, `Theodore`, `Ronald`, `Mark`, `Hades`, `Luna`, `Pixie`
- Chatterbox: `Aurora`, `Britney`, `Siobhan`, `Vicky`, `Blade`, `Carl`, `Cliff`, `Richard`, `Rico`
- ElevenLabs Turbo: `Rachel`, `Aria`, `Laura`, `Charlotte`, `Alice`, `Matilda`, `Jessica`, `Lily`, `Roger`, `Charlie`, `George`, `Callum`, `River`, `Liam`, `Will`, `Chris`, `Brian`, `Daniel`, `Bill`
- MiniMax: `WiseWoman`, `FriendlyPerson`, `InspirationalGirl`, `CalmWoman`, `LivelyGirl`, `LovelyGirl`, `SweetGirl`, `ExuberantGirl`, `DeepVoiceMan`, `CasualGuy`, `PatientMan`, `YoungKnight`, `DeterminedMan`, `ImposingManner`, `ElegantMan`
- Gemini 3 Flash (star names): `Achernar`, `Achird`, `Algenib`, `Algieba`, `Alnilam`, `Aoede`, `Autonoe`, `Callirrhoe`, `Charon`, `Despina`, `Enceladus`, `Erinome`, `Fenrir`, `Gacrux`, `Iapetus`, `Kore`, `Laomedeia`, `Leda`, `Orus`, `Pulcherrima`, `Puck`, `Rasalgethi`, `Sadachbia`, `Sadaltager`, `Schedar`, `Sulafat`, `Umbriel`, `Vindemiatrix`, `Zephyr`, `Zubenelgenubi`
Pass a voice that isn't in the chosen model's list and you get 400.
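The Kokoro naming convention can be decoded mechanically, which is handy for building a voice picker. This is a small sketch under stated assumptions: `describeKokoroVoice` and `KOKORO_LANGUAGES` are hypothetical names, and the prefix-to-language mapping is read off the list above. The function only interprets the prefix; it does not confirm that a voice exists, so the model's `model_spec.voices` entry stays the source of truth.

```typescript
// Hypothetical lookup derived from the Kokoro prefix list above.
const KOKORO_LANGUAGES: { [code: string]: string | undefined } = {
  a: 'American',
  b: 'British',
  z: 'Chinese',
  f: 'French',
  h: 'Hindi',
  i: 'Italian',
  j: 'Japanese',
  p: 'Portuguese',
  e: 'Spanish',
}

interface KokoroVoiceInfo {
  language: string
  gender: 'female' | 'male'
}

// Decodes names such as "af_sky": first letter = language, second = gender.
// Returns null for anything that does not follow the lowercase convention.
function describeKokoroVoice(voice: string): KokoroVoiceInfo | null {
  if (!/^[a-z][fm]_[a-z0-9]+$/.test(voice)) return null
  const language = KOKORO_LANGUAGES[voice.charAt(0)]
  if (language === undefined) return null
  return { language: language, gender: voice.charAt(1) === 'f' ? 'female' : 'male' }
}
```

Because the pattern is lowercase-only, `AF_SKY` is rejected, which matches the case-sensitivity of voice names.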
Streaming
{
  "model": "tts-xai-v1",
  "voice": "eve",
  "input": "Hello, this is a long document to narrate. ...",
  "streaming": true,
  "response_format": "mp3"
}
With streaming: true, the HTTP body is a chunked audio stream. Decode as it arrives — useful for latency-sensitive UIs. response_format: pcm pairs well with browser Web Audio API for raw playback.
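When `pcm` is streamed, network chunks will not necessarily end on a sample boundary, so a decoder has to carry a stray byte over to the next chunk. The sketch below shows one way to turn 16-bit little-endian bytes into the floating-point samples that Web Audio expects. `Pcm16Decoder` is a hypothetical class name, and nothing here is specific to Venice beyond the sample format stated in the request schema.

```typescript
// Hypothetical helper: converts streamed signed 16-bit little-endian PCM
// bytes into Float32 samples in the range [-1, 1).
class Pcm16Decoder {
  // Holds at most one byte left over from the previous chunk.
  private pending: Uint8Array = new Uint8Array(0)

  decode(chunk: Uint8Array): Float32Array {
    // Join the carried-over byte (if any) with the new chunk.
    const bytes = new Uint8Array(this.pending.length + chunk.length)
    bytes.set(this.pending, 0)
    bytes.set(chunk, this.pending.length)

    const sampleCount = bytes.length >> 1
    const view = new DataView(bytes.buffer)
    const samples = new Float32Array(sampleCount)
    for (let i = 0; i < sampleCount; i++) {
      // true = little-endian; dividing by 32768 maps int16 onto [-1, 1).
      samples[i] = view.getInt16(i * 2, true) / 32768
    }

    // Keep the trailing odd byte for the next call.
    this.pending = bytes.subarray(sampleCount * 2)
    return samples
  }
}
```

In a browser, the returned samples could be copied into an `AudioBuffer` created with a 24000 Hz sample rate. Mono output is an assumption here, since the channel count is not stated above.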
OpenAI SDK
import OpenAI from 'openai'
import fs from 'node:fs/promises'

// Point the OpenAI client at Venice's OpenAI-compatible base URL.
const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1',
})

const mp3 = await client.audio.speech.create({
  model: 'tts-xai-v1',
  voice: 'eve',
  input: 'Hello from Venice.',
  response_format: 'mp3',
})

// The response body is the raw audio; write it to disk as-is.
await fs.writeFile('hello.mp3', Buffer.from(await mp3.arrayBuffer()))
Emotion / style (Qwen 3 only)
{
  "model": "tts-qwen3-1-7b",
  "voice": "Vivian",
  "input": "We did it!",
  "prompt": "Excited and energetic.",
  "temperature": 0.9,
  "top_p": 0.95
}
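For other families, emotion comes from the voice choice itself (e.g. Inworld `Hades` vs `Pixie`). Models that do not support `prompt`, `temperature`, or `top_p` silently ignore them; per the request schema, `temperature` is also honored by Orpheus and Chatterbox HD.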
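Because unsupported tuning fields are dropped without an error, it can help to detect them before sending. The following sketch encodes the support notes from the request schema as a lookup table. `SUPPORTED_TUNING` and `droppedTuningParams` are hypothetical names, and the table is a hand-maintained assumption that would need updating as model support changes.

```typescript
type TuningParam = 'prompt' | 'temperature' | 'top_p'

// Assumption: transcribed by hand from the request schema notes
// (prompt / top_p = Qwen 3; temperature = Qwen 3, Orpheus, Chatterbox HD).
const SUPPORTED_TUNING: { [modelId: string]: TuningParam[] | undefined } = {
  'tts-qwen3-0-6b': ['prompt', 'temperature', 'top_p'],
  'tts-qwen3-1-7b': ['prompt', 'temperature', 'top_p'],
  'tts-orpheus': ['temperature'],
  'tts-chatterbox-hd': ['temperature'],
}

// Lists the tuning fields present in the body that the model would ignore.
function droppedTuningParams(model: string, body: { [key: string]: unknown }): TuningParam[] {
  const supported: TuningParam[] = SUPPORTED_TUNING[model] || []
  const all: TuningParam[] = ['prompt', 'temperature', 'top_p']
  return all.filter((name) => body[name] !== undefined && supported.indexOf(name) === -1)
}
```

An application could log the returned names as a warning, so that a style cue sent to a model such as `tts-xai-v1` does not fail unnoticed.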
Errors
| Code | Meaning |
|---|---|
| 400 | Invalid voice for the chosen model, input too long (>4096 characters), or language hint rejected by a strict model. |
| 401 | Auth / Pro-only model. |
| 402 | Insufficient balance. |
| 429 | Rate limited. |
| 500 / 503 | Inference / capacity issue. Retry with jitter. |
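"Retry with jitter" can be sketched as exponential backoff with a randomized delay. The helper names (`isRetryable`, `backoffDelayMs`, `withRetry`), the 500 ms base, and the 8 s cap are all illustrative choices rather than values Venice documents, and treating 429 as retryable alongside 500 / 503 is an assumption.

```typescript
// Assumption: 429 is retried like 500 / 503; 400 / 401 / 402 are not.
function isRetryable(status: number): boolean {
  return status === 429 || status === 500 || status === 503
}

// "Full jitter": pick a random delay between 0 and an exponentially
// growing ceiling. `random` is injectable so the function can be tested.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs = 500,
  capMs = 8000,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt))
  return Math.floor(random() * ceiling)
}

// Calls `send` up to `maxAttempts` times, sleeping between retryable failures.
async function withRetry<T extends { status: number }>(
  send: () => Promise<T>,
  maxAttempts = 4,
): Promise<T> {
  let response = await send()
  for (let attempt = 0; attempt < maxAttempts - 1 && isRetryable(response.status); attempt++) {
    await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)))
    response = await send()
  }
  return response
}
```

`withRetry` accepts any function that returns an object with a `status` field, so it can wrap a `fetch` call to the endpoint without depending on a particular HTTP client.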
Gotchas
- `input` hard cap is 4096 chars. For books / long content, split on sentence boundaries and concatenate audio client-side.
- `streaming: true` + SDKs: some OpenAI SDK versions don't expose streaming for `audio.speech.create`; call the REST endpoint directly and consume the HTTP body.
- `speed` compounds with model internal speech rate: extreme values (`0.25`, `4.0`) often sound unnatural; keep within `0.8–1.3` for narration.
- Voice names are case-sensitive (`eve` ≠ `EVE`, `af_sky` ≠ `AF_SKY`).
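Splitting long content on sentence boundaries can be sketched as follows. `splitForTts` is a hypothetical helper: it treats `.`, `!`, or `?` followed by whitespace as a sentence end, which is a simplification that will not suit every language or abbreviation, and it hard-splits any single sentence longer than the cap.

```typescript
// Hypothetical helper: packs whole sentences into chunks of at most maxChars.
function splitForTts(text: string, maxChars = 4096): string[] {
  // A sentence ends at ., ! or ? followed by whitespace or end of text;
  // the second alternative catches a trailing fragment with no terminator.
  const sentences: string[] = text.match(/[\s\S]*?[.!?]+(?:\s+|$)|[\s\S]+$/g) || []
  const chunks: string[] = []
  let current = ''

  const flush = (): void => {
    const trimmed = current.trim()
    if (trimmed.length > 0) chunks.push(trimmed)
    current = ''
  }

  for (const sentence of sentences) {
    // Start a new chunk when this sentence would overflow the current one.
    if (current.length + sentence.length > maxChars) flush()
    // A single sentence longer than the cap is cut at the cap.
    let rest = sentence
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars))
      rest = rest.slice(maxChars)
    }
    current += rest
  }
  flush()
  return chunks
}
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order. Whether plain byte concatenation yields a valid file depends on the response format, so that step is left to the caller.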