venice-audio-music
Venice Music / Async Audio
Music (and long-form voice) generation is asynchronous. The flow is:
POST /api/v1/audio/quote → price in USD
POST /api/v1/audio/queue → { queue_id } (funds reserved)
POST /api/v1/audio/retrieve → status or binary audio
POST /api/v1/audio/complete → finalize & delete media
For short text-to-speech, use the synchronous venice-audio-speech endpoint instead.
Use when
- You need songs, jingles, score, soundscape, or long narration.
- The selected model uses duration-based or character-based pricing and must be priced before submission.
- The expected generation time is long enough (> 20 s) that a synchronous call would time out.
Lifecycle
1. POST /audio/quote — price it first
curl https://api.venice.ai/api/v1/audio/quote \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "elevenlabs-music",
"duration_seconds": 60
}'
Response: {"quote": 0.48} (USD).
| Field | Notes |
|---|---|
| model | Required. Music/audio model from GET /models?type=music. |
| duration_seconds | Integer or numeric string. Only if the model reports duration metadata. |
| character_count | Required for models with pricing.per_thousand_characters (long narration). |
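The field rules above can be sketched as a small helper that builds the quote body. This is only an illustration: `QuoteHints`, `QuoteBody`, and `buildQuoteBody` are hypothetical names, the two boolean hints stand in for whatever the model entry actually reports, and counting characters with `text.length` is an assumption about how `character_count` should be measured.

```typescript
// Hypothetical, simplified view of what the model entry tells us.
interface QuoteHints {
  perThousandCharacters: boolean // model has pricing.per_thousand_characters
  hasDurationMetadata: boolean   // model reports duration metadata
}

interface QuoteBody {
  model: string
  duration_seconds?: number
  character_count?: number
}

function buildQuoteBody(
  model: string,
  hints: QuoteHints,
  input: { durationSeconds?: number; text?: string },
): QuoteBody {
  const body: QuoteBody = { model }
  if (hints.perThousandCharacters) {
    if (input.text === undefined) {
      throw new Error('character-priced model: text is needed to count characters')
    }
    // Assumption: character_count is the length of the narration text.
    body.character_count = input.text.length
  }
  // Only send duration_seconds when the model reports duration metadata.
  if (hints.hasDurationMetadata && input.durationSeconds !== undefined) {
    body.duration_seconds = input.durationSeconds
  }
  return body
}
```

The returned object can be passed to `JSON.stringify` as the body of the `/audio/quote` request.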
2. POST /audio/queue — enqueue
curl https://api.venice.ai/api/v1/audio/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "elevenlabs-music",
"prompt": "Uplifting indie-folk acoustic track, 120 BPM, major key.",
"lyrics_prompt": "Verse 1: Walking through the city lights...\nChorus: We are the dreamers...",
"duration_seconds": 60,
"voice": "Aria",
"language_code": "en",
"speed": 1.0,
"force_instrumental": false,
"lyrics_optimizer": false
}'
Response: { "model": "...", "queue_id": "uuid" }.
| Field | Notes |
|---|---|
| model | Required. |
| prompt | Required. Describe genre, mood, tempo, instruments. Length caps in /models. |
| lyrics_prompt | Lyrics. Required when lyrics_required=true, rejected when supports_lyrics=false. |
| duration_seconds | Integer or string. Model-dependent. |
| force_instrumental | Only when supports_force_instrumental=true. |
| lyrics_optimizer | Auto-generate lyrics from prompt. Requires supports_lyrics_optimizer=true. lyrics_prompt must be empty. |
| voice | For voice-enabled models. See voices + default_voice in /models. |
| language_code | ISO 639-1. Requires supports_language_code=true. |
| speed | Requires supports_speed=true. Use model's min_speed/max_speed. |
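The constraints in this table can be checked locally before spending a request. The sketch below is a hypothetical pre-flight validator, not part of any Venice SDK: `MusicCaps`, `QueueRequest`, and `validateQueueRequest` are illustrative names, and the flag shapes are assumed from the field names used in this document. The source does not say how `lyrics_required` interacts with `lyrics_optimizer`, so this sketch assumes the optimizer satisfies the lyrics requirement.

```typescript
// Subset of model_spec flags named in the table; shapes are assumed.
interface MusicCaps {
  supports_lyrics?: boolean
  lyrics_required?: boolean
  supports_lyrics_optimizer?: boolean
  supports_force_instrumental?: boolean
  supports_speed?: boolean
  supports_language_code?: boolean
  voices?: string[]
  min_speed?: number
  max_speed?: number
}

interface QueueRequest {
  model: string
  prompt: string
  lyrics_prompt?: string
  force_instrumental?: boolean
  lyrics_optimizer?: boolean
  voice?: string
  language_code?: string
  speed?: number
}

// Returns human-readable problems; an empty list means the request passes
// these local checks (the API may still reject it for other reasons).
function validateQueueRequest(caps: MusicCaps, req: QueueRequest): string[] {
  const problems: string[] = []
  const hasLyrics = (req.lyrics_prompt ?? '').length > 0
  if (!req.prompt) problems.push('prompt is required')
  if (hasLyrics && caps.supports_lyrics === false) problems.push('model does not accept lyrics_prompt')
  // Assumption: lyrics_optimizer counts as supplying lyrics.
  if (caps.lyrics_required && !hasLyrics && !req.lyrics_optimizer) problems.push('lyrics_prompt is required')
  if (req.lyrics_optimizer) {
    if (!caps.supports_lyrics_optimizer) problems.push('lyrics_optimizer not supported')
    if (hasLyrics) problems.push('lyrics_prompt must be empty when lyrics_optimizer is true')
  }
  if (req.force_instrumental !== undefined && !caps.supports_force_instrumental) {
    problems.push('force_instrumental not supported')
  }
  if (req.voice !== undefined && caps.voices && !caps.voices.includes(req.voice)) {
    problems.push('voice is not in the model voice list')
  }
  if (req.language_code !== undefined && !caps.supports_language_code) problems.push('language_code not supported')
  if (req.speed !== undefined) {
    if (!caps.supports_speed) problems.push('speed not supported')
    else if (caps.min_speed !== undefined && req.speed < caps.min_speed) problems.push('speed below min_speed')
    else if (caps.max_speed !== undefined && req.speed > caps.max_speed) problems.push('speed above max_speed')
  }
  return problems
}
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once.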
3. POST /audio/retrieve — poll status / download
curl https://api.venice.ai/api/v1/audio/retrieve \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"elevenlabs-music","queue_id":"..."}' \
--output track.mp3
- If still processing: JSON {"status":"PROCESSING","average_execution_time":...,"execution_duration":...}.
- If done: binary audio body (audio/mpeg or similar). Save the bytes.
- Set delete_media_on_completion: true to skip step 4.
Poll every 2–5 s; use average_execution_time (ms, P80) as a guideline for your first poll delay.
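One way to turn this guidance into a delay calculation is sketched below. It assumes both status fields are in milliseconds, as described in this document; `ProcessingStatus`, `nextPollDelayMs`, the 3 s default, and the 30 s cap are hypothetical choices rather than anything the API prescribes. The idea is to wait roughly until the expected finish time once, then fall back to short polls.

```typescript
// Fields of the PROCESSING status body, both assumed to be milliseconds.
interface ProcessingStatus {
  average_execution_time?: number // P80 expected total
  execution_duration?: number     // time since enqueue
}

const MIN_POLL_MS = 2000     // lower bound of the 2-5 s guidance
const DEFAULT_POLL_MS = 3000 // used when no estimate is available

// capMs is a hypothetical safety cap so a large estimate never causes a very long sleep.
function nextPollDelayMs(status: ProcessingStatus, capMs = 30000): number {
  const expected = status.average_execution_time
  if (typeof expected !== 'number' || !Number.isFinite(expected)) return DEFAULT_POLL_MS
  const elapsed = status.execution_duration ?? 0
  const remaining = expected - elapsed
  // Never poll faster than MIN_POLL_MS, never sleep longer than capMs.
  return Math.min(capMs, Math.max(MIN_POLL_MS, remaining))
}
```

Because the estimate is a P80, most jobs finish before it elapses, so a lower `capMs` trades a few extra requests for earlier results.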
4. POST /audio/complete — cleanup
curl https://api.venice.ai/api/v1/audio/complete \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"elevenlabs-music","queue_id":"..."}'
Removes the media from Venice storage after you've downloaded it. Required unless you used delete_media_on_completion: true on retrieve.
Full loop (TypeScript)
import { promises as fs } from 'node:fs'

const base = 'https://api.venice.ai/api/v1'
const headers = {
  Authorization: `Bearer ${process.env.VENICE_API_KEY}`,
  'Content-Type': 'application/json',
}

async function generateTrack() {
  // 1. Quote: check the price (USD) before funds are reserved
  const quote = await fetch(`${base}/audio/quote`, {
    method: 'POST', headers,
    body: JSON.stringify({ model: 'elevenlabs-music', duration_seconds: 60 }),
  }).then(r => r.json())
  console.log('price:', quote.quote)

  // 2. Queue: keep queue_id and model, both are needed for later calls
  const { queue_id, model } = await fetch(`${base}/audio/queue`, {
    method: 'POST', headers,
    body: JSON.stringify({
      model: 'elevenlabs-music',
      prompt: 'Uplifting indie-folk acoustic track, 120 BPM.',
      duration_seconds: 60,
      force_instrumental: true,
    }),
  }).then(r => r.json())

  // 3. Poll: JSON status while processing, binary audio once done
  while (true) {
    const res = await fetch(`${base}/audio/retrieve`, {
      method: 'POST', headers,
      body: JSON.stringify({ model, queue_id }),
    })
    if (!res.ok) throw new Error(`retrieve failed: HTTP ${res.status}`)
    const ct = res.headers.get('content-type') ?? ''
    if (ct.startsWith('audio/')) {
      const buf = Buffer.from(await res.arrayBuffer())
      await fs.writeFile('track.mp3', buf)
      break
    }
    const { status } = await res.json()
    if (status !== 'PROCESSING') throw new Error(`unexpected ${status}`)
    await new Promise(r => setTimeout(r, 3000))
  }

  // 4. Complete: remove the media from Venice storage
  await fetch(`${base}/audio/complete`, {
    method: 'POST', headers,
    body: JSON.stringify({ model, queue_id }),
  })
}
Capability probing
Before calling /audio/queue, inspect the model entry returned by GET /models?type=music — each row's model_spec exposes (among other fields):
- supports_lyrics, lyrics_required, supports_lyrics_optimizer
- supports_force_instrumental, supports_speed, supports_language_code
- voices[], default_voice
- min_prompt_length, prompt_character_limit
- min_speed, max_speed
- pricing.generation (per-job), pricing.per_second (per second generated), pricing.per_thousand_characters (character-priced narration), or pricing.durations (duration-tiered map: { "<tier>": { usd, diem, min_seconds, max_seconds } }). Each model uses one of these shapes.
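For the duration-tiered pricing shape, finding the tier that covers a requested length might look like the sketch below. It assumes the four tier fields are plain numbers and that `min_seconds` and `max_seconds` are inclusive bounds; neither is confirmed here. `DurationTier` and `pickDurationTier` are hypothetical names, and `/audio/quote` remains the authoritative price.

```typescript
// Shape taken from the pricing.durations description; numeric types are assumed.
interface DurationTier {
  usd: number
  diem: number
  min_seconds: number
  max_seconds: number
}

// Returns the first tier whose range contains `seconds`, or undefined if none does.
function pickDurationTier(
  durations: Record<string, DurationTier>,
  seconds: number,
): { tier: string; usd: number } | undefined {
  for (const [tier, t] of Object.entries(durations)) {
    // Assumption: both bounds are inclusive.
    if (seconds >= t.min_seconds && seconds <= t.max_seconds) {
      return { tier, usd: t.usd }
    }
  }
  return undefined
}
```

An `undefined` result suggests the requested duration falls outside every tier, which is worth catching before the request reaches `/audio/queue`.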
Errors
| Code | Meaning |
|---|---|
| 400 | Wrong params (lyrics on an instrumental-only model, duration_seconds outside allowed range, voice not in model's list). |
| 401 | Auth / Pro-only model. |
| 402 | Insufficient balance. Bearer → INSUFFICIENT_BALANCE; x402 → PAYMENT_REQUIRED. |
| 404 | On retrieve/complete: unknown / expired queue_id. |
| 422 | Content policy violation. ContentViolationError may include suggested_prompt. |
| 429 | Rate limited. |
| 500 / 503 | Inference or capacity issue. |
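A client might map these codes to a next step along the following lines. The table does not say which codes are safe to retry, so treating 429, 500, and 503 as retryable is an assumption of this sketch; `ErrorAction`, `actionForStatus`, and `backoffMs` are hypothetical names. Retrying `/audio/queue` in particular may enqueue a second job, so the retry path fits `retrieve` and `complete` best.

```typescript
type ErrorAction =
  | 'fix_request'   // 400: change the parameters
  | 'check_auth'    // 401: key or plan problem
  | 'add_funds'     // 402: balance too low
  | 'unknown_job'   // 404: queue_id unknown or expired
  | 'revise_prompt' // 422: content policy
  | 'retry_later'   // 429 / 500 / 503 (assumed retryable)
  | 'unhandled'

function actionForStatus(status: number): ErrorAction {
  switch (status) {
    case 400: return 'fix_request'
    case 401: return 'check_auth'
    case 402: return 'add_funds'
    case 404: return 'unknown_job'
    case 422: return 'revise_prompt'
    case 429:
    case 500:
    case 503: return 'retry_later'
    default: return 'unhandled'
  }
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs. attempt starts at 0.
function backoffMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}
```

For a 422, the `suggested_prompt` mentioned in the table, when present, is a natural input for the revised request.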
Gotchas
- Quote before queue: music is pay-per-second; an unexpected duration_seconds can blow through a budget. Use /audio/quote to gate the queue call against your available balance (/billing/balance or /x402/balance/...).
- queue_id is UUIDv4. Store it alongside the model; both are required for every subsequent call.
- Media URLs are ephemeral. Download during retrieve and store yourself; after complete, Venice deletes the file.
- lyrics_optimizer: true and a non-empty lyrics_prompt is a 400.
- Poll rate: don't hammer /retrieve. 2–5 s is plenty; the job queue is the same regardless of poll frequency.
- execution_duration from the retrieve status is cumulative (ms since enqueue); average_execution_time is the P80 expected total.
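The first two points above can be sketched as a budget gate plus a small job record. Everything named here is hypothetical (`canQueue`, `JobRef`, `makeJobRef`, the per-job limit), and the sketch assumes the caller has already converted the balance to USD, since the shape of the balance response is not described in this document.

```typescript
// True only when the quoted price fits both the balance and a per-job limit.
function canQueue(quoteUsd: number, balanceUsd: number, maxPerJobUsd: number): boolean {
  return (
    Number.isFinite(quoteUsd) &&
    quoteUsd >= 0 &&
    quoteUsd <= balanceUsd &&
    quoteUsd <= maxPerJobUsd
  )
}

// queue_id and model travel together: both are needed for retrieve and complete.
interface JobRef {
  model: string
  queue_id: string
}

const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i

// Builds a JobRef, rejecting ids that are not shaped like a UUIDv4.
function makeJobRef(model: string, queueId: string): JobRef {
  if (!UUID_V4.test(queueId)) throw new Error(`queue_id is not a UUIDv4: ${queueId}`)
  return { model, queue_id: queueId }
}
```

Persisting the `JobRef` before the first poll means a crashed client can still retrieve and complete the job later.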