venice-video
Venice Video
Video is asynchronous — like audio music. Five endpoints:
| Endpoint | Purpose |
|---|---|
POST /video/quote |
Price in USD (no charge, no job). |
POST /video/queue |
Enqueue generation. Returns queue_id, charges (reserves) funds. |
POST /video/retrieve |
Poll status or download video/mp4. |
POST /video/complete |
Finalize & delete media from Venice storage. |
POST /video/transcriptions |
Sync: transcribe a YouTube URL's audio. |
Use when
- You need text-to-video, image-to-video, video upscale, video-with-audio, or video transcription.
- You can tolerate async execution (single-digit seconds to several minutes depending on model, duration, and queue depth — inspect
average_execution_timeandexecution_durationon/video/retrievefor your job's live estimate). - You want to price a job precisely before committing (
/video/quote).
Lifecycle — generation
1. Price with /video/quote
curl https://api.venice.ai/api/v1/video/quote \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "wan-2-7-text-to-video",
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "720p",
"audio": true
}'
Response: {"quote": 0.35} USD.
2. Submit with /video/queue
curl https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "wan-2-7-text-to-video",
"prompt": "Commerce being conducted in the city of Venice, Italy.",
"negative_prompt": "low resolution, worst quality, defects",
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "720p",
"audio": true
}'
Response: { "model": "...", "queue_id": "uuid", "download_url": "https://..." }.
download_urlonly appears for VPS-backed models. When present, the retrieve endpoint returns JSON status only — fetch this URL to download. Valid 24 h.
3. Poll with /video/retrieve
curl https://api.venice.ai/api/v1/video/retrieve \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"...","queue_id":"..."}' \
--output out.mp4
- Processing: JSON
{"status":"PROCESSING","average_execution_time":145000,"execution_duration":53200}(ms). - Completed (non-VPS): binary
video/mp4body. - Completed (VPS-backed):
{"status":"COMPLETED", ...}— fetch thedownload_urlfrom the queue response. delete_media_on_completion: trueauto-deletes after successful retrieve.
4. Finalize with /video/complete
curl https://api.venice.ai/api/v1/video/complete \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"...","queue_id":"..."}'
QueueVideoRequest fields
Availability depends on the model — check GET /models?type=video.
| Field | Type | Notes |
|---|---|---|
model |
string | Required. |
prompt |
string, ≤ 2500–3500 | Required (min length 1). Max length varies per model. |
negative_prompt |
string, ≤ 2500–3500 | — |
duration |
enum 2s..30s or Auto |
Required. Model-specific subset. |
aspect_ratio |
1:1, 2:3, 3:2, 3:4, 4:3, 9:16, 16:9, 21:9 |
Some models ignore. |
resolution |
256p..4k, or upscale hints 2x / 4x / true_1080p |
Use upscale_factor for upscale models. |
upscale_factor |
1 / 2 / 4 |
Only for upscale models. 1 = quality enhancement. |
audio |
bool | Default true. Audio-capable models. |
image_url |
URL or data: URL |
Image-to-video reference frame. |
end_image_url |
URL or data URL | End frame / transition reference. |
audio_url |
URL or data URL | Background music input. WAV/MP3, ≤ 30 s, ≤ 15 MB. |
video_url |
URL or data URL | Video-to-video / upscale input. MP4/MOV/WebM. |
reference_image_urls[] |
array of URLs, ≤ 9 | Character / style consistency images. |
elements[] |
array, ≤ 4 | Advanced models (e.g. Kling O3 R2V): each has frontal_image_url, up to 3 reference_image_urls, video_url. Reference in prompt as @Element1, @Element2. |
scene_image_urls[] |
array of URLs, ≤ 4 | Advanced scene refs; reference in prompt as @Image1, @Image2. |
Common recipes
Text → video with audio
{
"model": "wan-2-7-text-to-video",
"prompt": "A golden retriever chasing a frisbee in slow motion at sunset.",
"duration": "6s",
"aspect_ratio": "16:9",
"resolution": "720p",
"audio": true
}
Image → video
{
"model": "<image-to-video model>",
"prompt": "Camera slowly zooms out, revealing the cityscape.",
"image_url": "https://example.com/cityscape.jpg",
"duration": "5s",
"aspect_ratio": "16:9"
}
Video upscale
{
"model": "<upscale model>",
"video_url": "data:video/mp4;base64,...",
"upscale_factor": 2,
"duration": "Auto"
}
Multi-element consistency (Kling O3 R2V-style)
{
"model": "<advanced-model>",
"prompt": "@Element1 walks toward @Element2 against @Image1.",
"elements": [
{ "frontal_image_url": "<char1.png>", "reference_image_urls": ["<alt1.png>"] },
{ "frontal_image_url": "<char2.png>" }
],
"scene_image_urls": ["<street-scene.jpg>"]
}
/video/transcriptions (sync)
Transcribe a YouTube video URL directly — no queue.
curl https://api.venice.ai/api/v1/video/transcriptions \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"https://www.youtube.com/watch?v=...","response_format":"json"}'
Response: {"transcript":"...","lang":"en"} (JSON) or plain text/plain body when response_format: text.
For arbitrary audio files, use venice-audio-transcription instead.
Full polling loop
async function waitForVideo(model: string, queueId: string, downloadUrl?: string) {
while (true) {
const res = await fetch(`${base}/video/retrieve`, {
method: 'POST', headers,
body: JSON.stringify({ model, queue_id: queueId }),
})
const ct = res.headers.get('content-type') ?? ''
if (ct.startsWith('video/')) {
return Buffer.from(await res.arrayBuffer())
}
const body = await res.json()
if (body.status === 'COMPLETED' && downloadUrl) {
const v = await fetch(downloadUrl)
return Buffer.from(await v.arrayBuffer())
}
if (body.status !== 'PROCESSING') throw new Error(`unexpected ${body.status}`)
await new Promise(r => setTimeout(r, 5000))
}
}
Errors
| Code | Meaning |
|---|---|
400 |
Bad params (duration/resolution not supported by model, missing required image_url for i2v, missing prompt, etc.). |
401 |
Auth / Pro-only. |
402 |
Insufficient balance. |
403 |
Model unavailable in your region. |
413 |
Request payload too large — shrink images / audio. (Returned from /video/queue.) |
422 |
Content policy violation. (Returned from /video/queue.) |
500 |
Inference failed. |
503 |
Model at capacity — retry later. On /video/retrieve, returned when the queue is backed up. |
/video/queue does not document 503 in the spec — upstream capacity issues surface there as 500. Watch for 503 specifically on /video/retrieve.
Gotchas
durationis required on/video/queue. EvenAutois a valid explicit value.download_urlis only sometimes returned at queue time. Always handle both paths: binary from/retrieveOR fetchingdownload_urlafter statusCOMPLETED.download_urlexpires in 24 h — download promptly.- Upscale models use
upscale_factorinstead ofresolution. reference_image_urls[]is capped at 9 entries,elements[]at 4,scene_image_urls[]at 4. Over-limit is400.data:URLs count toward payload size; large base64 videos may trip413— prefer hosted URLs./video/transcriptionsis YouTube-URL-only; it does not accept arbitrary video uploads (use ffmpeg to strip audio, then/audio/transcriptions).
More from veniceai/skills
venice-audio-transcription
Transcribe audio files to text via POST /audio/transcriptions. Covers supported models (Parakeet, Whisper, Wizper, Scribe, xAI STT), supported formats (wav/flac/m4a/aac/mp4/mp3/ogg/webm), response formats (json/text), timestamps, and language hints. OpenAI-compatible multipart.
29venice-audio-speech
Generate speech from text via POST /audio/speech. Covers TTS models (Kokoro, Qwen 3, xAI, Inworld, Chatterbox, Orpheus, ElevenLabs Turbo, MiniMax, Gemini Flash), voices per family, output formats (mp3/opus/aac/flac/wav/pcm), streaming, prompt/emotion styling, temperature/top_p, and language hints.
28venice-image-generate
Generate images with Venice. Covers POST /image/generate (Venice-native), POST /images/generations (OpenAI-compatible), GET /image/styles (style presets), request fields (prompt, dimensions, cfg_scale, seed, variants, style_preset, aspect_ratio, resolution, safe_mode, watermark), and response formats.
28venice-embeddings
Call POST /embeddings on Venice. Covers request shape (input, model, encoding_format, dimensions, user), OpenAI compatibility, response compression (gzip/br), and practical usage for retrieval, clustering, and RAG.
28venice-errors
Handle Venice API errors correctly. Covers the StandardError / DetailedError / ContentViolationError / X402InferencePaymentRequired body shapes, every meaningful status code (400, 401, 402, 403, 415, 422, 429, 500, 503, 504), the 402 PAYMENT-REQUIRED header used by x402 inference, 422 content-policy suggested_prompt retry pattern, 429 rate-limit headers, and an exponential-backoff retry strategy with idempotency.
27venice-audio-music
Async music / audio-track generation via Venice. Covers the /audio/quote + /audio/queue + /audio/retrieve + /audio/complete lifecycle, lyrics vs instrumental, voice selection, duration, language, speed, model capability probing, and webhook-free polling.
27