# venice-chat: Venice Chat Completions
`POST /api/v1/chat/completions` is Venice's main text endpoint. It is OpenAI-compatible, plus a `venice_parameters` object for Venice-only features.
## Use when
- You need LLM text generation, with or without tools, with or without streaming.
- You want multimodal inputs (images, audio, video) to a vision/audio-capable model.
- You want Venice-specific features: web search, E2EE, characters, xAI X/Twitter search, strip-thinking, web scraping.
- You need prompt caching for large system prompts or long documents.
- You need structured (`json_schema`) output.
For the newer Alpha Responses API, see venice-responses.
## Minimal request
```bash
curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org-glm-5-1",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'
```
The response is the standard OpenAI `chat.completion` object (`id`, `object: "chat.completion"`, `choices[].message`, `usage`). With `stream: true`, responses arrive as SSE `data:` lines in `chat.completion.chunk` format.
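Since the endpoint is OpenAI-compatible, the same call can be made from any HTTP client. A minimal Python sketch using only the standard library; it builds the request object without sending it (the model ID and URL come from the curl example above):

```python
import json
import urllib.request

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat.completions request for Venice."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.venice.ai/api/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("sk-example", "zai-org-glm-5-1", "Why is the sky blue?")
# urllib.request.urlopen(req) would return the chat.completion JSON object.
```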
## The request body

### Core fields (OpenAI-compatible)
| Field | Notes |
|---|---|
| `model` | string — model ID, trait name, or compatibility mapping. Suffixes allowed (see below). Required. |
| `messages` | array of `system` / `developer` / `user` / `assistant` / `tool` messages. Required, min 1. |
| `temperature`, `top_p`, `top_k`, `min_p`, `min_temp`, `max_temp` | Sampling controls. |
| `repetition_penalty`, `frequency_penalty`, `presence_penalty` | Repetition controls. |
| `max_tokens` (deprecated) / `max_completion_tokens` | Upper bound on output tokens. |
| `n` | Number of choices (keep 1 to minimize cost). |
| `seed` | Integer for reproducibility. |
| `stop` / `stop_token_ids` | Up to 4 strings, or raw token IDs. |
| `stream`, `stream_options.include_usage` | SSE streaming + include usage in the final chunk. |
| `response_format` | `{type:"json_schema", json_schema:{...}}` (preferred), `{type:"json_object"}`, or `{type:"text"}`. |
| `tools`, `tool_choice`, `parallel_tool_calls` | Function calling / built-in tools. |
| `logprobs`, `top_logprobs` | Return token log-probabilities. |
| `reasoning.effort` / `reasoning_effort` | `none` \| `minimal` \| `low` \| `medium` \| `high` \| `xhigh` \| `max` |
| `reasoning.summary` | `auto` \| `concise` \| `detailed` |
| `prompt_cache_key`, `prompt_cache_retention` (`default`/`extended`/`24h`) | Prompt caching hints. |
| `text.verbosity` | `low`/`medium`/`high`/`auto` |
| `metadata` | Key/value strings for tracking. |
| `user`, `store` | Accepted but ignored (OpenAI compat). |
### `venice_parameters` (Venice-only)
All optional. Combined with model feature suffixes, these are how you enable Venice features.
| Field | Type | Default | Effect |
|---|---|---|---|
| `character_slug` | string | — | Apply a published Venice character. Slug is the "Public ID" on the character page. See venice-characters. |
| `strip_thinking_response` | bool | `false` | Strip `<think>...</think>` from the assistant output on reasoning models. |
| `disable_thinking` | bool | `false` | Disable thinking entirely on supported reasoning models and strip tags. |
| `enable_e2ee` | bool | `true` | End-to-end encryption on E2EE-capable models when E2EE headers are present. Set to `false` to force TEE-only. |
| `enable_web_search` | `"off"`/`"auto"`/`"on"` | `"off"` | Venice server-side web search. Citations arrive in the first streamed chunk or the response. |
| `enable_web_scraping` | bool | `false` | Scrape any URLs found in the last user message (Firecrawl). |
| `enable_web_citations` | bool | `false` | Ask the LLM to cite sources with `^1^` / `^1,3^` superscripts. |
| `include_search_results_in_stream` | bool | `false` | Experimental — emit search results as the first stream chunk. |
| `return_search_results_as_documents` | bool | — | Also surface search results as a synthetic tool call `venice_web_search_documents` (LangChain-friendly). |
| `include_venice_system_prompt` | bool | `true` | Prepend Venice's curated system prompt. Turn off for full control. |
| `enable_x_search` | bool | `false` | xAI native web + X/Twitter search (Grok models with `supportsXSearch`). Adds ~$0.01/search. |
## Model feature suffixes
Some `venice_parameters` can also be expressed as model feature suffixes on the model string — useful when the caller/library (OpenAI SDK, LangChain) can't set `venice_parameters`. Syntax:

```
<model-id>:<key>=<value>[&<key>=<value>…]
```
Values are URL-decoded. Supported keys (exact match):
| Key | Type | Maps to |
|---|---|---|
| `enable_web_search` | `on` / `off` / `auto` | `venice_parameters.enable_web_search` |
| `enable_web_citations` | `"true"` / `"false"` | `venice_parameters.enable_web_citations` |
| `enable_web_scraping` | `"true"` / `"false"` | `venice_parameters.enable_web_scraping` |
| `include_venice_system_prompt` | `"true"` / `"false"` | `venice_parameters.include_venice_system_prompt` |
| `include_search_results_in_stream` | `"true"` / `"false"` | `venice_parameters.include_search_results_in_stream` |
| `return_search_results_as_documents` | `"true"` / `"false"` | `venice_parameters.return_search_results_as_documents` |
| `character_slug` | string | `venice_parameters.character_slug` |
| `strip_thinking_response` | `"true"` / `"false"` | `venice_parameters.strip_thinking_response` |
| `disable_thinking` | `"true"` / `"false"` | `venice_parameters.disable_thinking` |
Unknown keys are silently ignored. Examples:

```
zai-org-glm-5-1:enable_web_search=on
kimi-k2-6:strip_thinking_response=true&enable_web_search=auto
zai-org-glm-5-1:character_slug=alan-watts
```
Note: `enable_e2ee` and `enable_x_search` can only be set via `venice_parameters`, not as suffixes.
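A suffixed model string is easy to build programmatically. A small sketch (the helper name `with_suffixes` is hypothetical), following the suffix grammar above and URL-encoding values since Venice URL-decodes them server-side:

```python
from urllib.parse import quote

def with_suffixes(model_id: str, **params) -> str:
    """Append Venice feature suffixes to a model ID.

    Booleans map to the "true"/"false" strings the suffix syntax expects;
    other values are URL-encoded.
    """
    def encode(value) -> str:
        if isinstance(value, bool):
            return "true" if value else "false"
        return quote(str(value), safe="")

    if not params:
        return model_id
    suffix = "&".join(f"{key}={encode(value)}" for key, value in params.items())
    return f"{model_id}:{suffix}"

with_suffixes("zai-org-glm-5-1", enable_web_search="on")
# -> "zai-org-glm-5-1:enable_web_search=on"
with_suffixes("kimi-k2-6", strip_thinking_response=True, enable_web_search="auto")
# -> "kimi-k2-6:strip_thinking_response=true&enable_web_search=auto"
```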
## Messages and modalities
`messages[].content` is either a string or an array of typed parts. Roles: `user`, `assistant`, `tool`, `system`, `developer` (reasoning models like o-series / codex).
### Text + image (`image_url`)
```json
{
  "model": "zai-org-glm-5-1",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
    ]
  }]
}
```
- `url` accepts a public URL or `data:image/png;base64,...`.
- Models with `model_spec.capabilities.supportsMultipleImages: true` preserve images across the whole conversation; single-image vision models only keep images from the last user message. Check `model_spec.capabilities.maxImages` for the per-request cap.
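Inline images via `data:` URLs can be assembled like this. A short sketch (the helper name is hypothetical) that base64-encodes raw bytes into an `image_url` content part:

```python
import base64

def image_part_from_bytes(raw: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part using a data: URL."""
    b64 = base64.b64encode(raw).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# PNG magic bytes stand in for a real file read from disk.
part = image_part_from_bytes(b"\x89PNG\r\n\x1a\n")
```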
### Audio input (`input_audio`)
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Transcribe this clip."},
    {"type": "input_audio", "input_audio": {"data": "<base64>", "format": "wav"}}
  ]
}
```
Formats: `wav`, `mp3`, `aiff`, `aac`, `ogg`, `flac`, `m4a`, `pcm16`, `pcm24`. Audio URLs are not supported — always inline base64.
### Video input (`video_url`)
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Summarize this."},
    {"type": "video_url", "video_url": {"url": "https://www.youtube.com/watch?v=..."}}
  ]
}
```
Accepts public URLs (including YouTube for some providers) or `data:video/mp4;base64,...`. Supported formats: `mp4`, `mpeg`, `mov`, `webm`.
### Prompt caching (`cache_control`)

Any `text` / `image_url` / `input_audio` / `video_url` part can carry:

```json
{"cache_control": {"type": "ephemeral", "ttl": "1h"}}
```
Combine with `prompt_cache_key` and `prompt_cache_retention: "24h"` on the root request for predictable cache routing. Cache read / write pricing is model-specific — check `model_spec.pricing` on `/models`.
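Putting the pieces together, a request body that marks a large system document as cacheable might look like this sketch (the helper name and field values are illustrative; the `cache_control`, `prompt_cache_key`, and `prompt_cache_retention` shapes are as described above):

```python
def cached_system_request(model: str, big_doc: str, question: str, cache_key: str) -> dict:
    """Request body marking a large system document as cacheable for 1h,
    with root-level caching hints for predictable routing."""
    return {
        "model": model,
        "prompt_cache_key": cache_key,
        "prompt_cache_retention": "24h",
        "messages": [
            {
                "role": "system",
                "content": [{
                    "type": "text",
                    "text": big_doc,
                    "cache_control": {"type": "ephemeral", "ttl": "1h"},
                }],
            },
            {"role": "user", "content": question},
        ],
    }

body = cached_system_request(
    "zai-org-glm-5-1", "<long contract text>", "Summarize clause 4.", "contract-v1"
)
```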
## Tools & function calling

### Function tools
```json
{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      },
      "strict": true
    }
  }],
  "tool_choice": "auto"
}
```
- `tool_choice` can also be `"required"`, `"none"`, or `{"type":"function","function":{"name":"get_weather"}}`.
- `parallel_tool_calls: true` (default) lets the model emit multiple calls at once.
- Respond by appending `{"role":"tool","tool_call_id":"...","content":"..."}` before the next call.
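The round-trip above can be sketched as a small dispatch loop. This is a hypothetical helper run against a synthetic assistant message; `get_weather` is the example tool from the request above:

```python
import json

def run_tool_calls(assistant_msg: dict, registry: dict) -> list:
    """Execute every tool call in an assistant message and build the
    tool-role replies to append before the next API call."""
    replies = []
    for call in assistant_msg.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        result = registry[fn["name"]](**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies

# Synthetic assistant turn, shaped like choices[0].message:
msg = {"tool_calls": [{
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"Venice\"}"},
}]}
replies = run_tool_calls(msg, {"get_weather": lambda city: {"city": city, "temp_c": 21}})
```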
### Built-in tools

```json
"tools": [{"type": "web_search"}, {"type": "x_search"}]
```

Equivalent to toggling `venice_parameters.enable_web_search` / `enable_x_search`. `x_search` requires a model with `supportsXSearch`.
## Reasoning models
On thinking models (GLM 5.1, Kimi K2.6, Claude Opus 4.7, GPT-5.4 Pro, …):
```json
{
  "model": "zai-org-glm-5-1",
  "reasoning": {"effort": "medium", "summary": "auto"},
  "venice_parameters": {"strip_thinking_response": false},
  "messages": [...]
}
```
- `reasoning_effort` is the OpenAI-compatible flat variant (takes precedence over `reasoning.effort`).
- Reasoning models may return `reasoning_content` or structured `reasoning_details[]` on the assistant message. Pass `reasoning_details` back verbatim in the next turn — it encodes thought signatures for providers like Claude Opus 4.7 and GPT-5.4 Pro.
- Use `venice_parameters.disable_thinking: true` to skip thinking entirely on supported models.
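Round-tripping `reasoning_details` can be sketched as follows; the helper name is hypothetical, and the `reasoning_details` payload shown is synthetic (treat the real value as opaque):

```python
def append_assistant_turn(messages: list, assistant_msg: dict) -> list:
    """Append an assistant reply to the history, preserving reasoning_details
    verbatim so providers that use thought signatures accept the next turn."""
    turn = {"role": "assistant", "content": assistant_msg.get("content", "")}
    if "reasoning_details" in assistant_msg:
        turn["reasoning_details"] = assistant_msg["reasoning_details"]  # do not edit
    messages.append(turn)
    return messages

history = [{"role": "user", "content": "Plan a trip."}]
reply = {  # shaped like choices[0].message from a reasoning model
    "content": "Here's a plan...",
    "reasoning_details": [{"type": "signature", "data": "opaque-blob"}],
}
history = append_assistant_turn(history, reply)
```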
## Structured output (`response_format`)
```json
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "type": "object",
      "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
      "required": ["name", "age"]
    }
  }
}
```
Prefer `json_schema` over the legacy `json_object`. Plain text is the default (`{"type": "text"}`).
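A sketch of the request/parse cycle for `json_schema` output, using a synthetic assistant reply in place of a live API call:

```python
import json

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
    "required": ["name", "age"],
}
request_body = {
    "model": "zai-org-glm-5-1",
    "messages": [{"role": "user", "content": "Extract: Ada Lovelace, 36."}],
    "response_format": {"type": "json_schema", "json_schema": schema},
}

# With json_schema output, the assistant message content is a JSON string
# conforming to the schema. Synthetic stand-in for choices[0].message.content:
content = '{"name": "Ada Lovelace", "age": 36}'
record = json.loads(content)
```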
## E2EE (end-to-end encryption)

For models advertising `supportsE2EE`:

- Perform an HPKE / Noise handshake with Venice (see docs.venice.ai/e2ee).
- Send the encrypted payload with the required E2EE request headers.
- Leave `venice_parameters.enable_e2ee` at the default `true`, or set it to `false` to fall back to TEE-only.

E2EE is not supported on `/responses` — use `/chat/completions` for encrypted inference.
## Streaming

```json
{"stream": true, "stream_options": {"include_usage": true}}
```

- The response is `text/event-stream`. Each event is `data: {...chunk...}\n\n`, terminated by `data: [DONE]`.
- `include_usage: true` adds a final chunk with token counts.
- With `venice_parameters.include_search_results_in_stream: true`, the first chunk carries `venice_search_results`.
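Parsing the stream by hand reduces to splitting on `data:` lines. A minimal sketch run against synthetic chunks (real chunks carry more fields, e.g. `id` and `finish_reason`):

```python
import json

def collect_stream(lines):
    """Accumulate assistant text from SSE `data:` lines until [DONE]."""
    text, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage  # final chunk when include_usage is set
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text.append(delta["content"])
    return "".join(text), usage

stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [], "usage": {"total_tokens": 12}}',
    "data: [DONE]",
]
text, usage = collect_stream(stream)
# text == "Hello", usage == {"total_tokens": 12}
```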
## Web-search answers

When `enable_web_search` is `"auto"` or `"on"`, the response includes `venice_parameters.web_search_citations[]`, where each entry has `url`, `title`, `content` (snippet), and `date`. Turn on `enable_web_citations` to have the model insert `^1^` superscripts inline.
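Rendering those superscripts client-side is straightforward. A hypothetical helper that rewrites `^1^` / `^1,3^` markers into markdown links, indexing the citation list 1-based:

```python
import re

def render_citations(text: str, citations: list) -> str:
    """Replace ^1^ / ^1,3^ superscript markers with markdown links into
    the web_search_citations list (1-based indices)."""
    def repl(match):
        links = []
        for n in match.group(1).split(","):
            cite = citations[int(n) - 1]
            links.append(f"[{n}]({cite['url']})")
        return "".join(links)
    return re.sub(r"\^(\d+(?:,\d+)*)\^", repl, text)

cites = [{"url": "https://a.example"}, {"url": "https://b.example"}]
render_citations("Sky is blue.^1,2^", cites)
# -> "Sky is blue.[1](https://a.example)[2](https://b.example)"
```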
## Error handling specifics

- `402` — insufficient balance. Bearer: `INSUFFICIENT_BALANCE`. x402: `PAYMENT_REQUIRED` with structured `topUpInstructions` and `siwxChallenge`.
- `422` — prompt violates Venice or provider content policy. May include `suggested_prompt`.
- `413` — payload too large (mostly vision/audio).
- `429` — rate limit. See `/api_keys/rate_limits` and venice-errors.
## Common gotchas

- `max_tokens` is deprecated — prefer `max_completion_tokens`.
- Image URLs must be publicly reachable from Venice's network. Localhost / signed S3 URLs without public access fail.
- Audio inputs cannot be URLs — always base64.
- Single-image vision models drop older images on each turn; chain them into the last user message.
- For multi-turn with tools on Claude Opus 4.7, GPT-5.4 Pro, and similar, always round-trip `reasoning_details` unchanged.
- `parallel_tool_calls: true` means you must be prepared to execute several tools in parallel before sending a single `tool`-role reply chain.
- `character_slug` replaces the default Venice system prompt. Combine with `include_venice_system_prompt: false` for total control.
## More from veniceai/skills

- **venice-audio-transcription**: Transcribe audio files to text via `POST /audio/transcriptions`. Covers supported models (Parakeet, Whisper, Wizper, Scribe, xAI STT), supported formats (wav/flac/m4a/aac/mp4/mp3/ogg/webm), response formats (json/text), timestamps, and language hints. OpenAI-compatible multipart.
- **venice-audio-speech**: Generate speech from text via `POST /audio/speech`. Covers TTS models (Kokoro, Qwen 3, xAI, Inworld, Chatterbox, Orpheus, ElevenLabs Turbo, MiniMax, Gemini Flash), voices per family, output formats (mp3/opus/aac/flac/wav/pcm), streaming, prompt/emotion styling, temperature/top_p, and language hints.
- **venice-audio-music**: Async music / audio-track generation via Venice. Covers the `/audio/quote` + `/audio/queue` + `/audio/retrieve` + `/audio/complete` lifecycle, lyrics vs instrumental, voice selection, duration, language, speed, model capability probing, and webhook-free polling.
- **venice-video**: Generate and transcribe videos via Venice. Covers the async `/video/quote` + `/video/queue` + `/video/retrieve` + `/video/complete` loop, text-to-video, image-to-video, video-to-video (upscale), audio input, reference images, scene and element support, plus `/video/transcriptions` for YouTube URLs.
- **venice-image-generate**: Generate images with Venice. Covers `POST /image/generate` (Venice-native), `POST /images/generations` (OpenAI-compatible), `GET /image/styles` (style presets), request fields (prompt, dimensions, cfg_scale, seed, variants, style_preset, aspect_ratio, resolution, safe_mode, watermark), and response formats.
- **venice-embeddings**: Call `POST /embeddings` on Venice. Covers request shape (input, model, encoding_format, dimensions, user), OpenAI compatibility, response compression (gzip/br), and practical usage for retrieval, clustering, and RAG.