venice-responses
Venice Responses API (Alpha)
POST /api/v1/responses is Venice's OpenAI-compatible Responses endpoint. It returns a structured, typed output array instead of a single message.content string — ideal for agents that need to separate reasoning, messages, tool calls, and built-in tool events.
Alpha. Access is gated behind the responsesApiEnabled flag on Bearer API keys (staff-only during beta). x402 wallet auth bypasses this flag, so you can pay per request without it. Schemas may change.
Use when
- You need the OpenAI Responses-style response shape (output[] with typed type: "reasoning" | "message" | "function_call" | "web_search_call" blocks) for a client library that expects it.
- You want clean separation of reasoning vs message vs tool-call output.
- You want streaming via SSE with typed events.
Otherwise use venice-chat — it has more features, more models, and full Venice parameters.
Limitations vs /chat/completions
| Limitation | Detail |
|---|---|
| Stateless | No conversation persistence across requests. Send the full history each call. |
| E2EE models default to rejection | E2EE-capable models return 400 unless you pass venice_parameters.enable_e2ee: false (TEE-only mode). For end-to-end encrypted inference with E2EE headers, use /chat/completions. |
| Subset of venice_parameters | character_slug, enable_e2ee, enable_web_search, enable_web_scraping, enable_web_citations, include_venice_system_prompt, include_search_results_in_stream are supported. strip_thinking_response, disable_thinking, enable_x_search are not wired through in Alpha. |
| Access gated by feature flag | Bearer keys without responsesApiEnabled get 401. x402 requests are allowed (pay-per-call). |
Authentication
Same as the rest of the API — either Authorization: Bearer <key> or X-Sign-In-With-X: <SIWE>. See venice-auth.
Minimal request
curl https://api.venice.ai/api/v1/responses \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "zai-org-glm-5-1",
"input": "Explain why the sky is blue in one paragraph."
}'
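The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name build_responses_request is hypothetical, and the function only builds the request object so that sending it stays an explicit, separate step.

```python
import json
import urllib.request

VENICE_RESPONSES_URL = "https://api.venice.ai/api/v1/responses"


def build_responses_request(api_key: str, model: str, user_input) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Responses endpoint.

    `user_input` may be a plain string or a list of input items.
    """
    body = json.dumps({"model": model, "input": user_input}).encode("utf-8")
    return urllib.request.Request(
        VENICE_RESPONSES_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen(...) and decode the body with json.load(...); a non-2xx status raises urllib.error.HTTPError.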
input accepts:
- a plain string, or
- an array of typed input items (similar to chat/completions message parts) for multi-turn or multimodal history.
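Because the endpoint is stateless, a multi-turn client has to rebuild the full input array on every call. The sketch below assumes that user and assistant turns use the same {role, content} shape this document gives for system/developer messages; the helper name build_input_items is hypothetical.

```python
def build_input_items(system_prompt, turns):
    """Assemble an input array from a system prompt and prior turns.

    `turns` is a list of (role, text) tuples, oldest first.
    Assumption: user/assistant items share the {role, content} shape
    that the docs show for system/developer messages.
    """
    items = []
    if system_prompt:
        # System context goes first, as a leading message.
        items.append({"role": "system", "content": system_prompt})
    for role, text in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        items.append({"role": role, "content": text})
    return items
```

Append each new user message and each assistant reply to the turns list, then resend the whole array on the next request.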
Response shape
{
"id": "resp_abc123",
"object": "response",
"created_at": 1735689600,
"model": "zai-org-glm-5-1",
"status": "completed",
"output": [
{
"type": "reasoning",
"id": "rs_1",
"summary": ["I considered Rayleigh scattering..."],
"encrypted_content": "..."
},
{
"type": "message",
"id": "msg_1",
"status": "completed",
"role": "assistant",
"content": [{
"type": "output_text",
"text": "The sky is blue because...",
"annotations": [{
"type": "url_citation",
"url": "https://example.com/rayleigh",
"title": "Rayleigh scattering",
"start_index": 42,
"end_index": 99
}]
}]
},
{
"type": "function_call",
"id": "fc_1",
"call_id": "call_abc",
"name": "get_weather",
"arguments": "{\"city\":\"Paris\"}",
"status": "completed"
},
{
"type": "web_search_call",
"id": "ws_1",
"status": "completed"
}
],
"usage": {
"input_tokens": 20,
"input_tokens_details": {"cached_tokens": 0},
"output_tokens": 80,
"output_tokens_details": {"reasoning_tokens": 40},
"total_tokens": 100
}
}
Top-level status ∈ completed | failed | in_progress | cancelled. On failed, error.code and error.message are populated.
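One way to consume this shape is to walk output[] once and sort each block by its type. The following is a sketch under the assumption that responses match the example above; split_output and the keys of its result dict are illustrative names, not part of the API.

```python
import json


def split_output(response: dict) -> dict:
    """Separate a Responses payload into text, citations, reasoning, and tool calls."""
    if response.get("status") == "failed":
        err = response.get("error") or {}
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")

    result = {
        "text": "",
        "citations": [],
        "reasoning": [],
        "function_calls": [],
        "web_search_fired": False,
    }
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    result["text"] += part.get("text", "")
                    result["citations"] += [
                        a for a in part.get("annotations", [])
                        if a.get("type") == "url_citation"
                    ]
        elif kind == "reasoning":
            result["reasoning"].extend(item.get("summary", []))
        elif kind == "function_call":
            result["function_calls"].append({
                "call_id": item["call_id"],
                "name": item["name"],
                # arguments arrive as stringified JSON
                "arguments": json.loads(item["arguments"]),
            })
        elif kind == "web_search_call":
            result["web_search_fired"] = True
    return result
```

Unknown block types are skipped rather than rejected, which seems the safer choice while the Alpha schema may still change.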
Output block types
| type | Purpose |
|---|---|
| reasoning | Thought process from reasoning models. summary[] holds human-readable text; encrypted_content holds opaque signatures, which you round-trip verbatim for multi-turn tool calls. |
| message | Main text output. content[].type === "output_text", plus annotations[] for url_citation entries from web search. |
| function_call | Tool call: name, stringified-JSON arguments, call_id. |
| web_search_call | Sentinel showing the built-in web_search tool fired; use alongside url_citation annotations on messages. |
Match tool outputs back by call_id when continuing the turn.
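Continuing a turn after a tool call might look like the sketch below. It relies on an assumption this document does not confirm: that tool results are sent back as {"type": "function_call_output", "call_id", "output"} items, which is the shape OpenAI's Responses API uses. The helper name and the handlers mapping are hypothetical.

```python
import json


def continue_with_tool_outputs(previous_input, output, handlers):
    """Build the next request's input array after the model requested tools.

    previous_input: the input array sent on the prior call (a list).
    output:         the output[] array from the prior response.
    handlers:       maps a tool name to a Python callable (hypothetical).
    """
    follow_up = list(previous_input)  # copy; leave the caller's list untouched
    for item in output:
        kind = item.get("type")
        if kind in ("reasoning", "function_call"):
            # Round-trip these blocks verbatim, including encrypted_content.
            follow_up.append(item)
        if kind == "function_call":
            args = json.loads(item["arguments"])
            result = handlers[item["name"]](**args)
            follow_up.append({
                # Assumption: OpenAI-style tool result item.
                "type": "function_call_output",
                "call_id": item["call_id"],  # ties the result to its call
                "output": json.dumps(result),
            })
    return follow_up
```

Each tool result is placed directly after the call it answers, so the call_id pairing stays easy to check by eye.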
Common request fields
| Field | Notes |
|---|---|
| model | Required. Model ID, trait, or compatibility mapping. Feature suffixes allowed (see venice-chat). |
| input | Required. String or input-items array. To set system/developer context, include a leading message with role: "system"/"developer" in the input array. |
| tools | Array of {type:"function",function:{...}} or built-in {type:"web_search"}; availability depends on the model. |
| tool_choice | "auto" / "required" / "none" / {type:"function",function:{"name":"..."}}. |
| reasoning.effort | Reasoning effort hint for thinking models ("low", "medium", or "high"). |
| temperature, top_p, max_output_tokens, n, stop, seed, prompt_cache_key | Standard generation controls, translated to /chat/completions equivalents server-side. |
| stream | Boolean. SSE response with typed events (response.created, response.output_item.added, response.output_text.delta, response.completed, …). |
| venice_parameters | Subset listed above. Example: {"character_slug":"alan-watts","enable_web_search":"on"}. |
Fields commonly found in OpenAI's Responses API that are not in Venice's Alpha schema (and silently ignored or rejected by Zod): instructions, metadata, parallel_tool_calls, response_format, store, previous_response_id, background. For response_format / JSON-schema structured output, use /chat/completions.
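Since some of these fields may be dropped silently, a client ported from OpenAI can remove them up front and report what it dropped. A minimal sketch, with a hypothetical helper name; the field list is copied from the paragraph above.

```python
# Fields listed above as absent from Venice's Alpha schema.
UNSUPPORTED_FIELDS = frozenset({
    "instructions",
    "metadata",
    "parallel_tool_calls",
    "response_format",
    "store",
    "previous_response_id",
    "background",
})


def strip_unsupported_fields(payload: dict):
    """Return (cleaned_payload, dropped_field_names) without mutating the input."""
    dropped = sorted(k for k in payload if k in UNSUPPORTED_FIELDS)
    cleaned = {k: v for k, v in payload.items() if k not in UNSUPPORTED_FIELDS}
    return cleaned, dropped
```

Logging the dropped names makes it visible when a request relied on a feature such as previous_response_id that needs a different approach here.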
Streaming
With stream: true, the response is an SSE stream of typed events. Typical flow:
event: response.created
event: response.output_item.added # type=reasoning
event: response.reasoning.delta
event: response.output_item.added # type=message
event: response.content_part.added
event: response.output_text.delta
event: response.output_text.delta
event: response.output_item.done
event: response.completed
Consume events in order and reconstruct output[] client-side; the shape on response.completed matches the non-streamed response exactly.
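A small SSE reader is enough to follow this flow. The sketch below parses event/data lines and accumulates text deltas. It assumes that each response.output_text.delta payload carries a "delta" string and that response.completed carries the final object under "response", as in OpenAI's Responses streaming format; this document does not confirm either field name for Venice.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_string) pairs from an iterable of SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current event.
            if event is not None or data:
                yield event, "\n".join(data)
            event, data = None, []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if event is not None or data:
        yield event, "\n".join(data)


def collect_text(lines):
    """Return (streamed_text, final_response_or_None)."""
    parts, final = [], None
    for event, data in iter_sse_events(lines):
        payload = json.loads(data) if data else {}
        if event == "response.output_text.delta":
            parts.append(payload.get("delta", ""))  # assumed field name
        elif event == "response.completed":
            final = payload.get("response")  # assumed field name
    return "".join(parts), final
```

Any iterable of decoded lines works as input, for example the lines of an HTTP response body decoded as UTF-8.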
Authentication & error responses
- 400: bad request; also returned when an E2EE-capable model is used without venice_parameters.enable_e2ee: false.
- 401: auth failed, or the Bearer key lacks responsesApiEnabled, or the model is Pro-only and you're on an INFERENCE key / x402 wallet.
- 402: insufficient balance. Bearer → { error: "INSUFFICIENT_BALANCE" }. x402 → PAYMENT_REQUIRED with topUpInstructions and siwxChallenge (see venice-x402).
- 429: rate-limited.
- 500: inference failed.
The X-Balance-Remaining header is present on 200 responses when using x402 auth; 402 responses carry a PAYMENT-REQUIRED header.
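The status codes above can be mapped to client actions roughly as follows. This is a sketch, not prescribed behavior: the action labels are hypothetical, and treating 500 as retryable alongside 429 is an assumption rather than something this document states.

```python
def describe_failure(status: int, using_x402: bool = False) -> str:
    """Map an HTTP status from /responses to a coarse client action."""
    if status == 400:
        return "fix_request"  # includes the E2EE-capable model case
    if status == 401:
        return "check_credentials_or_access"  # key, feature flag, or Pro-only model
    if status == 402:
        return "top_up_via_x402" if using_x402 else "add_balance"
    if status in (429, 500):
        return "retry_with_backoff"  # assumption: 500 is treated as transient
    return "unexpected"


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

For example, backoff_delays(4) gives waits of 1, 2, 4, and 8 seconds; adding random jitter is a common refinement.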
Migration notes
- Port messages → pass as input (string, or typed array with leading {role:"system"|"developer", content:"..."}).
- venice_parameters.character_slug → supported; pass inside venice_parameters or as a model feature suffix (:character_slug=alan-watts).
- venice_parameters.enable_web_search → pass inside venice_parameters, or append :enable_web_search=on to the model ID, or add {"type":"web_search"} to tools.
- venice_parameters.strip_thinking_response / disable_thinking → not supported on /responses in Alpha; stay on /chat/completions for these.
- Full E2EE flow (E2EE request headers + encrypted response) → stay on /chat/completions. For TEE-only inference on an E2EE-capable model, pass venice_parameters.enable_e2ee: false here.
- response_format / JSON-schema structured output → stay on /chat/completions.
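The migration notes above can be folded into one conversion helper. The sketch assumes chat messages are plain {role, content} dicts that can be passed through as input items unchanged; chat_to_responses is a hypothetical name. It raises rather than silently dropping anything the notes say should stay on /chat/completions.

```python
SUPPORTED_VENICE_PARAMS = frozenset({
    "character_slug",
    "enable_e2ee",
    "enable_web_search",
    "enable_web_scraping",
    "enable_web_citations",
    "include_venice_system_prompt",
    "include_search_results_in_stream",
})

# Generation controls that keep the same name on both endpoints.
PASSTHROUGH_FIELDS = ("temperature", "top_p", "stop", "seed", "stream")


def chat_to_responses(chat_payload: dict) -> dict:
    """Convert a /chat/completions payload into a /responses payload."""
    if "response_format" in chat_payload:
        raise ValueError("response_format is not supported; stay on /chat/completions")

    venice_params = chat_payload.get("venice_parameters", {})
    unsupported = sorted(set(venice_params) - SUPPORTED_VENICE_PARAMS)
    if unsupported:
        raise ValueError(f"not supported on /responses in Alpha: {unsupported}")

    payload = {
        "model": chat_payload["model"],
        # Assumption: messages map one-to-one onto input items.
        "input": list(chat_payload["messages"]),
    }
    if venice_params:
        payload["venice_parameters"] = dict(venice_params)
    for key in PASSTHROUGH_FIELDS:
        if key in chat_payload:
            payload[key] = chat_payload[key]
    return payload
```

Token limits are left out on purpose: the chat-side name for max_output_tokens is not given here, so that mapping is best done by hand.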