venice-ai-api
Venice.ai API Skill
Venice.ai provides privacy-first AI infrastructure with uncensored models and zero data retention. The API is OpenAI-compatible, allowing use of the OpenAI SDK with Venice's base URL. Inference runs on a decentralized network (DePIN) where nodes are disincentivized from retaining user data.
Quick Reference
Base URL: https://api.venice.ai/api/v1
Auth: Authorization: Bearer VENICE_API_KEY
SDK: Use OpenAI SDK with custom base URL
API Key Types: ADMIN (full access) or INFERENCE (inference only)
Setup
Python:
import os
from openai import OpenAI
# Point the OpenAI SDK at Venice by overriding the base URL
client = OpenAI(
    api_key=os.getenv("VENICE_API_KEY"),
    base_url="https://api.venice.ai/api/v1"
)
JavaScript:
import OpenAI from 'openai';
const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1'
});
Account Tiers
| Tier | Qualification | Rate Limits | Use Case |
|---|---|---|---|
| Explorer | Pro subscription | Low RPM/TPM (~15-25 req/day) | Testing, prototyping |
| Paid | USD balance or staked VVV (Diems) | Standard production limits | Commercial apps |
| Partner | Enterprise agreement | Custom high-volume | Enterprise SaaS |
API Capabilities
1. Chat Completions
Text inference with multimodal support (text, images, audio, video).
completion = client.chat.completions.create(
model="llama-3.3-70b",
messages=[
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Hello!"}
]
)
Popular Models:
- llama-3.3-70b: Balanced performance (Tier M, 128K context)
- zai-org-glm-4.7: Complex tasks, deep reasoning (Tier L, 128K context)
- mistral-31-24b: Vision + function calling (Tier S, 131K context)
- venice-uncensored: No content filtering (Tier S, 32K context)
- deepseek-ai-DeepSeek-R1: Advanced reasoning, math, coding (Tier L, 64K context)
- qwen3-235b: Massive MoE reasoning (Tier L)
- qwen3-4b: Fast, lightweight (Tier XS, 40K context)
Venice Parameters (via extra_body in Python, direct in JS):
- enable_web_search: "off" | "on" | "auto"
- enable_web_scraping: boolean
- enable_web_citations: boolean, adds ^index^ citation format
- include_venice_system_prompt: boolean (default: true)
- strip_thinking_response: boolean
- disable_thinking: boolean
- character_slug: string
- prompt_cache_key: string, routing hint for cache hits
- prompt_cache_retention: "default" | "extended" | "24h"
See references/chat-completions.md for full parameter reference.
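A minimal sketch of how these parameters might be assembled in Python, assuming they are nested under a venice_parameters key inside extra_body (the same shape used in the AI Characters example in this document). The helper name venice_extra_body is hypothetical, and nothing is sent over the network here.

```python
def venice_extra_body(**params):
    """Wrap Venice-specific options for the OpenAI SDK's extra_body argument.

    Hypothetical helper: options left as None are dropped so that only
    explicitly chosen settings end up in the payload.
    """
    cleaned = {key: value for key, value in params.items() if value is not None}
    return {"venice_parameters": cleaned}


extra = venice_extra_body(
    enable_web_search="auto",
    enable_web_citations=True,
    include_venice_system_prompt=False,
    character_slug=None,  # None values are omitted from the payload
)
print(extra)
```

The resulting dict would then be passed as client.chat.completions.create(..., extra_body=extra).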
2. Image Generation
Generate images from text prompts.
import os
import requests
response = requests.post(
    "https://api.venice.ai/api/v1/image/generate",
    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}"},
    json={
        "model": "venice-sd35",
        "prompt": "A sunset over mountains",
        "width": 1024,
        "height": 1024
    }
)
# Response contains base64 images in images array
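Since the response carries base64 strings rather than files, a decoding step is usually needed. The sketch below assumes each entry in the images array is a plain base64 string (no data-URL prefix) and that PNG is an acceptable file extension; both are assumptions, and save_base64_images is a hypothetical helper name.

```python
import base64
from pathlib import Path


def save_base64_images(payload, out_dir=".", prefix="venice"):
    """Decode each base64 string in payload["images"] and write it to disk.

    Returns the list of paths written. The .png extension is an assumption
    about the image format, not something confirmed by the API.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, encoded in enumerate(payload.get("images", [])):
        target = out_path / f"{prefix}_{index}.png"
        target.write_bytes(base64.b64decode(encoded))
        written.append(target)
    return written
```

With the request above, this could be called as save_base64_images(response.json()).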
Image Models:
| Model | Best For | Pricing |
|---|---|---|
| qwen-image | Highest quality, editing | Variable |
| venice-sd35 | General purpose (default) | ~$0.01/image |
| hidream | Fast generation | ~$0.01/image |
| flux-2-pro | Professional quality | ~$0.04/image |
| flux-2-max | High-quality output | ~$0.02/image |
| nano-banana-pro | Photorealism, 2K/4K support | $0.18-$0.35 |
3. Image Upscaling
Enhance image resolution 2x or 4x.
import base64
import os
import requests
api_key = os.getenv("VENICE_API_KEY")
# Read the source image and base64-encode it for the JSON payload
with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")
response = requests.post(
    "https://api.venice.ai/api/v1/image/upscale",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "image": image_base64,
        "scale": 4  # 2 or 4
    }
)
# Returns raw image binary
with open("upscaled.png", "wb") as f:
    f.write(response.content)
Pricing: $0.02 (2x), $0.08 (4x)
4. Image Editing (Inpainting)
Modify existing images with AI-powered instructions.
import base64
with open("photo.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")
response = requests.post(
    "https://api.venice.ai/api/v1/image/edit",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "prompt": "Change the sky to a sunset",
        "image": image_base64  # or URL starting with http/https
    }
)
# Returns raw image binary
with open("edited.png", "wb") as f:
    f.write(response.content)
Model: Uses Qwen-Image. Pricing: ~$0.04/edit.
See references/image-api.md for all parameters and style presets.
5. Video Generation
Async queue-based video generation. Always call /video/quote first for pricing.
Full Workflow:
import os
import requests
import time
import base64
api_key = os.getenv("VENICE_API_KEY")
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
# Step 1: Get price quote
quote = requests.post(
"https://api.venice.ai/api/v1/video/quote",
headers=headers,
json={
"model": "kling-2.5-turbo-pro-text-to-video",
"duration": "10s",
"resolution": "720p",
"aspect_ratio": "16:9",
"audio": True
}
)
print(f"Estimated cost: ${quote.json()['quote']}")
# Step 2: Queue the job (text-to-video)
queue_resp = requests.post(
"https://api.venice.ai/api/v1/video/queue",
headers=headers,
json={
"model": "kling-2.5-turbo-pro-text-to-video",
"prompt": "A serene forest with sunlight filtering through trees",
"negative_prompt": "low quality, blurry",
"duration": "10s",
"resolution": "720p",
"aspect_ratio": "16:9",
"audio": True
}
)
queue_id = queue_resp.json()["queueid"]
# Step 3: Poll until complete
while True:
    status_resp = requests.post(
        "https://api.venice.ai/api/v1/video/retrieve",
        headers=headers,
        json={
            "model": "kling-2.5-turbo-pro-text-to-video",
            "queueid": queue_id,
            "delete_media_on_completion": False
        }
    )
    # A finished job returns the MP4 bytes; otherwise the body is a JSON status
    content_type = status_resp.headers.get("Content-Type", "")
    if status_resp.status_code == 200 and content_type.startswith("video/mp4"):
        with open("output.mp4", "wb") as f:
            f.write(status_resp.content)
        print("Video saved!")
        break
    else:
        status = status_resp.json()
        print(f"Status: {status['status']}, Duration: {status['executionDuration']}ms")
        time.sleep(10)
# Step 4: Cleanup (optional — deletes from Venice storage)
requests.post(
"https://api.venice.ai/api/v1/video/complete",
headers=headers,
json={
"model": "kling-2.5-turbo-pro-text-to-video",
"queueid": queue_id
}
)
Image-to-Video:
with open("image.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")
queue_resp = requests.post(
"https://api.venice.ai/api/v1/video/queue",
headers=headers,
json={
"model": "wan-2.5-preview-image-to-video",
"prompt": "Animate this scene with gentle motion",
"image_url": f"data:image/png;base64,{img_b64}",
"duration": "5s",
"resolution": "720p"
}
)
Video Models:
| Model | Type | Features |
|---|---|---|
| kling-2.5-turbo-pro | Text/Image-to-Video | Fast, high quality |
| wan-2.5-preview | Image-to-Video | Animation specialist |
| ltx-2-full | Text/Image-to-Video | Full quality |
| veo3-fast | Text/Image-to-Video | Speed-optimized |
| sora-2 | Image-to-Video | High-end quality |
See references/video-api.md for full parameter reference.
6. Text-to-Speech
Convert text to audio with 60+ voices.
response = requests.post(
"https://api.venice.ai/api/v1/audio/speech",
headers={"Authorization": f"Bearer {api_key}"},
json={
"input": "Hello, welcome to Venice.",
"model": "tts-kokoro",
"voice": "af_sky",
"speed": 1.0, # 0.25 to 4.0
"response_format": "mp3" # mp3, opus, aac, flac, wav, pcm
}
)
with open("speech.mp3", "wb") as f:
    f.write(response.content)
Voices: af_sky, af_nova, am_liam, bf_emma, zf_xiaobei, jm_kumo, and 50+ more.
Pricing: $3.50 per 1M characters.
7. Speech-to-Text
Transcribe audio files.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.venice.ai/api/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={
            "model": "nvidia/parakeet-tdt-0.6b-v3",
            "response_format": "json",  # json or text
            "timestamps": "true"
        }
    )
Formats: WAV, FLAC, MP3, M4A, AAC, MP4. Pricing: $0.0001 per audio second.
8. Embeddings
Generate vector embeddings for RAG and semantic search.
response = requests.post(
"https://api.venice.ai/api/v1/embeddings",
headers={"Authorization": f"Bearer {api_key}"},
json={
"model": "text-embedding-bge-m3",
"input": "Privacy-first AI infrastructure",
"encoding_format": "float" # or "base64"
}
)
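For semantic search, the returned vectors are typically compared by cosine similarity. The sketch below works on plain Python lists so it runs offline; extracting the vector from the response (for example via data[0].embedding, as in the OpenAI embeddings format) is an assumption based on the API's stated OpenAI compatibility, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors standing in for real embeddings.
ranked = rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print([index for index, _ in ranked])
```

In a RAG pipeline the document vectors would normally be computed once and stored, with only the query embedded at request time.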
9. Vision (Multimodal)
Analyze images with vision-capable models.
response = client.chat.completions.create(
model="mistral-31-24b",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{"type": "image_url", "image_url": {"url": "https://..."}}
]
}]
)
10. Function Calling
Define tools for the model to call.
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"]
}
}
}]
response = client.chat.completions.create(
model="zai-org-glm-4.7",
messages=[{"role": "user", "content": "Weather in SF?"}],
tools=tools
)
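The request above only declares the tool; the application still has to run it when the model asks. The sketch below assumes the response follows the OpenAI tool-call shape (message.tool_calls, each with id, function.name, and a JSON string in function.arguments). The get_weather body, TOOL_REGISTRY, and run_tool_calls are hypothetical, and a stand-in message object is used so the code runs without a network call.

```python
import json
from types import SimpleNamespace


def get_weather(location):
    """Placeholder implementation; a real tool would query a weather service."""
    return {"location": location, "forecast": "unknown"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute each requested tool and build follow-up 'tool' role messages."""
    results = []
    for call in getattr(message, "tool_calls", None) or []:
        handler = TOOL_REGISTRY.get(call.function.name)
        if handler is None:
            output = {"error": f"unknown tool {call.function.name}"}
        else:
            arguments = json.loads(call.function.arguments or "{}")
            output = handler(**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output),
        })
    return results


# Stand-in for response.choices[0].message so the sketch runs offline.
fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"location": "SF"}'),
    )
])
tool_messages = run_tool_calls(fake_message)
print(tool_messages)
```

The returned messages would be appended to the conversation, after the assistant message that requested the tools, and sent in a second chat completion call.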
11. Structured Outputs
Get guaranteed JSON schema responses.
response = client.chat.completions.create(
model="venice-uncensored",
messages=[...],
response_format={
"type": "json_schema",
"json_schema": {
"name": "my_response",
"strict": True,
"schema": {
"type": "object",
"properties": {"answer": {"type": "string"}},
"required": ["answer"],
"additionalProperties": False
}
}
}
)
Requirements: strict: true, additionalProperties: false, all fields in required.
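These three requirements can be checked locally before a request is sent. The following is a minimal sketch that inspects only the top-level object of the schema (nested objects are not walked); check_strict_schema is a hypothetical helper, not part of any SDK.

```python
def check_strict_schema(response_format):
    """Return a list of problems found in a json_schema response_format dict."""
    problems = []
    spec = response_format.get("json_schema", {})
    schema = spec.get("schema", {})
    if spec.get("strict") is not True:
        problems.append("strict must be true")
    if schema.get("additionalProperties") is not False:
        problems.append("additionalProperties must be false")
    missing = set(schema.get("properties", {})) - set(schema.get("required", []))
    if missing:
        problems.append(f"fields missing from required: {sorted(missing)}")
    return problems


example = {
    "type": "json_schema",
    "json_schema": {
        "name": "my_response",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
            "additionalProperties": False,
        },
    },
}
print(check_strict_schema(example))  # prints []
```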
12. AI Characters
Interact with predefined AI personas.
# List characters
characters = requests.get(
"https://api.venice.ai/api/v1/characters",
headers={"Authorization": f"Bearer {api_key}"},
params={"categories": "philosophy", "limit": 50}
).json()
# Chat with a character
response = client.chat.completions.create(
model="venice-uncensored",
messages=[{"role": "user", "content": "What is the meaning of life?"}],
extra_body={
"venice_parameters": {"character_slug": "alan-watts"}
}
)
13. Model Discovery
Query available models and capabilities programmatically.
# List models by type
models = requests.get(
"https://api.venice.ai/api/v1/models",
headers={"Authorization": f"Bearer {api_key}"},
params={"type": "text"} # text, image, audio, video, embedding
).json()
# Get model traits for auto-selection
traits = requests.get(
"https://api.venice.ai/api/v1/models/traits",
params={"type": "text"}
).json()
# e.g. {"default": "zai-org-glm-4.7", "fastest": "qwen3-4b", "uncensored": "venice-uncensored"}
# Use trait as model ID for automatic routing
response = client.chat.completions.create(
model="fastest", # Venice routes to the current fastest model
messages=[...]
)
Error Handling
Error Codes
| Status | Error Code | Meaning | Action |
|---|---|---|---|
| 400 | INVALID_REQUEST | Bad parameters | Check payload schema |
| 401 | AUTHENTICATION_FAILED | Invalid API key | Verify key and balance |
| 402 | — | Insufficient balance | Add USD or stake VVV |
| 403 | — | Unauthorized access | Check key type (ADMIN vs INFERENCE) |
| 413 | — | Payload too large | Reduce request size |
| 415 | — | Invalid content type | Use application/json |
| 422 | — | Content policy violation | Modify prompt |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests | Backoff, wait for reset |
| 500 | INFERENCE_FAILED | Model error | Retry with backoff |
| 503 | — | Model at capacity | Retry later or switch model |
| 504 | — | Timeout | Use streaming for long responses |
Abuse Protection
Sending >20 failed requests in 30 seconds triggers a 30-second IP block. Always implement backoff.
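One way to stay under this threshold is to count recent failures on the client and pause before the limit is reached. The sketch below is a sliding-window counter under that assumption; FailureWindow is a hypothetical class, and it pauses conservatively once 20 failures are inside the window rather than risking the 21st.

```python
import time
from collections import deque


class FailureWindow:
    """Track recent failed requests and suggest a pause before the limit."""

    def __init__(self, max_failures=20, window_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._failures = deque()

    def _prune(self):
        # Drop failures that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()

    def record_failure(self):
        self._failures.append(self.clock())

    def seconds_to_wait(self):
        """Return 0 if another failure is affordable, else time until the oldest expires."""
        self._prune()
        if len(self._failures) < self.max_failures:
            return 0.0
        return max(0.0, self._failures[0] + self.window_seconds - self.clock())
```

A caller would invoke record_failure() on each non-2xx response and sleep for seconds_to_wait() before the next attempt.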
Retry with Exponential Backoff (Python)
import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def create_venice_session():
    """Create a requests session with automatic retry and backoff."""
    session = requests.Session()
    retry = Retry(
        total=3,
        backoff_factor=1,  # 1s, 2s, 4s
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST", "GET"]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session
session = create_venice_session()
response = session.post(url, json=payload, headers=headers)
Retry with Exponential Backoff (JavaScript)
async function veniceRequest(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.ok) return response;
    if ([429, 500, 502, 503, 504].includes(response.status)) {
      if (attempt < maxRetries) {
        const delay = Math.pow(2, attempt) * 1000;
        console.log(`Retry ${attempt + 1} in ${delay}ms (status ${response.status})`);
        await new Promise(r => setTimeout(r, delay));
        continue;
      }
    }
    throw new Error(`Venice API error: ${response.status} ${response.statusText}`);
  }
}
Rate Limit-Aware Client (Python)
import time
import requests
class VeniceClient:
    """Wrapper that respects rate limits using response headers."""
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.venice.ai/api/v1"
        self.session = create_venice_session()
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
    def request(self, method, path, **kwargs):
        resp = self.session.request(
            method, f"{self.base_url}{path}",
            headers=self.headers, **kwargs
        )
        # Pause until the window resets when the request budget is nearly spent
        remaining = resp.headers.get("x-ratelimit-remaining-requests")
        if remaining and int(remaining) <= 1:
            reset = resp.headers.get("x-ratelimit-reset-requests")
            if reset:
                wait = max(0, float(reset) - time.time())
                time.sleep(wait)
        resp.raise_for_status()
        return resp
Response Headers
Monitor these headers for production:
- x-ratelimit-remaining-requests: Requests left in window
- x-ratelimit-remaining-tokens: Tokens left in window
- x-ratelimit-reset-requests: Timestamp when request count resets
- x-venice-balance-usd: USD balance
- x-venice-balance-diem: DIEM balance
- x-venice-is-blurred: Image was blurred (safe mode)
- x-venice-is-content-violation: Content policy violation
- x-venice-model-deprecation-warning: Deprecation notice
- x-venice-model-deprecation-date: Sunset date
- CF-RAY: Request ID for support
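For monitoring, these headers can be folded into a small summary that is easy to log. The sketch below assumes numeric headers parse as floats and that boolean-style headers carry the string "true"; neither wire format is confirmed here, and summarize_venice_headers is a hypothetical helper.

```python
def summarize_venice_headers(headers):
    """Extract a few monitoring fields from a response-header mapping.

    Header names are matched case-insensitively. Values that are absent or
    cannot be parsed as numbers are reported as None.
    """
    lowered = {key.lower(): value for key, value in headers.items()}

    def as_float(name):
        raw = lowered.get(name)
        try:
            return float(raw) if raw is not None else None
        except ValueError:
            return None

    def as_flag(name):
        # Assumption: the flag headers use the literal string "true".
        return str(lowered.get(name, "")).strip().lower() == "true"

    return {
        "remaining_requests": as_float("x-ratelimit-remaining-requests"),
        "remaining_tokens": as_float("x-ratelimit-remaining-tokens"),
        "balance_usd": as_float("x-venice-balance-usd"),
        "is_blurred": as_flag("x-venice-is-blurred"),
        "deprecation_warning": lowered.get("x-venice-model-deprecation-warning"),
        "request_id": lowered.get("cf-ray"),
    }
```

With the requests library this could be called as summarize_venice_headers(response.headers).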
Rate Limits by Model Tier
Text Models:
| Tier | RPM | TPM | Example Models |
|---|---|---|---|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, deepseek-ai-DeepSeek-R1 |
Other Endpoints:
| Endpoint | RPM |
|---|---|
| Image Generation | 20 |
| Audio Synthesis | 60 |
| Audio Transcription | 60 |
| Embeddings | 500 |
| Video Queue | 40 |
| Video Retrieve | 120 |
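The RPM figures above translate directly into a minimum spacing between calls (60 / RPM seconds). The sketch below paces requests on the client side under that reading; RpmPacer is a hypothetical class, and it does not account for the TPM limits or for other processes sharing the same key.

```python
import time


class RpmPacer:
    """Space calls evenly so they stay at or under a requests-per-minute cap."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # seconds between consecutive requests
        self.clock = clock          # injectable for testing
        self.sleep = sleep          # injectable for testing
        self._next_allowed = None

    def wait(self):
        """Block until the next request slot, then reserve the following one."""
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


# Tier L text models are listed at 20 RPM, i.e. one request every 3 seconds.
pacer = RpmPacer(rpm=20)
print(pacer.interval)
```

Calling pacer.wait() immediately before each request keeps a single-threaded client within the listed limit.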
API Key Management
# Create key programmatically (requires ADMIN key)
curl -X POST https://api.venice.ai/api/v1/api_keys \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"apiKeyType": "INFERENCE", "description": "My App", "consumptionLimit": {"usd": 100}}'
# Check rate limits and balance
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
-H "Authorization: Bearer $VENICE_API_KEY"
# List keys
curl https://api.venice.ai/api/v1/api_keys \
-H "Authorization: Bearer $VENICE_API_KEY"
# Delete key
curl -X DELETE "https://api.venice.ai/api/v1/api_keys?id={key_id}" \
-H "Authorization: Bearer $VENICE_API_KEY"
Reference Files
- references/chat-completions.md — Full chat API parameters
- references/image-api.md — Image generation, editing, upscaling details
- references/video-api.md — Video generation workflow and parameters
- references/models.md — Available models, tiers, pricing, and capabilities