venice-embeddings
Venice Embeddings
POST /api/v1/embeddings returns vector embeddings for strings. It's OpenAI-compatible: the request and response match https://api.openai.com/v1/embeddings closely enough that the OpenAI SDK works out of the box with baseURL: "https://api.venice.ai/api/v1".
Use when
- You're building retrieval / RAG / similarity search.
- You need text clustering, classification, deduplication, or reranking.
- You want Venice's "no-training, no-retention" stance on inference inputs — embeddings are generated and returned; the API does not publish E2EE semantics on /embeddings the way it does on selected chat models.
Text-only. For image/multimodal signals, either run images through a vision chat model and embed the description, or pick a multimodal-capable embedding model from GET /models?type=embedding (the catalog changes; inspect model_spec on each row).
Minimal request
curl https://api.venice.ai/api/v1/embeddings \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept-Encoding: gzip, br" \
  -d '{
    "model": "text-embedding-bge-m3",
    "input": "Why is the sky blue?"
  }'

Response:

{
  "object": "list",
  "model": "text-embedding-bge-m3",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0023, -0.0093, 0.0158, ...] }
  ],
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
Request schema
| Field | Type | Notes |
|---|---|---|
| model | string | Required. Model ID from GET /models?type=embedding. |
| string \| string[] \| number[] \| number[][] | input | Required. Single string, array of strings (≤ 2048 entries), or pre-tokenized arrays. |
| encoding_format | "float" \| "base64" | Default "float". Use "base64" for ~4× payload shrinkage; decode client-side. |
| dimensions | integer | Optional. Truncates output dimensions. Only meaningful when the model's model_spec.supportsCustomDimensions === true — behavior on non-supporting models is model-dependent; test a small call before relying on it. |
| user | string | Accepted for OpenAI compat. Discarded by Venice. |
Each input string is capped at the model's model_spec.maxInputTokens (typically 8192). Batch arrays are capped at 2048 items. Venice returns one embedding per element, in order, with a matching index.
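When you request encoding_format: "base64", each embedding arrives as a base64 string rather than a number array. A minimal decode sketch, assuming Venice follows OpenAI's convention of raw little-endian float32 bytes (verify against a "float" response for the same input before relying on it):

```typescript
// Decode a base64 embedding into numbers. Copying into a fresh
// Uint8Array guarantees the float32 view is 4-byte aligned.
function decodeEmbedding(b64: string): number[] {
  const bytes = Uint8Array.from(Buffer.from(b64, 'base64'))
  return Array.from(new Float32Array(bytes.buffer))
}
```

With this, res.data[i].embedding as string can be mapped back to number[] before indexing or storage.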
Response headers & compression
Send Accept-Encoding: gzip, br with the request; the response sets Content-Encoding accordingly. For long batches this matters — vectors are large.
For x402 auth, the X-Balance-Remaining header reports your remaining USDC credits.
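A small helper for reading that balance header off a response — parseBalance is an illustrative name, and the header is only present on x402-authenticated calls:

```typescript
// Read the remaining-USDC balance from a Venice response (x402 auth only).
// Returns null when the header is absent, e.g. under Bearer auth.
function parseBalance(headers: Headers): number | null {
  const raw = headers.get('x-balance-remaining') // header lookup is case-insensitive
  if (raw === null) return null
  const n = Number(raw)
  return Number.isFinite(n) ? n : null
}

// Usage after a fetch() call:
//   const res = await fetch('https://api.venice.ai/api/v1/embeddings', { /* ... */ })
//   const balance = parseBalance(res.headers)
```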
Using the OpenAI SDK
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1',
})

const res = await client.embeddings.create({
  model: 'text-embedding-bge-m3',
  input: ['first doc', 'second doc'],
})
const vec0 = res.data[0].embedding
Batch-embedding pattern
async function embedBatch(texts: string[], batchSize = 64) {
  const out: number[][] = []
  for (let i = 0; i < texts.length; i += batchSize) {
    const slice = texts.slice(i, i + batchSize)
    const res = await client.embeddings.create({
      model: 'text-embedding-bge-m3',
      input: slice,
      encoding_format: 'float',
    })
    // Venice preserves order; row.index maps each vector back to its input.
    for (const row of res.data) out[i + row.index] = row.embedding
  }
  return out
}
- Keep each batch's total tokens within the model's context limit.
- On 429, back off exponentially and halve the batch size — see venice-errors.
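The two bullets above can be sketched as a wrapper. embedOnce stands in for a real client.embeddings.create call, and the 429 check assumes the thrown error exposes a status field (as the OpenAI SDK's APIError does):

```typescript
// Sketch: embed with exponential backoff on 429, halving the batch
// size on each rate-limit hit. `embedOnce` is injected so the retry
// logic stays independent of any particular SDK.
async function embedWithBackoff(
  texts: string[],
  embedOnce: (batch: string[]) => Promise<number[][]>,
  maxRetries = 5,
  baseDelayMs = 500,
): Promise<number[][]> {
  let batchSize = texts.length
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const out: number[][] = []
      for (let i = 0; i < texts.length; i += batchSize) {
        out.push(...(await embedOnce(texts.slice(i, i + batchSize))))
      }
      return out // one vector per input, in order
    } catch (err: any) {
      if (err?.status !== 429 || attempt === maxRetries) throw err
      batchSize = Math.max(1, Math.floor(batchSize / 2))
      // Exponential backoff with jitter before the next attempt.
      const delay = 2 ** attempt * baseDelayMs + Math.random() * baseDelayMs
      await new Promise((r) => setTimeout(r, delay))
    }
  }
  throw new Error('unreachable')
}
```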
Choosing a model
Query GET /models?type=embedding for the current catalog. Each entry exposes:
- model_spec.embeddingDimensions — native output dimension (e.g. 1024 for BGE-M3).
- model_spec.maxInputTokens — max tokens per input string.
- model_spec.supportsCustomDimensions — whether dimensions can truncate the output.
- model_spec.pricing.input.usd / .diem — cost per million input tokens.
Built-in options include text-embedding-bge-m3, text-embedding-bge-en-icl, text-embedding-qwen3-8b, text-embedding-qwen3-0-6b, text-embedding-multilingual-e5-large-instruct, text-embedding-3-small, text-embedding-3-large, gemini-embedding-2-preview, text-embedding-nemotron-embed-vl-1b-v2.
Always pin the model ID — cosine distances are not comparable across different embedding models.
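Since similarity scores are only meaningful within a single model's vector space, a plain cosine-similarity helper is all the comparison logic you need:

```typescript
// Cosine similarity between two embeddings. Only compare vectors
// produced by the same model — cross-model scores are meaningless.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let na = 0
  let nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}
```

The dimension check doubles as a cheap guard against mixing models with different embeddingDimensions.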
Error handling
| Code | Meaning |
|---|---|
| 400 | Validation error. Check details in the response for the exact field. |
| 401 | Auth failure / Pro-only model. |
| 402 | Insufficient balance. Bearer → INSUFFICIENT_BALANCE. x402 → structured PAYMENT_REQUIRED. |
| 415 | Wrong Content-Type — must be application/json. |
| 429 | Rate limited. |
| 500 | Inference failed; retry with jitter. |
| 503 | Model at capacity; retry later. |
Gotchas
- dimensions is only meaningful when model_spec.supportsCustomDimensions === true. Behavior on other models is model-dependent — test with a small request before relying on it.
- input must not be empty; Venice rejects empty strings with 400.
- Whether the returned vectors are L2-normalized depends on the model — verify with Math.hypot(...v) ≈ 1 before assuming.
- For RAG, store model alongside the vector so you can re-embed on upgrade.
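The normalization gotcha above can be checked, and fixed, in a couple of lines — isUnitNorm and l2Normalize are illustrative names, not part of any SDK:

```typescript
// True when a vector is (approximately) L2-normalized.
function isUnitNorm(v: number[], eps = 1e-3): boolean {
  return Math.abs(Math.hypot(...v) - 1) < eps
}

// Normalize so dot product can stand in for cosine similarity.
function l2Normalize(v: number[]): number[] {
  const n = Math.hypot(...v)
  return v.map((x) => x / n)
}
```

Run isUnitNorm once per model at ingest time; if it fails, pass every vector through l2Normalize before storing.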