Venice Embeddings

POST /api/v1/embeddings returns vector embeddings for strings. It's OpenAI-compatible: the request and response match https://api.openai.com/v1/embeddings closely enough that the OpenAI SDK works out of the box with baseURL: "https://api.venice.ai/api/v1".

Use when

  • You're building retrieval / RAG / similarity search.
  • You need text clustering, classification, deduplication, or reranking.
  • You want Venice's "no-training, no-retention" stance on inference inputs. Embeddings are generated and returned; note that the API does not publish E2EE semantics for /embeddings the way it does for selected chat models.

Text-only. For image/multimodal signals, either run images through a vision chat model and embed the description, or pick a multimodal-capable embedding model from GET /models?type=embedding (the catalog changes; inspect model_spec on each row).

Minimal request

```shell
curl https://api.venice.ai/api/v1/embeddings \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept-Encoding: gzip, br" \
  -d '{
    "model": "text-embedding-bge-m3",
    "input": "Why is the sky blue?"
  }'
```

Response:

```json
{
  "object": "list",
  "model": "text-embedding-bge-m3",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0023, -0.0093, 0.0158, ...] }
  ],
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
```

Request schema

| Field | Type | Notes |
| --- | --- | --- |
| `model` | `string` | Required. Model ID from `GET /models?type=embedding`. |
| `input` | `string \| string[] \| number[] \| number[][]` | Required. Single string, array of strings (≤ 2048 entries), or pre-tokenized arrays. |
| `encoding_format` | `"float" \| "base64"` | Default `"float"`. Use `"base64"` for roughly 4× smaller payloads; decode client-side. |
| `dimensions` | `integer` | Optional. Truncates output dimensions. Only meaningful when the model's `model_spec.supportsCustomDimensions === true`; behavior on non-supporting models varies, so test a small call before relying on it. |
| `user` | `string` | Accepted for OpenAI compatibility; discarded by Venice. |

Each input string is capped at the model's model_spec.maxInputTokens (typically 8192 tokens). Batch arrays are capped at 2048 items. Venice returns one embedding per element, in order, with a matching index.
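If you request encoding_format: "base64", each embedding arrives as a base64 string rather than a JSON array. A minimal Node.js decoder, assuming the OpenAI convention of packed little-endian float32 bytes (decodeBase64Embedding is an illustrative helper name, not part of any SDK):

```typescript
// Decode a base64-encoded embedding into a number[].
// Assumes packed little-endian float32 bytes (the OpenAI convention).
function decodeBase64Embedding(b64: string): number[] {
  // Copying into a fresh Uint8Array guarantees a 4-byte-aligned buffer.
  const bytes = new Uint8Array(Buffer.from(b64, 'base64'))
  return Array.from(new Float32Array(bytes.buffer))
}
```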

Response headers & compression

Request Accept-Encoding: gzip, br. The response will include Content-Encoding accordingly. For long batches this matters — vectors are large.

For x402 auth, X-Balance-Remaining reports your remaining USDC credits.

Using the OpenAI SDK

```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1',
})

const res = await client.embeddings.create({
  model: 'text-embedding-bge-m3',
  input: ['first doc', 'second doc'],
})

const vec0 = res.data[0].embedding
```

Batch-embedding pattern

```typescript
// Embed `texts` in slices of `batchSize`, preserving input order.
// `client` is the OpenAI SDK instance configured above.
async function embedBatch(texts: string[], batchSize = 64) {
  const out: number[][] = []
  for (let i = 0; i < texts.length; i += batchSize) {
    const slice = texts.slice(i, i + batchSize)
    const res = await client.embeddings.create({
      model: 'text-embedding-bge-m3',
      input: slice,
      encoding_format: 'float',
    })
    // `row.index` is relative to the batch, so offset by `i`.
    for (const row of res.data) out[i + row.index] = row.embedding
  }
  return out
}
```
  • Keep each batch's total token count within the model's context limit.
  • On 429, back off exponentially and halve the batch — see venice-errors.
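The 429 advice above can be sketched as a retry wrapper. withBackoff is a hypothetical helper; it assumes the thrown error carries a numeric status field, as the OpenAI SDK's APIError does:

```typescript
// Retry `fn` on 429 with exponential backoff and jitter.
// Hypothetical helper; adjust the status check to your SDK's error shape.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err: any) {
      if (attempt >= maxRetries || err?.status !== 429) throw err
      // 1s, 2s, 4s, ... capped at 30s, scaled by random jitter.
      const delay = Math.min(baseMs * 2 ** attempt, 30_000) * (0.5 + Math.random())
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}
```

Usage: `const res = await withBackoff(() => client.embeddings.create({ model, input }))`; halving the batch size on repeated 429s can be layered on top.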

Choosing a model

Query GET /models?type=embedding for the current catalog. Each entry exposes:

  • model_spec.embeddingDimensions — native output dimension (e.g. 1024 for BGE-M3).
  • model_spec.maxInputTokens — max tokens per input string.
  • model_spec.supportsCustomDimensions — whether dimensions can truncate the output.
  • model_spec.pricing.input.usd / .diem — cost per million input tokens.

Built-in options include text-embedding-bge-m3, text-embedding-bge-en-icl, text-embedding-qwen3-8b, text-embedding-qwen3-0-6b, text-embedding-multilingual-e5-large-instruct, text-embedding-3-small, text-embedding-3-large, gemini-embedding-2-preview, text-embedding-nemotron-embed-vl-1b-v2.

Always pin the model ID — cosine distances are not comparable across different embedding models.
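As a sketch of putting the catalog fields to work, a helper that picks the cheapest model whose per-string limit fits your documents might look like this (pickEmbeddingModel and the exact shape of the catalog rows are assumptions based on the model_spec fields listed above):

```typescript
interface EmbeddingModel {
  id: string
  model_spec: {
    embeddingDimensions: number
    maxInputTokens: number
    pricing: { input: { usd: number } }
  }
}

// Pick the cheapest model (USD per million input tokens) whose
// per-string token limit is at least `minInputTokens`.
function pickEmbeddingModel(
  models: EmbeddingModel[],
  minInputTokens: number,
): EmbeddingModel | undefined {
  return models
    .filter((m) => m.model_spec.maxInputTokens >= minInputTokens)
    .sort((a, b) => a.model_spec.pricing.input.usd - b.model_spec.pricing.input.usd)[0]
}
```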

Error handling

| Code | Meaning |
| --- | --- |
| 400 | Validation error. Check `details` in the response for the exact field. |
| 401 | Auth failure or Pro-only model. |
| 402 | Insufficient balance. Bearer auth → `INSUFFICIENT_BALANCE`; x402 → structured `PAYMENT_REQUIRED`. |
| 415 | Wrong `Content-Type`; must be `application/json`. |
| 429 | Rate limited. |
| 500 | Inference failed; retry with jitter. |
| 503 | Model at capacity; retry later. |

Gotchas

  • dimensions is only meaningful when model_spec.supportsCustomDimensions === true. Behavior on other models is model-dependent — test with a small request before relying on it.
  • input must not be empty; Venice rejects empty strings with 400.
  • Whether the returned vectors are L2-normalized depends on the model — verify with Math.hypot(...v) ≈ 1 before assuming.
  • For RAG, store model alongside the vector so you can re-embed on upgrade.
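The normalization check and similarity ranking mentioned above can be sketched as follows (cosineSimilarity and isUnitNorm are illustrative helpers, not part of any SDK):

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// True if the vector is (approximately) L2-normalized. If so,
// a plain dot product is enough for cosine ranking.
function isUnitNorm(v: number[], tol = 1e-3): boolean {
  return Math.abs(Math.hypot(...v) - 1) < tol
}
```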