venice-models

Installation
SKILL.md

Venice Models

Three read-only endpoints for model discovery — all GET:

Endpoint Returns
/models Full model catalog with model_spec (capabilities, constraints, pricing).
/models/traits Trait → model ID mapping (e.g. "default", "fastest", "default_reasoning", "highest_quality").
/models/compatibility_mapping Legacy / OpenAI / third-party model ID → Venice model ID aliases.

All three take an optional ?type= filter: text, image, video, music, tts, asr, embedding, upscale, inpaint, all, code.

All three are authenticated (Bearer API key or x402 SIWE) like every other /api/v1 route.

Use when

  • You need to pick a model at runtime based on capabilities (vision, reasoning, function calling, E2EE, X search, multi-image, …).
  • You need to validate a request against a model's constraints (prompt length, aspect ratio, resolution, steps).
  • You need the current price per million tokens / per image / per second / per 1k chars to build a cost estimate.
  • You want to resolve a user-friendly trait name (e.g. default, default_reasoning, highest_quality) or a frontier-style ID (openai-gpt-54-pro, claude-opus-4-7) to a concrete Venice model ID.

GET /models

curl "https://api.venice.ai/api/v1/models?type=text"
{
  "object": "list",
  "type": "text",
  "data": [
    {
      "id": "zai-org-glm-5-1",
      "created": 1699000000,
      "model_spec": {
        "name": "GLM 5.1",
        "description": "Balanced blend of speed and capability...",
        "availableContextTokens": 200000,
        "maxCompletionTokens": 24000,
        "privacy": "private",
        "beta": false,
        "betaModel": false,
        "modelSource": "https://huggingface.co/zai-org/GLM-5.1",
        "offline": false,
        "capabilities": { ... },
        "constraints": { ... },
        "pricing": { ... },
        "regionRestrictions": ["US"],
        "deprecation": {"date": "2025-03-01T00:00:00.000Z"}
      }
    }
  ]
}

model_spec.capabilities — text models

Flag Meaning
optimizedForCode Tuned for coding tasks.
quantization fp4 / fp8 / fp16 / bf16 / int8 / int4 / not-available.
supportsFunctionCalling Tools are allowed.
supportsReasoning Emits <thinking>...</thinking> blocks (and/or provider-specific reasoning_content).
supportsReasoningEffort Honors reasoning.effort / reasoning_effort.
supportsResponseSchema Honors response_format: json_schema.
supportsMultipleImages + maxImages Multi-image vision support.
supportsVision Accepts image_url parts.
supportsVideoInput Accepts video_url parts.
supportsWebSearch Honors venice_parameters.enable_web_search.
supportsLogProbs Honors logprobs / top_logprobs.
supportsTeeAttestation Runs inside a TEE with hardware attestation.
supportsE2EE End-to-end encrypted inference available (requires TEE).
supportsXSearch xAI native web + X/Twitter search.
supportsAudioInput Accepts audio-content message parts (set by the runtime capability builder — not part of the OpenAPI strict schema but appears on /models responses).

model_spec.constraints — by model family

  • Texttemperature.default, top_p.default, and {frequency,presence,repetition}_penalty.default.
  • ImagepromptCharacterLimit, widthHeightDivisor, steps.{default,max}, optional aspectRatios[] + defaultAspectRatio, optional resolutions[] + defaultResolution (the last two appear only on models that use ratio/resolution-based sizing).
  • Videoaspect_ratios[], resolutions[], durations[], model_type (text-to-video/image-to-video/video), audio, audio_configurable, prompt_character_limit.
  • Inpaint / editaspectRatios[], promptCharacterLimit, combineImages.
  • TTS / Music (fields surface at the top level of model_spec, not inside constraints) — voices[], default_voice, supports_lyrics, lyrics_required, supports_lyrics_optimizer, supports_force_instrumental, supports_speed, supports_language_code, min_speed, max_speed, min_prompt_length, prompt_character_limit. Internal TTS per-model toggles like supportsPromptParam / supportsTemperatureParam / supportsTopPParam exist on the model definitions but are not merged into /models output today — treat the speech request schema as the support matrix.
  • Embedding (top-level, not inside constraints) — embeddingDimensions, maxInputTokens, supportsCustomDimensions.

model_spec.pricing — by model family

  • LLMinput.{usd,diem}, output.{usd,diem} per 1 000 000 tokens, plus optional cache_input (reads), cache_write (writes, e.g. Anthropic 1.25×), and extended.* tier triggered by context_token_threshold.
  • Image — either generation.{usd,diem} per image (flat) or resolutions.<tier>.{usd,diem} (per 1K/2K/4K). Every image row also carries a global upscale.{2x,4x}.{usd,diem} block (derived from shared upscale SKUs) — treat it as account-wide upscale pricing, not a signal that this specific model can upscale. Combine with the inpaint/upscale model's own capability check to decide what's actually callable.
  • Inpaint / editinpaint.{usd,diem} per edit.
  • Videonot currently returned on /models. calculatePricing() has no video branch, so video entries have no model_spec.pricing. Use POST /video/quote for the authoritative per-request price.
  • Music / long audiogeneration.{usd,diem} (per job), per_second.{usd,diem} (per second generated), per_thousand_characters.{usd,diem} (character-priced narration), or durations.<tier>.{usd,diem,min_seconds,max_seconds} (duration-bucketed).
  • TTSinput.{usd,diem} per 1 000 000 input characters.
  • ASRper_audio_second.{usd,diem}.
  • Embeddingsinput.{usd,diem} per 1 000 000 tokens.

Crypto RPC pricing is not in /models — it's tier × chain multipliers on /crypto/rpc/{network} (see venice-crypto-rpc).

Other top-level model_spec fields

Field Use
privacy (private / anonymized) Zero data retention if private.
beta / betaModel Gated to beta users (beta: true ⇒ need access).
offline Currently unavailable; skip.
regionRestrictions[] Country codes. 403 outside them.
deprecation.date Retirement date. Migrate before.

GET /models/traits

curl "https://api.venice.ai/api/v1/models/traits?type=text"

Returns { object: "list", type: "text", data: { "default": "zai-org-glm-5-1", "fastest": "grok-41-fast", "most_uncensored": "venice-uncensored", "default_reasoning": "...", "default_code": "...", "default_vision": "...", "function_calling_default": "...", "most_intelligent": "..." } } for type=text. For type=image, expect keys like default, fastest, highest_quality, eliza-default. Trait keys come from the internal ApiModelTraits / LLMApiModelTraits / ImageApiModelTraits enums.

Use this to avoid hard-coding model IDs — resolve a trait at boot and cache for the session.

GET /models/compatibility_mapping

curl "https://api.venice.ai/api/v1/models/compatibility_mapping?type=text"

Returns { object: "list", type: "text", data: { "openai-gpt-54-pro": "zai-org-glm-5-1", "claude-opus-4-7": "claude-opus-4-7", "gpt-5-4-pro": "openai-gpt-54-pro", ... } }. Both OpenAI-style IDs (openai-gpt-54-pro) and vendor-style aliases (gpt-5-4-pro) may appear as keys.

Lets an OpenAI-style client call Venice with its native model IDs — Venice substitutes behind the scenes. Useful when porting existing code.

Common patterns

Pick a vision+reasoning model at runtime

const list = await fetch(`${base}/models?type=text`).then(r => r.json())
const match = list.data.find((m: any) =>
  m.model_spec.capabilities.supportsVision &&
  m.model_spec.capabilities.supportsReasoning &&
  !m.model_spec.offline &&
  !m.model_spec.beta
)

Validate an image request before submit

const spec = (await fetch(`${base}/models?type=image`).then(r => r.json()))
  .data.find((m: any) => m.id === myModel)!.model_spec

const { widthHeightDivisor, promptCharacterLimit, aspectRatios } = spec.constraints
if (prompt.length > promptCharacterLimit) throw new Error('prompt too long')
if (width % widthHeightDivisor !== 0) throw new Error('width not divisible')
if (aspectRatios && !aspectRatios.includes(myAspect)) throw new Error('bad aspect')

Estimate LLM cost

const p = spec.pricing
const cost =
  (inputTokens / 1_000_000) * p.input.usd +
  (outputTokens / 1_000_000) * p.output.usd +
  (cachedTokens / 1_000_000) * (p.cache_input?.usd ?? 0)

For extended-context runs, check if inputTokens > p.extended?.context_token_threshold and switch to p.extended.* rates.

?type=code

type=code is a convenience filter returning text models with capabilities.optimizedForCode === true. Same response shape as type=text.

Gotchas

  • The catalog changes — cache for minutes, not days.
  • beta: true models require beta-flagged keys — otherwise 401 with "only available to Pro users".
  • offline: true means the model exists in the catalog but can't currently serve requests — treat it as absent for scheduling.
  • model_spec.pricing can be missing on free / internal models — guard against undefined.
  • traits differ by type — there's no "global default"; always pass ?type=....
  • compatibility_mapping resolves model IDs, not capabilities. If your caller sends openai-gpt-54-pro but needs vision, verify via the resolved Venice model's capabilities.supportsVision.
Related skills

More from veniceai/skills

Installs
25
Repository
veniceai/skills
GitHub Stars
65
First Seen
Apr 23, 2026