Spice Model Providers

Model providers enable LLM chat completions and inference through a unified OpenAI-compatible API.

Basic Configuration

models:
  - from: <provider>:<model_id>
    name: <model_name>
    params:
      <provider>_api_key: ${ secrets:API_KEY }
      tools: auto # optional: enable runtime tools
      system_prompt: | # optional: default system prompt
        You are a helpful assistant.

Supported Providers

Provider	From Format	Status
OpenAI (or compatible)	`openai:gpt-4o`	Stable
Anthropic	`anthropic:claude-sonnet-4-5`	Alpha
Azure OpenAI	`azure:my-deployment`	Alpha
Google AI	`google:gemini-pro`	Alpha
xAI	`xai:grok-beta`	Alpha
Perplexity	`perplexity:sonar-pro`	Alpha
Amazon Bedrock	`bedrock:anthropic.claude-3`	Alpha
Databricks	`databricks:llama-3-70b`	Alpha
Spice.ai	`spiceai:llama3`	Release Candidate
HuggingFace	`hf:meta-llama/Llama-3-8B-Instruct`	Release Candidate
Local file	`file:./models/llama.gguf`	Release Candidate
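
Every entry in the From Format column follows the `<provider>:<model_id>` pattern from the basic configuration. As a rough illustration of how such a value breaks down, the sketch below splits it on the first colon. `split_model_source` and `ModelSource` are hypothetical names written for this example; they are not part of Spice, and Spice's own parsing may differ.

```python
from typing import NamedTuple


class ModelSource(NamedTuple):
    """Provider prefix and model ID taken from a `from` value."""
    provider: str
    model_id: str


def split_model_source(source: str) -> ModelSource:
    """Split a `from` value such as "openai:gpt-4o" into its two parts.

    Hypothetical helper for illustration only; not part of Spice.
    """
    # Split on the first colon only, so model IDs that contain
    # "/" or further ":" characters are kept intact.
    provider, sep, model_id = source.partition(":")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected '<provider>:<model_id>', got {source!r}")
    return ModelSource(provider, model_id)


if __name__ == "__main__":
    for value in ("openai:gpt-4o", "hf:meta-llama/Llama-3-8B-Instruct",
                  "file:./models/llama.gguf"):
        print(split_model_source(value))
```

A check like this can catch a missing provider prefix before the configuration is loaded, but it does not verify that the provider is one Spice actually supports.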

Features

Feature	Description
Tools	SQL, search, memory, MCP, websearch
System Prompts	Declarative default system prompts
Parameterized Prompts	Jinja templating in system prompts
Parameter Overrides	Temperature, response format, etc.
Memory	Persistent memory across conversations
Evals	Evaluate and track model performance
Local Serving	CUDA/Metal accelerated local models

Examples

OpenAI with Tools

models:
  - from: openai:gpt-4o
    name: gpt4
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
      tools: auto

OpenAI-Compatible Provider (e.g., Groq)

models:
  - from: openai:llama3-groq-70b-8192-tool-use-preview
    name: groq-llama
    params:
      endpoint: https://api.groq.com/openai/v1
      openai_api_key: ${ secrets:GROQ_API_KEY }

Model with Memory

datasets:
  - from: memory:store
    name: llm_memory
    access: read_write

models:
  - from: openai:gpt-4o
    name: assistant
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
      tools: memory, sql

With System Prompt and Parameter Overrides

models:
  - from: openai:gpt-4o
    name: pirate_haikus
    params:
      system_prompt: |
        Write everything in Haiku like a pirate.
      openai_temperature: 0.1
      openai_response_format: "{ 'type': 'json_object' }"

Local Model (GGUF)

models:
  - from: file:./models/llama-3.gguf
    name: local_llama

Using Models

Chat Completions API (OpenAI-compatible)

curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Applications built on the OpenAI SDKs can target Spice by changing only the base URL (here, http://localhost:8090/v1); the rest of the request and response code stays the same.
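
For clients that do not use an OpenAI SDK, the curl request above can be approximated with Python's standard library. This is a minimal sketch, assuming Spice listens on localhost:8090 as in the curl example and that the response follows the OpenAI chat completions schema. `build_chat_request` and `chat` are hypothetical helper names introduced for this example.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example.
SPICE_BASE_URL = "http://localhost:8090/v1"


def build_chat_request(model: str, user_message: str,
                       base_url: str = SPICE_BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        # `model` is the `name` given to the model in the Spice configuration.
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, user_message: str) -> str:
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body with a `choices` list.
    """
    with urllib.request.urlopen(build_chat_request(model, user_message)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a Spice runtime running and a model named `gpt4` configured, `chat("gpt4", "Hello")` would send the same payload as the curl command.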

NSQL (Text-to-SQL)

The /v1/nsql endpoint converts a natural-language question to SQL and executes it. To help the model write accurate, context-aware SQL, Spice gives it tools such as table_schema, random_sample, and sample_distinct_columns:

curl -XPOST "http://localhost:8090/v1/nsql" \
  -H "Content-Type: application/json" \
  -d '{"query": "What was the highest tip any passenger gave?"}'
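
The same NSQL call can be sketched in Python with the standard library. This assumes the endpoint and port from the curl command. `build_nsql_request` and `ask` are hypothetical helper names, and the response is returned as raw bytes because its format is not described here.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example.
NSQL_URL = "http://localhost:8090/v1/nsql"


def build_nsql_request(question: str, url: str = NSQL_URL) -> urllib.request.Request:
    """Build a POST request carrying a natural-language question for /v1/nsql."""
    payload = {"query": question}
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> bytes:
    """Send the question and return the raw response body, unparsed."""
    with urllib.request.urlopen(build_nsql_request(question)) as resp:
        return resp.read()
```

Calling `ask("What was the highest tip any passenger gave?")` against a running Spice runtime would issue the request shown in the curl example.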

CLI

spice chat
chat> Hello!
