# Spicepod Configuration
A Spicepod manifest (spicepod.yaml) defines datasets, models, embeddings, runtime settings, and other components for a Spice application.
Spice is an open-source SQL query, search, and LLM-inference engine — not a replacement for PostgreSQL/MySQL (use those for transactional workloads) or a data warehouse (use Snowflake/Databricks for centralized analytics). Think of it as the operational data & AI layer between your applications and your data infrastructure.
## Basic Structure

```yaml
version: v1
kind: Spicepod
name: my_app

secrets:
  - from: env
    name: env

datasets:
  - from: <connector>:<path>
    name: <dataset_name>

models:
  - from: <provider>:<model>
    name: <model_name>

embeddings:
  - from: <provider>:<model>
    name: <embedding_name>
```
## All Sections

| Section | Purpose | Skill |
|---|---|---|
| `datasets` | Data sources for SQL queries | spice-data-connector |
| `models` | LLM/ML models for inference | spice-models |
| `embeddings` | Embedding models for vector search | spice-embeddings |
| `secrets` | Secure credential management | spice-secrets |
| `catalogs` | External data catalog connections | spice-catalogs |
| `views` | Virtual tables from SQL queries | spice-views |
| `tools` | LLM function calling capabilities | spice-tools |
| `workers` | Model load balancing and routing | spice-workers |
| `runtime` | Server ports, caching, telemetry | (this skill) |
| `snapshots` | Acceleration snapshot management | spice-accelerators |
| `evals` | Model evaluation definitions | (below) |
| `dependencies` | Dependent Spicepods | (below) |
## Quick Start

```yaml
version: v1
kind: Spicepod
name: quickstart

secrets:
  - from: env
    name: env

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: localhost
      pg_port: 5432
      pg_user: ${ env:PG_USER }
      pg_pass: ${ env:PG_PASS }
    acceleration:
      enabled: true
      engine: duckdb
      refresh_check_interval: 5m

models:
  - from: openai:gpt-4o
    name: assistant
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
      tools: auto
```
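Once this Spicepod is running (`spice run`), the accelerated dataset can be queried like any SQL table. A minimal sketch via the `spice sql` REPL, using the `users` dataset defined above:

```sql
-- Served from the local DuckDB acceleration rather than the upstream Postgres
SELECT count(*) AS user_count FROM users;
```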
## Runtime Configuration

### Server Ports

```yaml
runtime:
  http:
    enabled: true
    port: 8090
  flight:
    enabled: true
    port: 50051
```
### Results Caching

```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 128MiB
      item_ttl: 1s
      eviction_policy: lru # lru or tiny_lfu
      encoding: none # none or zstd
    search_results:
      enabled: true
      max_size: 128MiB
      item_ttl: 1s
    embeddings:
      enabled: true
      max_size: 128MiB
```
### Stale-While-Revalidate

Serve an expired cached result immediately while refreshing it in the background:

```yaml
runtime:
  caching:
    sql_results:
      item_ttl: 10s
      stale_while_revalidate_ttl: 10s
```
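Broadly, the two TTLs combine as follows (timings taken from the example above): a result cached at t=0 is served fresh until t=10s, then served stale between t=10s and t=20s while the query re-executes in the background; after the combined window the entry is no longer served. An annotated sketch:

```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      item_ttl: 10s                    # fresh window: served directly for 10s
      stale_while_revalidate_ttl: 10s  # stale window: served for 10s more while revalidating
```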
### Observability & Telemetry

```yaml
runtime:
  telemetry:
    enabled: true
    otel_exporter:
      endpoint: 'localhost:4317'
      push_interval: 60s
    metrics:
      - query_duration_ms
      - query_executions
```

Prometheus metrics are exposed for scraping: `curl http://localhost:9090/metrics`
## Evals

Evaluate model performance:

```yaml
evals:
  - name: australia
    description: Make sure the model understands Cricket.
    dataset: cricket_logic
    scorers:
      - Match
```
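The `dataset` field references an ordinary Spicepod dataset that holds the eval examples. A minimal sketch, assuming the examples live in a local Parquet file (the connector and path are illustrative, not prescribed):

```yaml
datasets:
  - from: file:evals/cricket_logic.parquet # hypothetical path to eval examples
    name: cricket_logic
```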
## Dependencies

Reference other Spicepods:

```yaml
dependencies:
  - lukekim/demo
  - spiceai/quickstart
```
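A dependency can also be added from the CLI with `spice add`, which fetches the pod and records it in `spicepod.yaml`:

```shell
spice add spiceai/quickstart
```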
## Full AI Application Example

```yaml
version: v1
kind: Spicepod
name: ai_app

secrets:
  - from: env
    name: env

embeddings:
  - from: openai:text-embedding-3-small
    name: embed
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }

datasets:
  - from: postgres:documents
    name: docs
    acceleration:
      enabled: true
    columns:
      - name: content
        embeddings:
          - from: embed
            row_id: id
            chunking:
              enabled: true
              target_chunk_size: 512
  - from: memory:store
    name: llm_memory
    access: read_write

models:
  - from: openai:gpt-4o
    name: assistant
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
      tools: auto, memory, search
```
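With the runtime started, the `assistant` model is served over an OpenAI-compatible HTTP API. A sketch of a chat call, assuming the default HTTP port and the model name from the manifest above (the message content is illustrative):

```shell
curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "assistant",
    "messages": [{"role": "user", "content": "Summarize the latest documents."}]
  }'
```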
## CLI Commands

```shell
spice init my_app   # initialize a new Spicepod
spice run           # start the runtime
spice sql           # SQL REPL
spice chat          # chat REPL
spice status        # check runtime status
spice datasets      # list datasets
```
## Deployment Models
Spice ships as a single ~140MB binary with no external dependencies beyond configured data sources.
| Model | Description | Best For |
|---|---|---|
| Standalone | Single instance via Docker or binary | Development, edge devices, simple workloads |
| Sidecar | Co-located with your application pod | Low-latency access, microservices |
| Microservice | Multiple replicas behind a load balancer | Heavy or varying traffic |
| Cluster | Distributed multi-node deployment | Large-scale data, horizontal scaling |
| Sharded | Horizontal data partitioning across instances | Distributed query execution |
| Tiered | Sidecar for performance + shared microservice for batch | Varying requirements per component |
| Cloud | Fully-managed Spice.ai Cloud Platform | Auto-scaling, built-in observability |
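As an illustration of the sidecar model, a Kubernetes pod spec fragment that co-locates Spice with the application container (the application image and container names are assumptions; the ports match the defaults shown earlier):

```yaml
# Pod spec fragment: Spice running as a sidecar next to the app
containers:
  - name: app
    image: my-app:latest # hypothetical application image
  - name: spice
    image: spiceai/spiceai:latest
    ports:
      - containerPort: 8090  # HTTP API
      - containerPort: 50051 # Arrow Flight
```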
## Writing Data

Spice supports writing to Apache Iceberg tables and Amazon S3 Tables via standard `INSERT INTO`:

```yaml
datasets:
  - from: iceberg:https://catalog.example.com/v1/namespaces/sales/tables/transactions
    name: transactions
    access: read_write # required for writes
```

```sql
INSERT INTO transactions SELECT * FROM staging_transactions;
```
## Use Cases
| Use Case | How Spice Helps |
|---|---|
| Operational Data Lakehouse | Serve real-time workloads directly from Iceberg, Delta Lake, or Parquet with sub-second latency |
| Data Lake Accelerator | Accelerate queries from seconds to milliseconds by materializing datasets locally |
| Enterprise Search | Combine semantic and full-text search across structured and unstructured data |
| RAG Pipelines | Merge federated data with vector search and LLMs for context-aware AI |
| Agentic AI | Tool-augmented LLMs with fast access to operational data |
| Real-Time Analytics | Stream data from Kafka or DynamoDB with sub-second latency |
## Related Skills

- **spice-data-connector**: Configure individual data source connectors in Spice, including PostgreSQL, MySQL, S3, Databricks, Snowflake, DuckDB, GitHub, Kafka, and 25+ more. The reference for the `from:` and `params:` fields in dataset configuration, file formats (Parquet, CSV, PDF, DOCX), and hive partitioning. For cross-source federation, views, and catalogs, see spice-connect-data.
- **spice-secrets**: Configure secret stores, including environment variables, Kubernetes, AWS Secrets Manager, and OS keyring. Covers the `secrets:` section, `${ store:KEY }` interpolation syntax, `.env` files, and secret store precedence.
- **spice-acceleration**: The data acceleration feature and its configuration: refresh modes (full, append, changes, caching), retention policies, snapshots for cold-start, indexes and constraints, and the difference between federated and accelerated queries. For choosing an engine (Arrow vs DuckDB vs SQLite vs Cayenne), see spice-accelerators.
- **spice-setup**: Install the runtime, initialize a project, run the runtime, and use the CLI; covers deployment models and configuration basics.
- **spice-connect-data**: Federated SQL across sources, including datasets, catalogs (Unity Catalog, Databricks, Iceberg), views, and writes with `INSERT INTO`.
- **spice-cli**: Manage Spicepods and interact with the runtime from the CLI.