# spice-setup

Getting Started with Spice
Spice is an open-source SQL query, search, and LLM-inference engine written in Rust. It federates queries across 30+ data sources, accelerates data locally, and integrates search and AI — all configured declaratively in YAML.
Spice is not a replacement for PostgreSQL/MySQL (use those for transactional workloads) or a data warehouse (use Snowflake/Databricks for centralized analytics). Think of it as the operational data & AI layer between your applications and your data infrastructure.
## Install

**macOS / Linux / WSL:**

```bash
curl https://install.spiceai.org | /bin/bash
```

**Homebrew:**

```bash
brew install spiceai/spiceai/spice
```

**Windows (PowerShell):**

```powershell
iex ((New-Object System.Net.WebClient).DownloadString("https://install.spiceai.org/Install.ps1"))
```

### Verify & Upgrade

```bash
spice version
spice upgrade
```

If the `spice` command is not found, add it to your `PATH`:

```bash
export PATH="$PATH:$HOME/.spice/bin"
```
## Quick Start

```bash
spice init my_app
cd my_app
spice run
```

In another terminal:

```bash
spice sql
```

```sql
sql> show tables;
```
## Spicepod Configuration (spicepod.yaml)

The Spicepod manifest defines all components for a Spice application:

```yaml
version: v1
kind: Spicepod
name: my_app

secrets:
  - from: env
    name: env

datasets:
  - from: <connector>:<path>
    name: <dataset_name>

models:
  - from: <provider>:<model>
    name: <model_name>

embeddings:
  - from: <provider>:<model>
    name: <embedding_name>
```
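As a concrete starting point, here is a minimal manifest that fills in the template with a single dataset. The S3 path is the public demo data used in Spice quickstarts (an assumption; substitute any connector and path from the data-connector skills):

```yaml
version: v1
kind: Spicepod
name: taxi_demo

datasets:
  # Public demo dataset (assumed quickstart path); swap in your own source.
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    params:
      file_format: parquet
```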
## All Sections

| Section | Purpose | Skill |
|---|---|---|
| `datasets` | Data sources for SQL queries | spice-connect-data |
| `models` | LLM/ML models for inference | spice-ai |
| `embeddings` | Embedding models for vector search | spice-search |
| `secrets` | Secure credential management | spice-secrets |
| `catalogs` | External data catalog connections | spice-connect-data |
| `views` | Virtual tables from SQL queries | spice-connect-data |
| `tools` | LLM function calling capabilities | spice-ai |
| `workers` | Model load balancing and routing | spice-ai |
| `runtime` | Server ports, caching, telemetry | spice-caching |
| `snapshots` | Acceleration snapshot management | spice-acceleration |
| `evals` | Model evaluation definitions | spice-ai |
| `dependencies` | Dependent Spicepods | (below) |
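For instance, a `views` entry defines a virtual table from a SQL query. A minimal sketch (the view name and SQL are illustrative placeholders; see spice-connect-data for details):

```yaml
views:
  # Virtual table computed at query time over loaded datasets;
  # 'active_users' and the SQL below are illustrative.
  - name: active_users
    sql: |
      SELECT id, email
      FROM users
      WHERE active = true
```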
## Dependencies

```yaml
dependencies:
  - lukekim/demo
  - spiceai/quickstart
```
## CLI Commands

| Command | Description |
|---|---|
| `spice init <name>` | Initialize a new Spicepod |
| `spice run` | Start the Spice runtime |
| `spice sql` | Start an interactive SQL REPL |
| `spice chat` | Start a chat REPL (requires a model) |
| `spice search` | Perform embeddings-based search |
| `spice add <spicepod>` | Add a Spicepod dependency |
| `spice datasets` | List loaded datasets |
| `spice models` | List loaded models |
| `spice catalogs` | List loaded catalogs |
| `spice status` | Show runtime status |
| `spice refresh <dataset>` | Refresh an accelerated dataset |
| `spice login` | Log in to the Spice.ai Platform |
| `spice version` | Show CLI and runtime versions |
| `spice upgrade` | Upgrade the CLI to the latest version |
## Runtime Endpoints

| Service | Default Address | Protocol |
|---|---|---|
| HTTP API | http://127.0.0.1:8090 | REST, OpenAI-compatible |
| Arrow Flight | 127.0.0.1:50051 | Arrow Flight / Flight SQL |
| Metrics | 127.0.0.1:9090 | Prometheus |
| OpenTelemetry | 127.0.0.1:50052 | OTLP gRPC |
## HTTP API Paths

| Path | Description |
|---|---|
| `POST /v1/sql` | Execute a SQL query |
| `POST /v1/search` | Embeddings-based search |
| `POST /v1/nsql` | Natural language to SQL |
| `POST /v1/chat/completions` | OpenAI-compatible chat |
| `POST /v1/embeddings` | Generate embeddings |
| `GET /v1/datasets` | List datasets |
| `GET /v1/models` | List models |
| `GET /health` | Health check |
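As a sketch of calling the HTTP API from code, `/v1/sql` takes the SQL text as the request body. This builds the request against the default address from the table above (sending it requires a runtime started with `spice run`; the plain-text content type is an assumption to verify against your runtime version):

```python
import json
from urllib.request import Request, urlopen

# Build a POST /v1/sql request; the raw SQL string is the request body.
# 127.0.0.1:8090 is the default HTTP API address listed above.
req = Request(
    "http://127.0.0.1:8090/v1/sql",
    data=b"SELECT 1 AS answer",
    headers={"Content-Type": "text/plain"},
    method="POST",
)

print(req.get_method(), req.full_url)

# With the runtime running, send it and parse the JSON result:
# rows = json.loads(urlopen(req).read())
```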
## Deployment Models
Spice ships as a single ~140MB binary with no external dependencies.
| Model | Best For |
|---|---|
| Standalone | Development, edge devices, simple workloads |
| Sidecar | Low-latency access, microservices |
| Microservice | Heavy or varying traffic behind a load balancer |
| Cluster | Large-scale data, horizontal scaling |
| Cloud | Auto-scaling, built-in observability (Spice.ai Cloud) |
## Use Cases
| Use Case | How Spice Helps |
|---|---|
| Operational Data Lakehouse | Serve real-time workloads from Iceberg/Delta/Parquet with sub-second latency |
| Data Lake Accelerator | Accelerate queries from seconds to milliseconds locally |
| Enterprise Search | Combine semantic and full-text search across data |
| RAG Pipelines | Federated data + vector search + LLMs |
| Agentic AI | Tool-augmented LLMs with fast data access |
| Real-Time Analytics | Stream from Kafka/DynamoDB with sub-second latency |
## Full Example

```yaml
version: v1
kind: Spicepod
name: ai_app

secrets:
  - from: env
    name: env

embeddings:
  - from: openai:text-embedding-3-small
    name: embed
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: localhost
      pg_user: ${ env:PG_USER }
      pg_pass: ${ env:PG_PASS }
    acceleration:
      enabled: true
      engine: duckdb
      refresh_check_interval: 5m
  - from: memory:store
    name: llm_memory
    access: read_write

models:
  - from: openai:gpt-4o
    name: assistant
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
      tools: auto, memory, search
```
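The `assistant` model declared above is served through the OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch of building that call (the user message is an illustrative placeholder; sending it requires a running runtime):

```python
import json
from urllib.request import Request, urlopen

# OpenAI-style chat payload; "assistant" is the model name from the
# spicepod above, and the question text is illustrative.
payload = {
    "model": "assistant",
    "messages": [{"role": "user", "content": "How many users are active?"}],
}
req = Request(
    "http://127.0.0.1:8090/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(json.loads(req.data)["model"])

# With the runtime running:
# reply = json.loads(urlopen(req).read())
```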
## Documentation

Full documentation is available at https://docs.spiceai.org.
## More from spiceai/skills

### spice-data-connector

Configure individual data source connectors in Spice — PostgreSQL, MySQL, S3, Databricks, Snowflake, DuckDB, GitHub, Kafka, and 25+ more. Use this skill whenever the user wants to add a dataset, connect to a specific database or data source, load data from S3 or files, configure connector-specific parameters, understand file formats (Parquet, CSV, PDF, DOCX), or set up hive partitioning. This skill is the reference for the `from:` and `params:` fields in dataset configuration. For cross-source federation, views, and catalogs, see spice-connect-data.

### spicepod-config

Create and configure Spicepod manifests (spicepod.yaml) — the central configuration file for Spice applications. Use this skill whenever the user wants to create a new spicepod.yaml from scratch, understand the overall spicepod structure and available sections, configure runtime settings (ports, caching, telemetry/observability), set up a complete Spice application combining datasets + models + search, or understand deployment models and use cases. This is the "glue" skill that shows how all Spice components fit together in one manifest. For details on specific sections (datasets, models, search, etc.), see the dedicated skills.

### spice-secrets

Configure secret stores in Spice — environment variables, Kubernetes, AWS Secrets Manager, and OS keyring. Use this skill whenever the user needs to manage credentials, API keys, passwords, or tokens in Spice, reference secrets in spicepod.yaml params with ${ store:KEY } syntax, set up .env files, configure secret store precedence, or understand how the `secrets:` section works. Also use when the user asks how to pass database passwords or API keys securely to Spice datasets or models.

### spice-acceleration

Accelerate data locally for sub-second query performance — the feature and its configuration. Use this skill whenever the user asks about data acceleration concepts, enabling acceleration on a dataset, choosing refresh modes (full, append, changes, caching), configuring retention policies, setting up snapshots for cold-start, adding indexes and constraints, or understanding the difference between federated and accelerated queries. This skill covers the "what and why" of acceleration. For choosing which acceleration engine to use (Arrow vs DuckDB vs SQLite vs Cayenne), see spice-accelerators.

### spice-connect-data

Connect Spice to data sources and query across them with federated SQL — including datasets, catalogs, views, and writes. Use this skill whenever the user wants to set up federated queries across multiple sources, create views, configure catalogs (Unity Catalog, Databricks, Iceberg), write data with INSERT INTO, or understand how Spice's query federation works. This skill focuses on the federation layer — cross-source joins, views, catalogs, and data writes. For configuring individual data source connectors (PostgreSQL params, S3 file formats, etc.), see spice-data-connector.

### spice-cli

Use the Spice CLI to manage Spicepods and interact with the runtime. Use when asked to "run Spice", "query data", "start the runtime", "use spice commands", or "check spice status".