# Spice Data Accelerators
Accelerators materialize data locally from connected sources for faster queries and reduced load on source systems.
## Basic Configuration

```yaml
datasets:
  - from: postgres:my_table
    name: my_table
    acceleration:
      enabled: true
      engine: duckdb # arrow, duckdb, sqlite, cayenne, postgres, turso
      mode: memory # memory or file
      refresh_check_interval: 1h
```
## Choosing an Accelerator

| Use Case | Engine | Why |
|---|---|---|
| Small datasets (<1 GB), max speed | arrow | In-memory, lowest latency |
| Medium datasets (1-100 GB), complex SQL | duckdb | Mature SQL, memory management |
| Large datasets (100 GB-1+ TB), analytics | cayenne | Built on Vortex (Linux Foundation), 10-20x faster scans |
| Point lookups on large datasets | cayenne | 100x faster random access vs Parquet |
| Simple queries, low resource usage | sqlite | Lightweight, minimal overhead |
| Async operations, concurrent workloads | turso | Native async, modern connection pooling |
| External database integration | postgres | Leverage existing PostgreSQL infra |
### Cayenne vs DuckDB
Choose Cayenne when datasets exceed ~1 TB, multi-file ingestion is needed, or point lookups are common. Choose DuckDB when datasets are under ~1 TB, complex SQL (window functions, CTEs) is needed, or DuckDB tooling is beneficial.
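As a minimal sketch, a Cayenne acceleration uses file mode (the only mode Cayenne supports); the S3 source path here is illustrative:

```yaml
datasets:
  - from: s3://my_bucket/events/ # illustrative source
    name: events
    acceleration:
      enabled: true
      engine: cayenne
      mode: file # Cayenne is file-mode only
```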
## Supported Engines

| Engine | Mode | Status |
|---|---|---|
| arrow | memory | Stable |
| duckdb | memory, file | Stable |
| sqlite | memory, file | Release Candidate |
| cayenne | file | Beta |
| postgres | N/A (attached) | Release Candidate |
| turso | memory, file | Beta |
## Refresh Modes

| Mode | Description | Use Case |
|---|---|---|
| full | Complete dataset replacement on each refresh | Small, slowly-changing datasets |
| append (batch) | Adds new records based on a time_column | Append-only logs, time-series data |
| append (stream) | Continuous streaming without a time column | Real-time event streams (Kafka, Debezium) |
| changes | CDC-based incremental updates via Debezium or DynamoDB Streams | Frequently updated transactional data |
| caching | Request-based row-level caching | API responses, HTTP endpoints |
```yaml
# Full refresh every 8 hours
acceleration:
  refresh_mode: full
  refresh_check_interval: 8h
```

```yaml
# Append mode: check for new records from the last day every 10 minutes
acceleration:
  refresh_mode: append
  time_column: created_at
  refresh_check_interval: 10m
  refresh_data_window: 1d
```

```yaml
# Continuous ingestion using Kafka
acceleration:
  refresh_mode: append
```

```yaml
# CDC with Debezium or DynamoDB Streams
acceleration:
  refresh_mode: changes
```
## Common Configurations

### In-Memory with Interval Refresh

```yaml
acceleration:
  enabled: true
  engine: arrow
  refresh_check_interval: 5m
```
### File-Based with Append and Time Window

```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: append
      refresh_check_interval: 1h
      refresh_data_window: 7d
```
### With Retention Policy

Retention policies prevent unbounded growth of accelerated datasets. Spice supports time-based and custom SQL-based retention strategies:

```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      retention_check_enabled: true
      retention_period: 30d
      retention_check_interval: 1h
```
### With SQL-Based Retention

```yaml
acceleration:
  retention_check_enabled: true
  retention_check_interval: 1h
  retention_sql: "DELETE FROM logs WHERE status = 'archived'"
```
### With Indexes (DuckDB, SQLite, Turso)

```yaml
acceleration:
  enabled: true
  engine: sqlite
  indexes:
    user_id: enabled
    '(created_at, status)': unique
  primary_key: id
```
## Engine-Specific Parameters

### DuckDB

```yaml
acceleration:
  engine: duckdb
  mode: file
  params:
    duckdb_file: ./data/cache.db
```

### SQLite

```yaml
acceleration:
  engine: sqlite
  mode: file
  params:
    sqlite_file: ./data/cache.sqlite
```
## Constraints and Indexes

Accelerated datasets support primary key constraints and indexes:

```yaml
acceleration:
  enabled: true
  engine: duckdb
  primary_key: order_id # Creates a non-null unique index
  indexes:
    customer_id: enabled # Single-column index
    '(created_at, status)': unique # Multi-column unique index
```
## Snapshots (DuckDB, SQLite & Cayenne file mode)

Bootstrap file-based accelerations from S3 or filesystem snapshots on startup. This dramatically reduces cold-start latency in distributed deployments.

Snapshot triggers vary by refresh mode:

- refresh_complete: Creates snapshots after each refresh (full and batch-append modes)
- time_interval: Creates snapshots on a fixed schedule (all refresh modes)
- stream_batches: Creates snapshots after every N batches (streaming modes: Kafka, Debezium, DynamoDB Streams)

```yaml
snapshots:
  enabled: true
  location: s3://my_bucket/snapshots/
  bootstrap_on_failure_behavior: warn # warn | retry | fallback
  params:
    s3_auth: iam_role
```
Per-dataset opt-in:
```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file
  snapshots:
    enabled: true
```
## Memory Considerations

When using mode: memory (the default), the entire dataset is loaded into RAM. Ensure sufficient memory is available, including overhead for query execution and the runtime itself. To reduce memory pressure, use mode: file with the duckdb, sqlite, turso, or cayenne accelerators.
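For instance, a sketch that switches a dataset from the in-memory default to file-backed DuckDB acceleration (the file path is illustrative):

```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file # persist to disk instead of holding the dataset in RAM
  params:
    duckdb_file: ./data/my_table.db # illustrative path
```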