spice-caching
Spice Caching
Configure in-memory caching for SQL query results, search results, and embeddings in the Spice runtime.
Overview
Spice caches results from SQL queries (/v1/sql), search (/v1/search), and embeddings requests. All three caches are enabled by default with a 1-second TTL and 128 MiB max size. Caching applies to HTTP and Arrow Flight APIs.
Configuration
Caching is configured under runtime.caching in spicepod.yaml:
version: v1
kind: Spicepod
name: app
runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 1GiB # Default 128MiB
      item_ttl: 1m # Default 1s
      eviction_policy: lru # lru | tiny_lfu
      hashing_algorithm: xxh3
      cache_key_type: plan # plan | sql
      encoding: none # none | zstd
      stale_while_revalidate_ttl: 30s # Default 0s (disabled)
    search_results:
      enabled: true
      max_size: 1GiB
      item_ttl: 1m
      eviction_policy: lru
    embeddings:
      enabled: true
      max_size: 128MiB
      item_ttl: 1m
Common Parameters (All Cache Types)
| Parameter | Default | Description |
|---|---|---|
| enabled | true | Enable/disable the cache |
| max_size | 128MiB | Maximum cache size |
| eviction_policy | lru | lru (Least Recently Used) or tiny_lfu (higher hit rate for skewed access) |
| item_ttl | 1s | Cache entry TTL (Time to Live) |
| hashing_algorithm | xxh3 | Hash for cache keys: xxh3, ahash, siphash, blake3, xxh32, xxh64, xxh128 |
SQL Results Extra Parameters
| Parameter | Default | Description |
|---|---|---|
| cache_key_type | plan | plan = logical plan (matches semantically equivalent queries); sql = raw SQL string (faster, exact match only) |
| encoding | none | none or zstd (compresses cached results, 50-90% reduction) |
| stale_while_revalidate_ttl | 0s | Serve stale entries while refreshing in background. 0s = disabled |
Choosing Parameters
cache_key_type
- plan (default): Matches semantically equivalent queries even with different SQL syntax. Incurs query parsing overhead.
- sql: Faster lookups, exact string match. Avoid with dynamic functions like NOW().
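To make the exact-match behavior of sql concrete, the sketch below hashes the raw query text to form a key. It is an illustration only: `sql_text_key` is a hypothetical helper, and `hashlib.blake2b` stands in for the runtime's configured hash (xxh3 is not in the Python standard library). Two spellings of the same query produce different keys, which is why sql mode only hits on identical strings.

```python
import hashlib


def sql_text_key(sql: str) -> str:
    """Hypothetical raw-SQL cache key: a hash of the exact query text."""
    # blake2b is an assumption for illustration; Spice defaults to xxh3.
    return hashlib.blake2b(sql.encode("utf-8"), digest_size=16).hexdigest()


a = sql_text_key("SELECT id FROM users WHERE org_id = 1")
b = sql_text_key("select id from users where org_id = 1")

# Semantically the same query, but the text differs, so the keys differ.
print(a == b)  # False
print(a == sql_text_key("SELECT id FROM users WHERE org_id = 1"))  # True
```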
eviction_policy
- lru (default): Good general-purpose policy.
- tiny_lfu: Better hit rate when some queries are accessed much more frequently than others.
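As a rough illustration of what lru means, here is a minimal sketch of least-recently-used eviction. It is not Spice's implementation: `LruSketch` is a hypothetical class, and it bounds the cache by item count for simplicity, whereas max_size in Spice is a size limit.

```python
from collections import OrderedDict
from typing import Optional


class LruSketch:
    """Hypothetical LRU cache bounded by item count (Spice bounds by size)."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        # A hit makes the entry the most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            # Evict the least recently used entry (front of the ordering).
            self._items.popitem(last=False)


cache = LruSketch(max_items=2)
cache.put("q1", "r1")
cache.put("q2", "r2")
cache.get("q1")         # q1 is now more recent than q2
cache.put("q3", "r3")   # evicts q2
print(cache.get("q2"))  # None
print(cache.get("q1"))  # r1
```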
encoding
- none (default): Zero compression overhead, uses more memory.
- zstd: High compression (50-90% reduction) with fast decompression. Use for large result sets.
hashing_algorithm
- xxh3 (default): Fastest general-purpose option.
- ahash/xxh64/xxh128: Lower collision probability for many cached queries.
- blake3: Use when cryptographic security is required.
- siphash: Protection against hash-flooding DoS attacks.
Stale-While-Revalidate
When stale_while_revalidate_ttl is set to a non-zero value:
- Cache entries are served normally until item_ttl expires.
- After item_ttl expires but before item_ttl + stale_while_revalidate_ttl, the stale entry is served immediately with STALE status.
- A background task refreshes the cache entry.
- After item_ttl + stale_while_revalidate_ttl, the entry is evicted.
runtime:
  caching:
    sql_results:
      enabled: true
      item_ttl: 10s
      stale_while_revalidate_ttl: 10s
      # Fresh for 10s → Stale (served while refreshing) for 10s → Evicted
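The timeline above can be sketched as a small function that classifies an entry by its age. This is a model of the described behavior, not runtime code: `entry_state` is a hypothetical helper, and treating the boundaries as half-open intervals is an assumption, since the exact behavior at the boundary instant is not stated here.

```python
def entry_state(age_s: float, item_ttl_s: float, swr_ttl_s: float) -> str:
    """Classify a cache entry by age, following the stale-while-revalidate timeline.

    Assumption: an entry whose age equals a boundary belongs to the later state.
    """
    if age_s < item_ttl_s:
        return "fresh"    # served normally
    if age_s < item_ttl_s + swr_ttl_s:
        return "stale"    # served with STALE status while a refresh runs
    return "evicted"      # no longer served from cache


# item_ttl: 10s, stale_while_revalidate_ttl: 10s
for age in (3, 12, 25):
    print(age, entry_state(age, 10, 10))
# 3 fresh
# 12 stale
# 25 evicted
```

With stale_while_revalidate_ttl left at 0s, the stale window is empty and the function goes straight from "fresh" to "evicted".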
Conflict warning: When using refresh_mode: caching on a dataset, do not configure both runtime.caching.sql_results.stale_while_revalidate_ttl and acceleration.params.caching_stale_while_revalidate_ttl for the same dataset. Choose one approach.
Cache Control Headers
HTTP API
Use the standard Cache-Control header with /v1/sql and /v1/search:
| Directive | Description |
|---|---|
| no-cache | Skip cache for this request; cache the result for future requests |
| min-fresh=N | Require cached entry to remain fresh for at least N seconds |
| max-stale=N | Accept stale responses up to N seconds old |
| only-if-cached | Return only cached responses; error on cache miss |
| stale-if-error=N | Serve stale cache (up to N seconds) if fetching fresh data fails |
# Skip cache for this query
curl -H "cache-control: no-cache" -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
# Only accept fresh results (at least 30s remaining)
curl -H "cache-control: min-fresh=30" -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
# Accept stale up to 60s
curl -H "cache-control: max-stale=60" -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
# Only return if cached
curl -H "cache-control: only-if-cached" -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
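The same requests can be issued from a script. The sketch below uses only the Python standard library and mirrors the curl examples: `build_sql_request` and `run_sql` are hypothetical helper names, and the URL assumes the default local endpoint used above. `run_sql` also reads the Results-Cache-Status response header so the effect of a directive can be observed.

```python
import urllib.request
from typing import Optional

SQL_URL = "http://localhost:8090/v1/sql"  # same endpoint as the curl examples


def build_sql_request(sql: str, cache_control: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/sql with an optional Cache-Control directive."""
    headers = {}
    if cache_control is not None:
        headers["Cache-Control"] = cache_control
    return urllib.request.Request(SQL_URL, data=sql.encode("utf-8"), headers=headers, method="POST")


def run_sql(sql: str, cache_control: Optional[str] = None) -> tuple:
    """Send the query and return (cache status header, raw response body)."""
    with urllib.request.urlopen(build_sql_request(sql, cache_control)) as resp:
        # Results-Cache-Status is absent when the cache did not apply.
        return resp.headers.get("Results-Cache-Status"), resp.read()


req = build_sql_request("SELECT 1", cache_control="max-stale=60")
# urllib stores header names capitalized as "Cache-control".
print(req.get_method(), req.full_url, req.get_header("Cache-control"))
# POST http://localhost:8090/v1/sql max-stale=60
```

Calling `run_sql("SELECT 1", "no-cache")` against a running instance would be expected to report BYPASS, per the status table in this document.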
Spice CLI
spice sql --cache-control no-cache
spice sql --cache-control min-fresh=30
spice sql --cache-control max-stale=60
spice sql --cache-control only-if-cached
spice search --cache-control no-cache
Arrow FlightSQL
Set cache-control in request metadata:
let mut request = FlightDescriptor::new_cmd(sql_command_bytes).into_request();
// Metadata values are typed, so parse the string into a MetadataValue first
request.metadata_mut().insert("cache-control", "no-cache".parse().unwrap());
JDBC:
Properties props = new Properties();
props.setProperty("cache-control", "no-cache");
Connection conn = DriverManager.getConnection("jdbc:arrow-flight-sql://localhost:50051", props);
Custom Cache Keys
Set the Spice-Cache-Key header to share cache entries across semantically equivalent but syntactically different queries. Valid keys: up to 128 alphanumeric characters plus - and _. Custom keys take precedence over cache_key_type.
# First query — cache MISS
curl -XPOST http://localhost:8090/v1/sql \
-H "spice-cache-key: users_spiceai" \
-d "select * from users where org_id = 1;"
# Different query, same cache key — cache HIT
curl -XPOST http://localhost:8090/v1/sql \
-H "spice-cache-key: users_spiceai" \
-d "select * from users where split_part(email, '@', 2) = 'spice.ai';"
Warning: Ensure queries sharing a cache key are truly semantically equivalent. The runtime will return the cached result regardless of the actual query.
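Since an invalid key is easy to produce when keys are built from user input, it can help to check them client-side before sending. A minimal sketch of the stated rule (up to 128 alphanumeric characters plus - and _) follows. `is_valid_cache_key` is a hypothetical helper. It assumes that "alphanumeric" means ASCII letters and digits and that an empty key is rejected, neither of which this document spells out.

```python
import re

# Up to 128 characters drawn from ASCII letters, digits, "-" and "_".
# Assumption: an empty key is rejected; only the upper bound is documented.
_CACHE_KEY_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")


def is_valid_cache_key(key: str) -> bool:
    """Return True when the key fits the documented Spice-Cache-Key format."""
    return _CACHE_KEY_RE.fullmatch(key) is not None


print(is_valid_cache_key("users_spiceai"))  # True
print(is_valid_cache_key("users spiceai"))  # False (space is not allowed)
print(is_valid_cache_key("k" * 129))        # False (too long)
```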
Response Headers
Responses include a header indicating cache status:
| Cache Type | Response Header |
|---|---|
| sql_results | Results-Cache-Status |
| search_results | Search-Results-Cache-Status |
| Status | Meaning |
|---|---|
| HIT | Served from cache |
| MISS | Cache checked, result not found |
| BYPASS | Cache bypassed (e.g., cache-control: no-cache) |
| STALE | Stale entry served while revalidating |
| (absent) | Cache did not apply (disabled or system table query) |
Monitoring / Metrics
Cache metrics are available at the Prometheus-compatible metrics endpoint. Prefix by cache type: results_*, search_results_*, embeddings_*.
| Metric | Type | Description |
|---|---|---|
| *_cache_max_size_bytes | Gauge | Configured max cache size |
| *_cache_requests | Counter | Total cache lookups |
| *_cache_hits | Counter | Total cache hits |
| *_cache_items_count | Gauge | Current items in cache |
| *_cache_size_bytes | Gauge | Current cache size |
| *_cache_evictions | Counter | Total evictions |
| *_cache_hit_ratio | Gauge | Hit ratio (hits / total) |
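The hit ratio can also be derived from the two counters, which is useful for computing it over a recent window, not only since startup. The sketch below shows the arithmetic. The helper names and sample values are hypothetical, and the values are assumed to come from two scrapes of the hits and requests counters for one cache type.

```python
def hit_ratio(hits: int, requests: int) -> float:
    """Hit ratio as hits / total lookups; 0.0 when there were no lookups."""
    return hits / requests if requests else 0.0


def windowed_hit_ratio(prev: dict, curr: dict) -> float:
    """Hit ratio over the interval between two scrapes of the counters."""
    return hit_ratio(curr["hits"] - prev["hits"], curr["requests"] - prev["requests"])


# Hypothetical counter samples taken at two points in time
earlier = {"hits": 120, "requests": 200}
later = {"hits": 210, "requests": 300}

print(hit_ratio(later["hits"], later["requests"]))  # 0.7
print(windowed_hit_ratio(earlier, later))           # 0.9
```

Here the lifetime ratio is 0.7, but the most recent 100 lookups hit 90 times, so the cache is performing better than the cumulative figure suggests.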
Common Recipes
High-throughput Dashboard (Maximize Hit Rate)
runtime:
  caching:
    sql_results:
      item_ttl: 30s
      max_size: 2GiB
      eviction_policy: tiny_lfu
      encoding: zstd
      stale_while_revalidate_ttl: 30s
Low-Latency API (Exact Queries, Fast Lookups)
runtime:
  caching:
    sql_results:
      item_ttl: 5s
      cache_key_type: sql
      hashing_algorithm: xxh3
Disable Caching Entirely
runtime:
  caching:
    sql_results:
      enabled: false
    search_results:
      enabled: false
    embeddings:
      enabled: false
Troubleshooting
| Issue | Solution |
|---|---|
| Always getting MISS | Check item_ttl is long enough; verify cache_key_type (plan matches equivalent queries, sql requires exact strings) |
| Cache filling up quickly | Increase max_size, enable zstd encoding, or reduce item_ttl |
| Stale data being served | Reduce item_ttl or stale_while_revalidate_ttl; use cache-control: no-cache for specific queries |
| Dynamic functions (NOW()) returning cached results | Switch to cache_key_type: plan or use cache-control: no-cache |
| SWR conflict error | Don't set both runtime.caching.sql_results.stale_while_revalidate_ttl and acceleration.params.caching_stale_while_revalidate_ttl for the same dataset |