spice-caching
Spice Caching
Configure in-memory caching for SQL query results, search results, and embeddings in the Spice runtime.
Overview
Spice caches results from SQL queries (/v1/sql), search (/v1/search), and embeddings requests. All three caches are enabled by default with a 1-second TTL and 128 MiB max size. Caching applies to HTTP and Arrow Flight APIs.
Configuration
Caching is configured under runtime.caching in spicepod.yaml:
version: v1
kind: Spicepod
name: app
runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 1GiB # Default 128MiB
      item_ttl: 1m # Default 1s
      eviction_policy: lru # lru | tiny_lfu
      hashing_algorithm: xxh3
      cache_key_type: plan # plan | sql
      encoding: none # none | zstd
      stale_while_revalidate_ttl: 30s # Default 0s (disabled)
    search_results:
      enabled: true
      max_size: 1GiB
      item_ttl: 1m
      eviction_policy: lru
    embeddings:
      enabled: true
      max_size: 128MiB
      item_ttl: 1m
Common Parameters (All Cache Types)
| Parameter | Default | Description |
|---|---|---|
| enabled | true | Enable/disable the cache |
| max_size | 128MiB | Maximum cache size |
| eviction_policy | lru | lru (Least Recently Used) or tiny_lfu (higher hit rate for skewed access) |
| item_ttl | 1s | Cache entry TTL (Time to Live) |
| hashing_algorithm | xxh3 | Hash for cache keys: xxh3, ahash, siphash, blake3, xxh32, xxh64, xxh128 |
SQL Results Extra Parameters
| Parameter | Default | Description |
|---|---|---|
| cache_key_type | plan | plan = logical plan (matches semantically equivalent queries); sql = raw SQL string (faster, exact match only) |
| encoding | none | none or zstd (compresses cached results, 50-90% reduction) |
| stale_while_revalidate_ttl | 0s | Serve stale entries while refreshing in background. 0s = disabled |
Choosing Parameters
cache_key_type
- plan (default): Matches semantically equivalent queries even with different SQL syntax. Requires query parsing overhead.
- sql: Faster lookups, exact string match. Avoid with dynamic functions like NOW().
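To illustrate why sql keys only match identical strings, here is a minimal sketch that hashes raw query text. It uses SHA-256 from Python's standard library purely as a stand-in for the runtime's hashing (the default there is xxh3), and sql_cache_key is a hypothetical helper, not part of Spice.

```python
import hashlib


def sql_cache_key(sql: str) -> str:
    """Hypothetical stand-in for a raw-SQL cache key: hash the query text as-is."""
    # SHA-256 is used only because it ships with Python; Spice defaults to xxh3.
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


a = sql_cache_key("SELECT id FROM users WHERE org_id = 1")
b = sql_cache_key("select id from users where org_id = 1")

# The two queries mean the same thing, but the raw text differs,
# so an exact-string key treats them as separate cache entries.
print(a == b)
```

With cache_key_type: plan, queries like these would be expected to share one entry, since the key is derived from the logical plan rather than the text.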
eviction_policy
- lru (default): Good general-purpose policy.
- tiny_lfu: Better hit rate when some queries are accessed much more frequently than others.
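As a rough picture of what lru eviction means, the sketch below keeps a fixed number of entries and drops the least recently used one when full. It is a toy model only: LruCache is a hypothetical class, and it bounds by entry count, whereas the runtime bounds by bytes through max_size.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache bounded by entry count (an assumption for illustration)."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, key: str):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: object) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LruCache(max_items=2)
cache.put("q1", "r1")
cache.put("q2", "r2")
cache.get("q1")        # q1 becomes the most recently used entry
cache.put("q3", "r3")  # cache is full, so q2 (least recently used) is evicted
```

A policy such as tiny_lfu also weighs how often an entry is accessed, which is why it tends to do better when a few queries dominate traffic.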
encoding
- none (default): Zero compression overhead, uses more memory.
- zstd: High compression (50-90% reduction) with fast decompression. Use for large result sets.
hashing_algorithm
- xxh3 (default): Fastest general-purpose option.
- ahash / xxh64 / xxh128: Lower collision probability when many queries are cached.
- blake3: Use when cryptographic security is required.
- siphash: Protection against hash-flooding DoS attacks.
Stale-While-Revalidate
When stale_while_revalidate_ttl is set to a non-zero value:
- Cache entries are served normally until item_ttl expires.
- After item_ttl expires but before item_ttl + stale_while_revalidate_ttl, the stale entry is served immediately with STALE status.
- A background task refreshes the cache entry.
- After item_ttl + stale_while_revalidate_ttl, the entry is evicted.
runtime:
  caching:
    sql_results:
      enabled: true
      item_ttl: 10s
      stale_while_revalidate_ttl: 10s
      # Fresh for 10s → Stale (served while refreshing) for 10s → Evicted
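The fresh, stale, and evicted windows can be sketched as a small function of entry age. This is an illustration only: swr_state is a hypothetical helper, and the exact behavior at the boundaries (for example, at precisely item_ttl) is an assumption.

```python
def swr_state(age_s: float, item_ttl_s: float, swr_ttl_s: float = 0.0) -> str:
    """Classify a cache entry by its age in seconds (boundary handling is assumed)."""
    if age_s < item_ttl_s:
        return "fresh"    # served normally from cache
    if age_s < item_ttl_s + swr_ttl_s:
        return "stale"    # served with STALE status while a refresh runs
    return "evicted"      # no longer served; the next request is a miss


# Mirrors item_ttl: 10s and stale_while_revalidate_ttl: 10s
for age in (3, 12, 25):
    print(age, swr_state(age, item_ttl_s=10, swr_ttl_s=10))
```

With swr_ttl_s left at 0, the stale window is empty and entries go straight from fresh to evicted, which matches the disabled default.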
Conflict warning: When using refresh_mode: caching on a dataset, do not configure both runtime.caching.sql_results.stale_while_revalidate_ttl and acceleration.params.caching_stale_while_revalidate_ttl for the same dataset. Choose one approach.
Cache Control Headers
HTTP API
Use the standard Cache-Control header with /v1/sql and /v1/search:
| Directive | Description |
|---|---|
| no-cache | Skip cache for this request; cache the result for future requests |
| min-fresh=N | Require cached entry to remain fresh for at least N seconds |
| max-stale=N | Accept stale responses up to N seconds old |
| only-if-cached | Return only cached responses; error on cache miss |
| stale-if-error=N | Serve stale cache (up to N seconds) if fetching fresh data fails |
# Skip cache for this query
curl -H "cache-control: no-cache" -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
# Only accept fresh results (at least 30s remaining)
curl -H "cache-control: min-fresh=30" -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
# Accept stale up to 60s
curl -H "cache-control: max-stale=60" -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
# Only return if cached
curl -H "cache-control: only-if-cached" -XPOST http://localhost:8090/v1/sql -d 'SELECT 1'
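The same requests can be built from Python's standard library. The sketch below only constructs the request object and does not send it; build_sql_request is a hypothetical helper, and the base URL is taken from the curl examples above.

```python
import urllib.request
from typing import Optional


def build_sql_request(
    sql: str,
    cache_control: Optional[str] = None,
    base_url: str = "http://localhost:8090",
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/sql with an optional Cache-Control directive."""
    headers = {}
    if cache_control is not None:
        headers["Cache-Control"] = cache_control
    return urllib.request.Request(
        f"{base_url}/v1/sql",
        data=sql.encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_sql_request("SELECT 1", cache_control="max-stale=60")

# To send it against a running runtime:
#   with urllib.request.urlopen(request) as response:
#       print(response.headers.get("Results-Cache-Status"))
```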
Spice CLI
spice sql --cache-control no-cache
spice sql --cache-control min-fresh=30
spice sql --cache-control max-stale=60
spice sql --cache-control only-if-cached
spice search --cache-control no-cache
Arrow FlightSQL
Set cache-control in request metadata:
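use tonic::{metadata::MetadataValue, IntoRequest};
let mut request = FlightDescriptor::new_cmd(sql_command_bytes).into_request();
// tonic metadata values must be a MetadataValue, not a bare &str
request.metadata_mut().insert("cache-control", MetadataValue::from_static("no-cache"));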
JDBC:
Properties props = new Properties();
props.setProperty("cache-control", "no-cache");
Connection conn = DriverManager.getConnection("jdbc:arrow-flight-sql://localhost:50051", props);
Custom Cache Keys
Set the Spice-Cache-Key header to share cache entries across semantically equivalent but syntactically different queries. Valid keys: up to 128 alphanumeric characters plus - and _. Custom keys take precedence over cache_key_type.
# First query — cache MISS
curl -XPOST http://localhost:8090/v1/sql \
-H "spice-cache-key: users_spiceai" \
-d "select * from users where org_id = 1;"
# Different query, same cache key — cache HIT
curl -XPOST http://localhost:8090/v1/sql \
-H "spice-cache-key: users_spiceai" \
-d "select * from users where split_part(email, '@', 2) = 'spice.ai';"
Warning: Ensure queries sharing a cache key are truly semantically equivalent. The runtime will return the cached result regardless of the actual query.
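A client can check a key against the stated rule before sending it. This is a minimal sketch: is_valid_cache_key is a hypothetical helper, and it assumes ASCII letters and digits and a non-empty key, neither of which is spelled out above.

```python
import re

# Up to 128 characters drawn from letters, digits, "-" and "_".
# Assumptions: ASCII only, and at least one character.
_CACHE_KEY = re.compile(r"[A-Za-z0-9_-]{1,128}")


def is_valid_cache_key(key: str) -> bool:
    """Return True if the key fits the documented Spice-Cache-Key format."""
    return _CACHE_KEY.fullmatch(key) is not None


print(is_valid_cache_key("users_spiceai"))
print(is_valid_cache_key("users spiceai"))
```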
Response Headers
Responses include a header indicating cache status:
| Cache Type | Response Header |
|---|---|
| sql_results | Results-Cache-Status |
| search_results | Search-Results-Cache-Status |
| Status | Meaning |
|---|---|
| HIT | Served from cache |
| MISS | Cache checked, result not found |
| BYPASS | Cache bypassed (e.g., cache-control: no-cache) |
| STALE | Stale entry served while revalidating |
| (absent) | Cache did not apply (disabled or system table query) |
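Reading the status on the client side can be sketched as a lookup over the response headers. cache_status is a hypothetical helper; it works on any mapping of header names to values and returns None for the absent case in the table above.

```python
from typing import Mapping, Optional

# Header names taken from the table above.
STATUS_HEADERS = {
    "sql_results": "Results-Cache-Status",
    "search_results": "Search-Results-Cache-Status",
}


def cache_status(headers: Mapping[str, str], cache_type: str = "sql_results") -> Optional[str]:
    """Return HIT, MISS, BYPASS or STALE, or None when the header is absent."""
    wanted = STATUS_HEADERS[cache_type].lower()
    for name, value in headers.items():
        if name.lower() == wanted:  # HTTP header names are case-insensitive
            return value.strip().upper()
    return None


print(cache_status({"results-cache-status": "HIT"}))
print(cache_status({"content-type": "application/json"}))
```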
Monitoring / Metrics
Cache metrics are available at the Prometheus-compatible metrics endpoint. Metric names are prefixed by cache type: results_*, search_results_*, embeddings_*.
| Metric | Type | Description |
|---|---|---|
| *_cache_max_size_bytes | Gauge | Configured max cache size |
| *_cache_requests | Counter | Total cache lookups |
| *_cache_hits | Counter | Total cache hits |
| *_cache_items_count | Gauge | Current items in cache |
| *_cache_size_bytes | Gauge | Current cache size |
| *_cache_evictions | Counter | Total evictions |
| *_cache_hit_ratio | Gauge | Hit ratio (hits / total) |
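The hit ratio can also be derived from the two counters, which is handy when computing it over a custom time window. A minimal sketch follows; hit_ratio is a hypothetical helper, the sample values are made up, and the full metric names in the comment are assumed from the prefix pattern above.

```python
def hit_ratio(hits: int, requests: int) -> float:
    """Hit ratio as hits / total lookups; returns 0.0 before any lookups occur."""
    if requests <= 0:
        return 0.0
    return hits / requests


# Made-up counter values, e.g. read from results_cache_hits and
# results_cache_requests (names assumed from the prefix pattern).
print(f"{hit_ratio(hits=750, requests=1000):.1%}")
```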
Common Recipes
High-throughput Dashboard (Maximize Hit Rate)
runtime:
  caching:
    sql_results:
      item_ttl: 30s
      max_size: 2GiB
      eviction_policy: tiny_lfu
      encoding: zstd
      stale_while_revalidate_ttl: 30s
Low-Latency API (Exact Queries, Fast Lookups)
runtime:
  caching:
    sql_results:
      item_ttl: 5s
      cache_key_type: sql
      hashing_algorithm: xxh3
Disable Caching Entirely
runtime:
  caching:
    sql_results:
      enabled: false
    search_results:
      enabled: false
    embeddings:
      enabled: false
Troubleshooting
| Issue | Solution |
|---|---|
| Always getting MISS | Check item_ttl is long enough; verify cache_key_type (plan matches equivalent queries, sql requires exact strings) |
| Cache filling up quickly | Increase max_size, enable zstd encoding, or reduce item_ttl |
| Stale data being served | Reduce item_ttl or stale_while_revalidate_ttl; use cache-control: no-cache for specific queries |
| Dynamic functions (NOW()) returning cached results | Switch to cache_key_type: plan or use cache-control: no-cache |
| SWR conflict error | Don't set both runtime.caching.sql_results.stale_while_revalidate_ttl and acceleration.params.caching_stale_while_revalidate_ttl for the same dataset |