scylladb-data-modeling
ScyllaDB Data Modeling
CQL data modeling patterns and anti-patterns for ScyllaDB. Bad schema is the root cause of most ScyllaDB performance issues — no amount of cluster scaling can fix a fundamentally wrong data model.
When to Apply
Reference these guidelines when:
- Designing a new ScyllaDB schema from scratch
- Migrating from SQL/relational databases, MongoDB, or Cassandra to ScyllaDB
- Reviewing existing table designs for performance issues
- Troubleshooting slow queries, timeouts, or hot nodes
- Deciding how to structure primary keys (partition key + clustering columns)
- Modeling time-series, IoT, or event data
- Seeing large partition warnings in logs
- Encountering
ALLOW FILTERINGin queries or code reviews - Adding secondary indexes or materialized views
Key Principle
"Start from the queries, not from the entities."
This is ScyllaDB's core data modeling philosophy. Unlike relational databases where you normalize entities and then write queries against them, in ScyllaDB you:
- List your application's queries — every
SELECT,UPDATE, andDELETEyour application will run - Design one table per query pattern — each table's primary key is crafted to serve a specific query efficiently
- Accept denormalization — the same data may exist in multiple tables, each optimized for a different access pattern
In ScyllaDB, the partition key determines which node holds the data and the clustering columns determine the sort order within that partition. The primary key IS your access pattern.
Quick Reference
1. Anti-Patterns — 3 rules
- antipattern-allow-filtering —
ALLOW FILTERINGforces a full-scan. Consult this reference whenever you see it in a query, or when a query does not include the full partition key. - antipattern-large-partitions — Partitions that grow without bounds cause memory pressure, slow reads, and compaction issues. Consult when designing time-series tables or any table where rows accumulate per partition key.
- antipattern-hot-partitions — Uneven partition key distribution causes some nodes/shards to be overloaded while others sit idle. Consult when choosing partition keys for high-write workloads.
2. Fundamentals — 3 rules
- query-first-design — Design tables from queries, not entities. The methodology for translating application access patterns into CQL table definitions.
- partition-key-design — How to choose partition keys that distribute data evenly and support your queries. Includes composite partition keys.
- clustering-columns — How clustering columns control sort order within a partition, and how to use them for range queries and time-ordered data.
3. Patterns — 2 rules
- pattern-bucketing — Split unbounded partitions into bounded time buckets (e.g., one partition per day/hour). Essential for time-series and IoT data.
- secondary-indexes-and-mv — When to use secondary indexes, local secondary indexes, and materialized views — and when to prefer a denormalized table instead.
Primary Key Structure
PRIMARY KEY ((partition_key_col1, partition_key_col2), clustering_col1, clustering_col2)
|___________________________________| |__________________________________|
Partition key Clustering columns
(determines node/shard) (determines sort order within
the partition)
Rules
- The partition key MUST appear in every
WHEREclause — queries that don't specify the full partition key requireALLOW FILTERING(a full-scan) or a secondary index - Clustering columns define sort order —
ASCby default, configurable withWITH CLUSTERING ORDER BY - Partition key determines data distribution — all rows with the same partition key live on the same node/shard
How to Use
Each reference file listed above contains detailed explanations and CQL examples. Use the descriptions in the Quick Reference to identify which files are relevant to your current task.
Each reference file contains:
- Brief explanation of why it matters
- Incorrect CQL example with explanation
- Correct CQL example with explanation
- "When NOT to use" exceptions
- Performance impact
Comparison with SQL / MongoDB
| Concept | SQL | MongoDB | ScyllaDB |
|---|---|---|---|
| Schema design | Entity-first (normalize) | Document-first (embed) | Query-first (denormalize per query) |
| Data unit | Row in a table | Document in a collection | Row in a partition |
| Joins | JOIN clause |
$lookup aggregation |
Not supported — denormalize instead |
| Flexible schema | No (strict DDL) | Yes (schema-optional) | No (strict CQL DDL) |
| Index required for queries | Often optional | Recommended | Partition key required in every query |
| Unique constraint | UNIQUE |
unique: true index |
Primary key only |
| Transactions | Full ACID | Multi-document ACID | Lightweight Transactions (LWT) for single-partition linearizable ops |
More from scylladb/agent-skills
scylladb-cloud-setup
Guide users through connecting to a ScyllaDB Cloud cluster. Use this skill when a user needs to connect to ScyllaDB Cloud, configure driver credentials, or troubleshoot connection issues. Triggers on "connect to ScyllaDB Cloud", "ScyllaDB connection", "ScyllaDB driver setup", "CQL connection", "DC-aware load balancing", "ScyllaDB credentials", "connection bundle".
10scylladb-vector-search
|
10