clickhouse-best-practices-ts-py
ClickHouse Best Practices (TypeScript, Python)
ClickHouse data modeling best practices extended with MooseStack TypeScript and Python examples. 28 rules across 3 categories (schema, query, insert), prioritized by impact.
MooseStack docs: MooseStack Documentation ClickHouse docs: ClickHouse Best Practices
IMPORTANT: How to Apply This Skill
Two modes — pick the right one for the situation:
Inline mode (default — use this during authoring / optimization)
When the user is actively building or iterating on code, apply rules inline:
- Identify which 1–2 rules are directly relevant to the current decision (e.g. choosing an ORDER BY, picking a type, deciding on batch size).
- Read only those rule file(s). Do not pre-emptively read 6+ rules.
- Apply the rule immediately in the code you're writing; cite the rule name in a brief comment or sentence (e.g. "Per
schema-pk-cardinality-order, ORDER BY user_id, event_ts."). - Keep moving — continue writing the pipeline, schema, or query. Do NOT stop to emit a "Rules Checked / Findings" audit.
Use the Quick Reference table and Rule Categories below to find the right rule fast.
Review mode (only when the user asks for a review/audit)
If the user explicitly says "review", "audit", "check for violations", or "what's wrong with this schema?" — and existing code is present to review — then follow the Review Procedures section below, read the listed rule files in order, and emit the structured Output Format.
Never enter review mode during greenfield authoring or time-boxed pipeline construction unless the user specifically asks for it. Writing working code first and checking rules along the way is almost always the right call.
Why rules take priority (either mode): ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated guidance with MooseStack-specific patterns for TypeScript and Python.
For Formal Reviews
When the user explicitly requests a review:
Review Procedures
For MooseStack Data Model Reviews (TypeScript/Python)
Read these rule files in order:
rules/schema-types-native-types.md- Type annotation mappingrules/schema-types-minimize-bitwidth.md- UseUInt8,UInt16etc.rules/schema-types-lowcardinality.md-string & LowCardinalityannotationrules/schema-types-avoid-nullable.md-ClickHouseDefaultinstead of?rules/schema-pk-cardinality-order.md-orderByFieldsorderingrules/schema-partition-lifecycle.md-partitionByFieldand TTL
Check for:
- Proper type annotations (
Key<T>,UInt64,LowCardinality, etc.) -
OlapTablehasorderByFieldsin low-to-high cardinality order - Optional fields use
ClickHouseDefaultinstead of?where possible -
enginespecified for update/delete patterns (ReplacingMergeTree, etc.) -
indexesfor non-ORDER BY filter columns
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
rules/schema-pk-plan-before-creation.md- ORDER BY is immutablerules/schema-pk-cardinality-order.md- Column ordering in keysrules/schema-pk-prioritize-filters.md- Filter column inclusionrules/schema-types-native-types.md- Proper type selectionrules/schema-types-minimize-bitwidth.md- Numeric type sizingrules/schema-types-lowcardinality.md- LowCardinality usagerules/schema-types-avoid-nullable.md- Nullable vs DEFAULTrules/schema-partition-low-cardinality.md- Partition count limitsrules/schema-partition-lifecycle.md- Partitioning purpose
Check for:
- PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
- Data types match actual data ranges
- LowCardinality applied to appropriate string columns
- Partition key cardinality bounded (100-1,000 values)
- ReplacingMergeTree has version column if used
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
rules/query-join-choose-algorithm.md- Algorithm selectionrules/query-join-filter-before.md- Pre-join filteringrules/query-join-use-any.md- ANY vs regular JOINrules/query-index-skipping-indices.md- Secondary index usagerules/schema-pk-filter-on-orderby.md- Filter alignment with ORDER BY
Check for:
- Filters use ORDER BY prefix columns
- JOINs filter tables before joining (not after)
- Correct JOIN algorithm for table sizes
- Skipping indices for non-ORDER BY filter columns
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
rules/insert-batch-size.md- Batch sizing requirementsrules/insert-mutation-avoid-update.md- UPDATE alternativesrules/insert-mutation-avoid-delete.md- DELETE alternativesrules/insert-async-small-batches.md- Async insert usagerules/insert-optimize-avoid-final.md- OPTIMIZE TABLE risks
Check for:
- Batch size 10K-100K rows per INSERT
- No ALTER TABLE UPDATE for frequent changes
- ReplacingMergeTree or CollapsingMergeTree for update patterns
- Async inserts enabled for high-frequency small batches
Output Format
Structure your response as follows:
## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...
## Findings
### Violations
- **`rule-name`**: Description of the issue
- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
### Compliant
- `rule-name`: Brief note on why it's correct
## Recommendations
[Prioritized list of changes, citing rules]
Rule Categories by Priority
| Priority | Category | Impact | Prefix | Rule Count |
|---|---|---|---|---|
| 1 | Primary Key Selection | CRITICAL | schema-pk- |
4 |
| 2 | Data Type Selection | CRITICAL | schema-types- |
5 |
| 3 | JOIN Optimization | CRITICAL | query-join- |
5 |
| 4 | Insert Batching | CRITICAL | insert-batch- |
1 |
| 5 | Mutation Avoidance | CRITICAL | insert-mutation- |
2 |
| 6 | Partitioning Strategy | HIGH | schema-partition- |
4 |
| 7 | Skipping Indices | HIGH | query-index- |
1 |
| 8 | Materialized Views | HIGH | query-mv- |
2 |
| 9 | Async Inserts | HIGH | insert-async- |
2 |
| 10 | OPTIMIZE Avoidance | HIGH | insert-optimize- |
1 |
| 11 | JSON Usage | MEDIUM | schema-json- |
1 |
Quick Reference
Schema Design - Primary Key (CRITICAL)
schema-pk-plan-before-creation- Plan ORDER BY before table creation (immutable)schema-pk-cardinality-order- Order columns low-to-high cardinalityschema-pk-prioritize-filters- Include frequently filtered columnsschema-pk-filter-on-orderby- Query filters must use ORDER BY prefix
Schema Design - Data Types (CRITICAL)
schema-types-native-types- Use native types, not String for everythingschema-types-minimize-bitwidth- Use smallest numeric type that fitsschema-types-lowcardinality- LowCardinality for <10K unique stringsschema-types-enum- Enum for finite value sets with validationschema-types-avoid-nullable- Avoid Nullable; use DEFAULT instead
Schema Design - Partitioning (HIGH)
schema-partition-low-cardinality- Keep partition count 100-1,000schema-partition-lifecycle- Use partitioning for data lifecycle, not queriesschema-partition-query-tradeoffs- Understand partition pruning trade-offsschema-partition-start-without- Consider starting without partitioning
Schema Design - JSON (MEDIUM)
schema-json-when-to-use- JSON for dynamic schemas; typed columns for known
Query Optimization - JOINs (CRITICAL)
query-join-choose-algorithm- Select algorithm based on table sizesquery-join-use-any- ANY JOIN when only one match neededquery-join-filter-before- Filter tables before joiningquery-join-consider-alternatives- Dictionaries/denormalization vs JOINquery-join-null-handling- join_use_nulls=0 for default values
Query Optimization - Indices (HIGH)
query-index-skipping-indices- Skipping indices for non-ORDER BY filters
Query Optimization - Materialized Views (HIGH)
query-mv-incremental- Incremental MVs for real-time aggregationsquery-mv-refreshable- Refreshable MVs for complex joins
Insert Strategy - Batching (CRITICAL)
insert-batch-size- Batch 10K-100K rows per INSERT
Insert Strategy - Async (HIGH)
insert-async-small-batches- Async inserts for high-frequency small batchesinsert-format-native- Native format for best performance
Insert Strategy - Mutations (CRITICAL)
insert-mutation-avoid-update- ReplacingMergeTree instead of ALTER UPDATEinsert-mutation-avoid-delete- Lightweight DELETE or DROP PARTITION
Insert Strategy - Optimization (HIGH)
insert-optimize-avoid-final- Let background merges work
When to Apply
This skill activates when you encounter:
MooseStack Triggers:
- MooseStack data model definitions (TypeScript interfaces or Python Pydantic models)
OlapTableconfigurationIngestPipelinesetupMaterializedViewdefinitionsApiquery handlers with ClickHouse queries- Type annotation questions (
Key,LowCardinality,UInt64, etc.)
ClickHouse Triggers:
CREATE TABLEstatementsALTER TABLEmodificationsORDER BYorPRIMARY KEYdiscussions- Data type selection questions
- Slow query troubleshooting
- JOIN optimization requests
- Data ingestion pipeline design
- Update/delete strategy questions
- ReplacingMergeTree or other specialized engine usage
- Partitioning strategy decisions
Rule File Structure
Each rule file in rules/ contains:
- YAML frontmatter: title, impact level, tags
- Brief explanation: Why this rule matters
- Incorrect example: Anti-pattern with SQL explanation
- Correct example: Best practice with SQL explanation
- MooseStack examples: TypeScript and Python code showing how to implement the pattern
- Data model type annotations
- OlapTable configuration
- API query patterns where relevant
- Additional context: Trade-offs, when to apply, references
Full Compiled Document
For the complete guide with all rules expanded inline: AGENTS.md
Use AGENTS.md when you need to check multiple rules quickly without reading individual files.
More from 514-labs/agent-skills
514-cli
>
5514-debug
>
4514-perf-optimize
>
4moose-basics
>
3moosestack-clickhouse-best-practices
MUST USE when reviewing MooseStack data models, ClickHouse schemas, queries, or configurations. Contains 28 rules with MooseStack TypeScript/Python examples. Always read relevant rule files and cite specific rules in responses.
3514-production-rollout-plan
>
1