# ClickHouse Database Expert
Comprehensive ClickHouse knowledge base for working with high-performance columnar OLAP databases.
## When to Invoke This Skill

Use this skill when:
- Designing ClickHouse schemas (tables, partitions, ORDER BY)
- Choosing table engines (MergeTree family decision tree)
- Writing and optimizing ClickHouse SQL queries
- Managing ClickHouse clusters (replication, sharding)
- Debugging query performance or merge issues
- Setting up backups and monitoring
- Integrating ClickHouse with Kafka, S3, or other systems
- Operating ClickHouse on Kubernetes
## What is ClickHouse?
ClickHouse is a columnar OLAP database designed for real-time analytics on large datasets.
**Key characteristics:**
- Columnar storage: Read only needed columns (10-100x faster than row stores for analytical queries)
- MergeTree engine family: Automatic background merges for data organization
- SQL dialect with extensions: Arrays, tuples, lambdas, specialized functions
- Append-first design: Optimized for high-volume inserts, not point updates
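The dialect extensions mentioned above look like this in practice. A small self-contained sketch using higher-order array functions (no table required):

```sql
-- Lambdas applied to array literals
SELECT
    arrayMap(x -> x * 2, [1, 2, 3])            AS doubled,      -- [2,4,6]
    arrayFilter(x -> x % 2 = 0, [1, 2, 3, 4])  AS evens,        -- [2,4]
    arraySum(x -> x * x, [1, 2, 3])            AS sum_squares;  -- 14
```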
## Golden Rules
- Always use MergeTree (except tiny dimensions → Memory engine)
- Sort key = query filter: ORDER BY defines data layout on disk
- Partition by time: For TTL and efficient DROP PARTITION operations
- Avoid mutations: Use INSERT + new data instead of UPDATE/DELETE
- Monitor merges: Background merges impact performance significantly
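The mutation-avoidance rule above can be sketched with TTL: instead of issuing `ALTER TABLE ... DELETE`, let ClickHouse expire rows during background merges, and use partition drops for bulk removal. Table and column names here are illustrative:

```sql
-- Hypothetical table: expire raw events after 90 days instead of DELETEing them
CREATE TABLE events_raw (
    timestamp DateTime,
    user_id UInt32,
    payload String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, timestamp)
TTL timestamp + INTERVAL 90 DAY;  -- rows are removed during background merges

-- Bulk removal: dropping a whole partition is cheap (no part rewrite),
-- unlike a mutation, which rewrites every affected part
ALTER TABLE events_raw DROP PARTITION 202401;
```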
## Quick Start Examples

### Minimal Working Schema

```sql
-- Basic events table with best practices
CREATE TABLE events (
    timestamp DateTime,
    user_id UInt32,
    event_type LowCardinality(String),
    session_id UUID,
    metadata String,
    revenue Decimal(18, 2) DEFAULT 0
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)  -- Monthly partitions
ORDER BY (user_id, timestamp)     -- Data layout matches query pattern
SETTINGS index_granularity = 8192;
```
### Common Query Patterns

```sql
-- Effective time range filter
SELECT * FROM events
WHERE timestamp >= today() AND timestamp < tomorrow();

-- Optimize JOIN: put the smaller table on the right-hand side,
-- since ClickHouse loads the right table into memory
SELECT * FROM large_table INNER JOIN small_table ON large_table.id = small_table.id;

-- Check query plan
EXPLAIN SELECT * FROM events WHERE user_id = 123;
```
### Monitoring Queries

```sql
-- Running queries
SELECT query_id, user, query, elapsed FROM system.processes ORDER BY elapsed DESC;

-- Table sizes
SELECT database, table, formatReadableSize(sum(bytes)) AS size
FROM system.parts WHERE active = 1
GROUP BY database, table ORDER BY sum(bytes) DESC;

-- Active merges
SELECT table, elapsed, bytes_read_uncompressed FROM system.merges ORDER BY elapsed DESC;
```
## Common Pitfalls
| Pitfall | Why It's Bad | Better Approach |
|---|---|---|
| Modifying columns (MODIFY/DROP) | Triggers expensive mutation | Use ADD COLUMN only |
| Updating/deleting rows | Mutations rewrite all data | Use TTL or new tables |
| Bad ORDER BY | Can't leverage index | Match query WHERE patterns |
| Too many partitions | Slow queries, high merge overhead | Partition coarsely (e.g. monthly); keep total active parts in the hundreds |
| SELECT * | Reads all columns (columnar penalty) | Select only needed columns |
| Comparing dates as strings | Full scan | Compare the column against typed Date/DateTime values |
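The last two pitfalls can be illustrated side by side (using the `events` table from the Quick Start):

```sql
-- Bad: wrapping the column in a function and SELECT * force a full scan
SELECT * FROM events WHERE toString(timestamp) LIKE '2024-01%';

-- Better: compare the bare column against typed DateTime bounds so the
-- primary index and partition pruning apply, and name only needed columns
SELECT user_id, event_type, revenue
FROM events
WHERE timestamp >= toDateTime('2024-01-01 00:00:00')
  AND timestamp <  toDateTime('2024-02-01 00:00:00');
```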
## ClickHouse Architecture Overview

### Data Model
- Append-first: No in-place updates (mutations are expensive)
- Parts and partitions: Parts merge into larger parts (background process)
- Two-level index: Sparse index (8192 rows/mark) + mark files
### When ClickHouse Shines

- ✅ Wide tables (100+ columns) where queries read only a few columns
- ✅ Time-series with time-based filters
- ✅ Aggregations over billions of rows
- ✅ Append-only workloads (events, logs, metrics)
### When to Avoid ClickHouse

- ❌ Point updates/deletes (use a row store like PostgreSQL)
- ❌ Heavy JOINs on non-sorted keys
- ❌ Complex transactions (no full ACID support)
- ❌ Low-latency OLTP (use a row store)
## Key Topics by Reference

### Schema & Table Design

| Topic | Reference | Description |
|---|---|---|
| Core Concepts | references/core-concepts.md | Architecture, data model, internals |
| Schema Design | references/schema-design.md | Database engines, migrations, version control |
| Table Design | references/table-design.md | ORDER BY, partitioning, column selection |
| Table Engines | references/table-engines.md | Complete MergeTree family reference |
### Query & Performance

| Topic | Reference | Description |
|---|---|---|
| SQL Reference | references/sql-reference.md | Complete SQL dialect, data types |
| Query Optimization | references/query-optimization.md | EXPLAIN, JOINs, projections, skip indexes |
| Advanced Features | references/advanced-features.md | Materialized views, mutations, TTL, dictionaries |
### Operations & Cluster

| Topic | Reference | Description |
|---|---|---|
| Debugging | references/debugging.md | Query debugging, merges, mutations, replication |
| Cluster Management | references/cluster-management.md | Distributed tables, replication, sharding |
| Kubernetes Operator | references/kubernetes-operator.md | K8s deployment and operations |
| Backup & Restore | references/backup-restore.md | Backup strategies, disaster recovery |
| Monitoring | references/monitoring.md | Query monitoring, health checks, system queries |
### Integration & Best Practices

| Topic | Reference | Description |
|---|---|---|
| Integrations | references/integrations.md | Kafka, S3, PostgreSQL, MySQL, BI tools |
| Best Practices | references/best-practices.md | Complete checklist, anti-patterns |
| External References | references/external.md | Altinity KB links, official docs |
## Quick Decision Guides

### Which Table Engine?

```
Need to store data?
├── < 1M rows, dimension table → Memory
└── ≥ 1M rows → MergeTree family
    ├── Deduplication? → ReplacingMergeTree(version)
    ├── Changelog? → CollapsingMergeTree(sign)
    ├── Pre-aggregation? → AggregatingMergeTree()
    ├── Replication? → ReplicatedMergeTree(...)
    └── Default → MergeTree()
```

See references/table-engines.md for the complete reference.
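As a sketch of the deduplication branch, a ReplacingMergeTree keyed on the ORDER BY columns keeps the row with the highest version. Names here are illustrative; note that deduplication happens at merge time, so `FINAL` is needed for read-time guarantees:

```sql
-- Hypothetical upsert-style table: latest updated_at per user_id wins
CREATE TABLE user_profiles (
    user_id UInt32,
    email String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)  -- version column
ORDER BY user_id;

-- Merges deduplicate eventually; FINAL forces deduplication at query time
SELECT * FROM user_profiles FINAL WHERE user_id = 42;
```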
## Common Issues & Quick Fixes
| Issue | Quick Fix |
|---|---|
| Too many parts | `OPTIMIZE TABLE <table> FINAL` |
| Slow query | `EXPLAIN SELECT ...` to check index usage |
| Mutation stuck | Check `system.mutations`, use `OPTIMIZE ... FINAL` |
| Replication lag | Check `system.replication_queue` and ZooKeeper |
| OOM on query | Increase `max_memory_usage`, optimize the query |
See references/debugging.md for detailed troubleshooting.
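For the stuck-mutation case, a minimal inspect-and-abort sequence looks like this (a sketch; `KILL MUTATION` discards the mutation rather than completing it, and the mutation id below is illustrative):

```sql
-- Find mutations that have not finished, and why
SELECT database, table, mutation_id, command, parts_to_do, latest_fail_reason
FROM system.mutations
WHERE is_done = 0;

-- Abort a stuck mutation by id, then force merges to settle the parts
KILL MUTATION WHERE mutation_id = 'mutation_123.txt';
OPTIMIZE TABLE events FINAL;
```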
## See Also

### Core References

- references/core-concepts.md - Architecture, data model, MergeTree internals
- references/schema-design.md - Database engines, schema organization, migrations
- references/table-design.md - ORDER BY design, partitioning strategies, column selection
- references/table-engines.md - Complete guide to all MergeTree family engines
- references/sql-reference.md - Full SQL dialect with data types and functions

### Query & Performance

- references/query-optimization.md - EXPLAIN, JOIN optimization, projections, skip indexes
- references/advanced-features.md - Materialized views, mutations, TTL, dictionaries
- references/debugging.md - Query debugging, merge issues, replication problems

### Operations

- references/cluster-management.md - Distributed tables, replication, sharding
- references/kubernetes-operator.md - K8s deployment with Altinity operator
- references/backup-restore.md - Backup strategies and disaster recovery
- references/monitoring.md - Query monitoring, health checks, system queries

### Integration & Best Practices

- references/integrations.md - Kafka, S3, PostgreSQL, MySQL, BI tools
- references/best-practices.md - Comprehensive checklist and anti-patterns
- references/external.md - Altinity KB (200+ articles) and official docs

### System Queries

- references/system-queries.md - Ready-to-use queries for operations and monitoring
*Version: 1.0.0 · Sources: Altinity Knowledge Base (200+ articles) + ClickHouse Official Docs*