altinity-expert-clickhouse-kafka
Diagnostics
Run all queries from the file checks.sql and analyze the results.
Interpreting Results
Consumer Health
Check if consumers are stuck by comparing exception time vs activity times:
- last_exception_time >= last_poll_time OR last_exception_time >= last_commit_time → consumer stuck on error, not progressing
- Otherwise → consumer healthy
The exceptions column is a tuple of arrays with matching indices — exceptions.time[-1] and exceptions.text[-1] give the most recent error.
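The stuck-consumer rule above can be sketched in a few lines of Python, for example when post-processing rows exported from checks.sql. The function names are hypothetical, and treating a missing exception time as "healthy" is an assumption, not something the checks themselves define.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple


def consumer_is_stuck(
    last_exception_time: Optional[datetime],
    last_poll_time: datetime,
    last_commit_time: datetime,
) -> bool:
    """Apply the rule: an exception at or after the last poll or commit means stuck."""
    if last_exception_time is None:
        # Assumption: a consumer with no recorded exception is treated as healthy.
        return False
    return (
        last_exception_time >= last_poll_time
        or last_exception_time >= last_commit_time
    )


def latest_exception(
    times: Sequence[datetime], texts: Sequence[str]
) -> Optional[Tuple[datetime, str]]:
    """Return the most recent (time, text) pair from the two parallel arrays."""
    if not times or not texts:
        return None
    # Index -1 picks the last element, mirroring exceptions.time[-1] / exceptions.text[-1].
    return times[-1], texts[-1]


if __name__ == "__main__":
    now = datetime(2026, 1, 1, 12, 0, 0)
    # The exception is newer than the last poll and commit, so the consumer is stuck.
    print(consumer_is_stuck(now, now - timedelta(minutes=5), now - timedelta(minutes=5)))
```

Note that the rule uses OR: one stale activity timestamp is enough to flag the consumer, so a consumer that still polls but no longer commits is also reported.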
Thread Pool Capacity
- kafka_consumers > mb_pool_size → thread starvation: consumers are waiting for available threads
- Fix: increase background_message_broker_schedule_pool_size (default: 16)
- Sizing: total Kafka + RabbitMQ/NATS consumers + 25% buffer
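As a worked example of the sizing rule, the arithmetic can be sketched as below. The function names are hypothetical, and rounding the result up to a whole thread is an assumption; the guidance above only says "+ 25% buffer".

```python
import math


def recommended_pool_size(
    kafka_consumers: int,
    other_broker_consumers: int = 0,
    buffer: float = 0.25,
) -> int:
    """Total Kafka + RabbitMQ/NATS consumers plus a buffer, rounded up (assumption)."""
    total = kafka_consumers + other_broker_consumers
    return math.ceil(total * (1 + buffer))


def is_starved(kafka_consumers: int, pool_size: int = 16) -> bool:
    """Thread starvation check: more consumers than threads in the pool."""
    return kafka_consumers > pool_size


if __name__ == "__main__":
    # 20 Kafka consumers and 4 RabbitMQ/NATS consumers: 24 * 1.25 = 30 threads.
    print(recommended_pool_size(20, 4))
    # 20 consumers against the default pool of 16 threads are starved.
    print(is_starved(20))
```

With the default pool size of 16, this sizing rule is already exceeded once a server runs 13 consumers in total, since 13 consumers plus the buffer round up to 17 threads.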
Slow Materialized Views (Poll Interval Risk)
- MV avg duration > 30s → consumer may exceed max.poll.interval.ms and get kicked from the group
- MV executions with error status → likely consumer rebalances (consumer kicked, MV interrupted mid-batch)
- Most common root cause for slow MVs: multiple JSONExtract calls re-parsing the same JSON blob
- Fix: rewrite to one-pass JSONExtract(json, 'Tuple(...)') AS parsed + tupleElement(); see troubleshooting.md
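The two materialized view signals above can be turned into a simple triage helper. This is only a sketch: the function name and the returned labels are hypothetical, and the inputs are assumed to be per-view aggregates (average duration in seconds and number of executions with an error status) taken from the checks.sql output.

```python
from typing import List


def mv_poll_interval_risks(
    avg_duration_s: float,
    error_count: int,
    warn_threshold_s: float = 30.0,
) -> List[str]:
    """Return the risk labels that apply to one materialized view."""
    findings: List[str] = []
    if avg_duration_s > warn_threshold_s:
        # Slow batches risk exceeding max.poll.interval.ms (default 300s).
        findings.append("slow_mv")
    if error_count > 0:
        # Errored executions often mean the consumer was kicked mid-batch.
        findings.append("mv_errors")
    return findings


if __name__ == "__main__":
    print(mv_poll_interval_risks(avg_duration_s=45.0, error_count=2))
```

A view that returns both labels is the typical candidate for the JSONExtract rewrite described in troubleshooting.md.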
Pool Utilization Trends (12h)
- Sustained high values near pool size → capacity pressure
- Spikes correlating with lag → temporary overload
- Flat zero → Kafka consumers may not be active
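The three patterns above can be approximated mechanically once the 12h series is exported as a list of samples. The sketch below is illustrative only: the function name, the 90% cutoff for "near pool size", and the 80% share of samples for "sustained" are all assumptions chosen for the example, not values defined by the checks.

```python
from typing import Sequence


def classify_pool_trend(
    samples: Sequence[float],
    pool_size: int,
    near_fraction: float = 0.9,    # assumption: "near pool size" means >= 90% of it
    sustained_share: float = 0.8,  # assumption: "sustained" means >= 80% of samples
) -> str:
    """Label a series of pool-utilization samples with one of the patterns."""
    if not samples:
        return "no data"
    if all(s == 0 for s in samples):
        return "inactive"  # flat zero: Kafka consumers may not be active
    threshold = near_fraction * pool_size
    high = sum(1 for s in samples if s >= threshold)
    if high / len(samples) >= sustained_share:
        return "capacity pressure"
    if high > 0:
        return "spikes"  # compare the timing against consumer lag
    return "normal"


if __name__ == "__main__":
    print(classify_pool_trend([15, 16, 16, 15, 16], pool_size=16))
```

A "spikes" result is only meaningful together with lag data, so it is best treated as a prompt to check lag rather than as a finding on its own.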
Advanced Diagnostics
For deeper investigation, run queries from advanced_checks.sql:
- Consumer exception drill-down — filter to a specific problematic Kafka table
- Consumption speed measurement — snapshot-based rate calculation
- Topic lag via rdkafka_stat — total lag per table and per-partition breakdown
- Broker connection health — connection state, errors, disconnects
Important: rdkafka_stat is not enabled by default in ClickHouse. It requires <statistics_interval_ms> in the Kafka engine settings. See advanced_checks.sql for setup instructions.
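The snapshot-based rate calculation listed above comes down to simple arithmetic: read a cumulative message counter twice and divide the difference by the elapsed time. The sketch below assumes the counter is something like num_messages_read from system.kafka_consumers; the function names and the drain-time estimate are hypothetical additions for illustration.

```python
from typing import Optional


def consumption_rate(
    messages_before: int, messages_after: int, elapsed_seconds: float
) -> float:
    """Messages per second between two snapshots of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (messages_after - messages_before) / elapsed_seconds


def seconds_to_drain(
    total_lag: int, consume_rate: float, produce_rate: float = 0.0
) -> Optional[float]:
    """Estimated time to clear the lag, or None if the consumer is not catching up."""
    net_rate = consume_rate - produce_rate
    if net_rate <= 0:
        return None
    return total_lag / net_rate


if __name__ == "__main__":
    # Two snapshots taken 60 seconds apart: 6000 messages consumed, 100 msg/s.
    rate = consumption_rate(1000, 7000, 60)
    # 30000 messages of lag with producers writing 50 msg/s: 600 seconds to drain.
    print(rate, seconds_to_drain(30000, rate, produce_rate=50.0))
```

Taking the snapshots at least a minute apart smooths out batch boundaries, since the counter advances in bursts as each polled batch is flushed.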
Common Issues
For troubleshooting common errors and configuration guidance, see troubleshooting.md:
- Topic authorization / ACL errors
- Poll interval exceeded (slow MV / JSON parsing optimization)
- Thread pool starvation
- Parsing errors / dead letter queue
- Data loss with multiple materialized views
- Offset rewind / replay
- Parallel consumption tuning
Cross-Module Triggers
| Finding | Load Module | Reason |
|---|---|---|
| Slow MV inserts | altinity-expert-clickhouse-ingestion | Insert pipeline analysis |
| High merge memory | altinity-expert-clickhouse-merges | Merge patterns |
| Query-level issues | altinity-expert-clickhouse-reporting | Query optimization |
| Schema concerns | altinity-expert-clickhouse-schema | Table design |
Settings Reference
| Setting | Scope | Notes |
|---|---|---|
| background_message_broker_schedule_pool_size | Server | Thread pool for Kafka/RabbitMQ/NATS consumers (default: 16) |
| kafka_num_consumers | Table | Parallel consumers per table (limited by cores) |
| kafka_thread_per_consumer | Table | Required for parallel inserts (= 1) |
| kafka_handle_error_mode | Table | stream (21.6+) or dead_letter (25.8+) |
| max_poll_interval_ms | librdkafka | Max time between polls before consumer is kicked (default: 300s) |
| statistics_interval_ms | librdkafka | Enable rdkafka_stat collection (disabled by default) |
Repository: altinity/skills
GitHub Stars: 5
First Seen: Feb 9, 2026