root-cause-analysis
Root Cause Analysis with Kopai
A guide to debugging production issues using telemetry data (traces, logs, metrics) via the Kopai CLI.
Prerequisites
Ensure you have access to the Kopai app backend and that your services are configured to send their OpenTelemetry data to Kopai. See the otel-instrumentation skill for setup.
RCA Workflow
- Find error traces:
  npx @kopai/cli traces search --status-code ERROR --limit 20 --json
  If empty: broaden the time range, check the service name, or search logs with --severity-min 17.
- Get full trace context:
  npx @kopai/cli traces get <traceId> --json
  Check Duration, StatusCode, and the span hierarchy for bottlenecks.
- Correlate logs:
  npx @kopai/cli logs search --trace-id <traceId> --json
  Look for error messages, stack traces, and timestamps.
- Check metrics:
  npx @kopai/cli metrics discover --json
  npx @kopai/cli metrics search --type <type> --name <name> --json
  Run discover first, then search the relevant metric for anomalies.
- Present findings: summarize the root cause with evidence (specific traceIds, log entries, metric anomalies), impact, and a suggested fix.
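Steps two and three of the workflow above can be wrapped in a small helper so the evidence for one trace ends up in a single directory. This is a minimal sketch, not part of Kopai: the function name rca_collect, the output directory layout, and the file names are all hypothetical. It uses only the CLI commands and flags shown in the workflow.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Kopai CLI): collect the trace and its
# correlated logs for one trace ID into a directory for later analysis.
rca_collect() {
  trace_id="$1"
  # Assumed default output location; pass a second argument to override it.
  out_dir="${2:-rca-$trace_id}"

  if [ -z "$trace_id" ]; then
    echo "usage: rca_collect <traceId> [out_dir]" >&2
    return 1
  fi

  mkdir -p "$out_dir" || return 1

  # Workflow step 2: full trace context.
  npx @kopai/cli traces get "$trace_id" --json > "$out_dir/trace.json" || return 1

  # Workflow step 3: logs correlated with the same trace.
  npx @kopai/cli logs search --trace-id "$trace_id" --json > "$out_dir/logs.json" || return 1

  # Print where the evidence was written.
  echo "$out_dir"
}
```

Calling rca_collect with a traceId copied from the trace search output writes trace.json and logs.json into a directory named after that trace, which keeps the evidence together when presenting findings.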
Quick Example
# Find failing requests
npx @kopai/cli traces search --status-code ERROR --service payment-api --json
# Get trace details (copy traceId from above)
npx @kopai/cli traces get abc123def456 --json
# Check correlated logs
npx @kopai/cli logs search --trace-id abc123def456 --severity-min 17 --json
Rules
1. Workflow (CRITICAL)
- workflow-find-errors - Find Error Traces
- workflow-get-context - Get Full Trace Context
- workflow-correlate-logs - Correlate Logs with Trace
- workflow-check-metrics - Check Related Metrics
- workflow-identify-cause - Identify Root Cause & Present Findings
2. Patterns (HIGH)
- pattern-http-errors - HTTP Error Debugging
- pattern-slow-requests - Slow Request Analysis
- pattern-distributed - Distributed Failure Tracing
- pattern-log-driven - Log-Driven Investigation
Read rules/<rule-name>.md for details.
Tips
- Always use --json for programmatic analysis
- Pipe to jq for filtering/aggregation
- Start with errors, then trace backwards
- Check span Duration to find bottlenecks
- Correlate TraceId across traces, logs, metrics
- Use --severity-min 17 instead of --severity-text ERROR to catch all error-level logs regardless of text casing. Fall back to --body "error" for errors logged at INFO or with no severity.
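The severity fallback in the last tip could be scripted roughly as follows. This is a sketch under stated assumptions: the function name find_error_logs is hypothetical, an empty result is assumed to print either nothing or [], and combining --trace-id with --body in one search is assumed to be allowed. Check the actual CLI output and the log-filters reference before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Kopai CLI): search a trace's logs by
# severity first, then fall back to a body text match.
find_error_logs() {
  trace_id="$1"

  # First pass: error-level logs by severity number.
  result=$(npx @kopai/cli logs search --trace-id "$trace_id" --severity-min 17 --json) || return 1

  case "$result" in
    ""|"[]")
      # Assumption: empty output or [] means no matches.
      # Assumption: --body can be combined with --trace-id.
      npx @kopai/cli logs search --trace-id "$trace_id" --body "error" --json
      ;;
    *)
      printf '%s\n' "$result"
      ;;
  esac
}
```

The second search only runs when the first returns nothing, so errors logged at INFO or without a severity are still picked up without adding noise to the common case.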
References
- trace-filters - Trace search filter options
- log-filters - Log search filter options
- metric-filters - Metric search filter options