cx-incident-management
Incident Management Skill
Use this skill as the gateway for incident triage, SLO monitoring, and notification verification. It orchestrates the full triage workflow - from detection through resolution - and cross-references cx-alerts for deep alert management and cx-telemetry-querying for root cause investigation.
CLI Commands
| Command | Subcommands | Purpose |
|---|---|---|
| `cx incidents` | `list`, `get`, `acknowledge`, `resolve`, `close`, `assign`, `unassign`, `events`, `aggregations` | Manage and triage incidents |
| `cx slos` | `list`, `get`, `create`, `update`, `delete` | Monitor and manage SLO definitions |
| `cx alerts` | `list`, `get` | Check which alerts are firing (see cx-alerts skill for full alert management) |
| `cx notifications connectors` | `list`, `get` | Verify notification connector configuration |
| `cx notifications routers` | `list`, `get` | Verify notification routing rules |
| `cx notifications presets` | `list`, `get` | Check notification preset templates |
| `cx notifications test` | `connector`, `destination`, `preset`, `routing-condition`, `template-render` | Test notification delivery |
Key flags:
- `cx incidents list` supports `--status` (TRIGGERED, ACKNOWLEDGED, RESOLVED), `--severity` (CRITICAL, WARNING, INFO), `--assignee`
- All commands support `-o json` for structured output and `-p <profile>` for profile selection
- `cx slos create/update` use `--from-file <path>` (or `-` for stdin)
Incident Triage Workflow
Step 1: Check Active Incidents
cx incidents list -o json
cx incidents list --status TRIGGERED -o json
cx incidents list --severity CRITICAL -o json
Get an overview of what's happening. Filter by severity for immediate priorities:
cx incidents list -o json | jq '[.[] | select(.severity == "CRITICAL") | {id, name, status, severity, started_at}]'
Step 2: Get Incident Details
cx incidents get <incident-id> -o json
cx incidents events --incident-id <incident-id> -o json
Review the incident timeline and related events to understand scope and progression.
Step 3: Check Related Alerts
cx alerts list -o json
Find which alerts are currently firing. For deep alert inspection, switch to the cx-alerts skill.
cx alerts list -o json | jq '[.[] | select(.is_active == true) | {id, name, severity, last_triggered}]'
Step 4: Review SLO Status
cx slos list -o json
cx slos get <slo-id> -o json
Check whether any SLOs are breaching or have exhausted their error budgets:
cx slos list -o json | jq '[.[] | {name, status, remaining_budget_percentage}]'
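Status alone can hide an SLO that is close to breaching. The sketch below filters the list output down to SLOs whose remaining budget is at or below a threshold. The function name `low_budget_slos` and the default threshold of 25 are hypothetical, and it assumes `remaining_budget_percentage` is a number between 0 and 100, which this document does not confirm.

```shell
#!/usr/bin/env bash
# Reads `cx slos list -o json` output on stdin and prints the SLOs whose
# remaining error budget is at or below a threshold, lowest budget first.
# Assumption: remaining_budget_percentage is numeric (0 to 100).
low_budget_slos() {
  local threshold="${1:-25}"   # hypothetical default, tune per team
  jq -c --argjson t "$threshold" '
    [ .[]
      # Skip entries with no budget value; null would otherwise compare as "low".
      | select(.remaining_budget_percentage != null
               and .remaining_budget_percentage <= $t)
      | {name, status, remaining_budget_percentage} ]
    | sort_by(.remaining_budget_percentage)'
}
```

A possible invocation is `cx slos list -o json | low_budget_slos 20`; an empty array means no SLO is under the threshold.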
Step 5: Verify Notifications
cx notifications connectors list -o json
cx notifications routers list -o json
cx notifications presets list -o json
Confirm the right people were notified through the correct channels.
Step 6: Pivot to Root Cause
Switch to the cx-telemetry-querying skill to investigate the underlying cause using logs, traces, and metrics.
Incident Actions
Acknowledge
cx incidents acknowledge <incident-id>
cx incidents acknowledge <id1> <id2> <id3>
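When many incidents need acknowledging at once, the multi-ID form can be assembled from list output. This is a minimal sketch: `build_ack_command` is a hypothetical helper that only prints the command for review and does not run it. It assumes each incident object in the JSON output carries an `id` field, as in the Step 1 filter.

```shell
#!/usr/bin/env bash
# Reads `cx incidents list -o json` output on stdin and prints a single
# `cx incidents acknowledge` command covering every incident ID found.
# Prints nothing when the list is empty, so there is never a bare command.
build_ack_command() {
  local ids
  ids="$(jq -r '[.[].id] | join(" ")')"
  if [ -z "$ids" ]; then
    return 0
  fi
  printf 'cx incidents acknowledge %s\n' "$ids"
}
```

For example, `cx incidents list --status TRIGGERED -o json | build_ack_command` prints the command to copy and run after checking the IDs. Printing instead of executing is a deliberate choice here, since acknowledging signals ownership and should stay a conscious step.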
Resolve
cx incidents resolve <incident-id>
cx incidents resolve <id1> <id2> <id3>
Assign
cx incidents assign <incident-id> --user-id <user-id>
Close
cx incidents close <incident-id>
SLO Management
Creating SLOs
Use an existing SLO as a template:
cx slos get <existing-slo-id> -o json > slo-template.json
# Edit slo-template.json with new service/threshold
cx slos create --from-file slo-template.json
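The manual edit step can also be scripted. The sketch below is one way to do it with jq; `slo_template` is a hypothetical helper, and it assumes the SLO JSON has top-level `id` and `name` fields. Whether `cx slos create` rejects or ignores an existing `id` is not stated here, so removing it is a precaution rather than a documented requirement.

```shell
#!/usr/bin/env bash
# Reads one SLO definition (as printed by `cx slos get <id> -o json`) on
# stdin, drops the existing id, and sets a new name. All other fields are
# passed through unchanged.
# Assumption: "id" and "name" are top-level fields of the definition.
slo_template() {
  local new_name="$1"
  jq --arg name "$new_name" 'del(.id) | .name = $name'
}
```

Since `--from-file` accepts `-` for stdin, the result can be piped straight through, for instance `cx slos get <existing-slo-id> -o json | slo_template "<new-name>" | cx slos create --from-file -`. Writing to a file first, as in the commands above, is the safer route when thresholds also need editing.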
Monitoring SLO Health
# All SLOs with their status
cx slos list -o json | jq '[.[] | {name, status, target_percentage, remaining_budget}]'
# SLOs that are breaching
cx slos list -o json | jq '[.[] | select(.status != "OK")]'
Notification Debugging
When notifications aren't reaching the right people:
1. Check Connectors
cx notifications connectors list -o json | jq '[.[] | {id, name, type}]'
Verify the expected channels (Slack, PagerDuty, email) exist and are configured.
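The check for expected channels can be automated against the list output. In this sketch, `missing_connector_types` is a hypothetical helper, and the type strings passed to it (such as `slack` or `pagerduty`) are assumptions; compare them against the `type` values your own connector list actually returns.

```shell
#!/usr/bin/env bash
# Reads `cx notifications connectors list -o json` output on stdin and
# prints each expected connector type that has no matching connector,
# one per line. Matching is case-insensitive.
# $1: comma-separated expected types, e.g. "slack,pagerduty,email"
missing_connector_types() {
  local expected="$1"
  jq -r --arg expected "$expected" '
    ($expected | ascii_downcase | split(",")) as $want
    | [ .[] | (.type // "") | ascii_downcase ] as $have
    | ($want - $have)
    | .[]'
}
```

A possible invocation is `cx notifications connectors list -o json | missing_connector_types "slack,pagerduty,email"`. No output means every expected type has at least one connector; it says nothing about whether that connector is configured correctly.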
2. Check Routers
cx notifications routers list -o json | jq '[.[] | {id, name, entity_type}]'
Verify routing rules map the right alert types to the right connectors.
3. Test Notification Delivery
cx notifications test connector --from-file test-connector.json
cx notifications test destination --from-file test-destination.json
cx notifications test preset --from-file test-preset.json
cx notifications test routing-condition --from-file test-condition.json
Incident Aggregations
Get a high-level view of incident patterns:
cx incidents aggregations -o json
Use this to understand incident frequency, MTTR trends, and severity distribution.
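The shape of the aggregations output is not described here, so as a local cross-check a severity distribution can be derived from `cx incidents list` output instead. This sketch assumes each incident has a `severity` field, as in the Step 1 filter; `severity_counts` is a hypothetical helper, not part of the CLI.

```shell
#!/usr/bin/env bash
# Reads `cx incidents list -o json` output on stdin and prints one line per
# severity in the form "<severity><TAB><count>", ordered by severity name.
severity_counts() {
  jq -r 'group_by(.severity)
         | map("\(.[0].severity)\t\(length)")
         | .[]'
}
```

For example, `cx incidents list -o json | severity_counts` gives a quick count per severity for whatever the list command returned. It does not cover MTTR or trends over time.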
Key Principles
- Triage before deep-dive - check incidents, alerts, and SLOs before querying telemetry data
- Check SLO burn rate, not just status - a slowly burning SLO needs attention before it breaches
- Verify notification chain end-to-end - connector exists → router maps correctly → test delivery works
- Cross-reference with telemetry - use the `cx-telemetry-querying` skill for root cause after triage
- Acknowledge promptly - acknowledge incidents to signal ownership and stop re-notifications
- Use incident events for timeline - `cx incidents events` shows the full incident lifecycle
Related Skills
- `cx-alerts` - deep alert management: creating, updating, and inspecting alert definitions
- `cx-telemetry-querying` - root cause investigation using logs, metrics, traces, and RUM
- `cx-observability-setup` - configure notification channels and routing for alerts