Incident Commander

Classify incident severity, reconstruct timelines from heterogeneous event sources, and generate structured post-incident reviews with root cause analysis and action items. Codifies PagerDuty, Google SRE, and Atlassian incident-management practices into severity scoring, escalation matrices, communication templates, RCA frameworks, and SLA/error-budget tracking.

Core Capabilities

Severity classification — multi-dimensional scoring (revenue, user scope, data/security risk, service criticality, blast radius) into SEV-1 to SEV-4 with confidence and escalation paths.
Timeline reconstruction — chronological timelines from logs, alerts, Slack, and deploy events with phase detection and gap analysis.
Post-incident review — PIRs with 5 Whys, Fishbone, Timeline, or Bow Tie RCA plus categorized action items (owner, priority, deadline).
Postmortem quality — coverage-gap detection, action-item quality scoring, MTTD/MTTR benchmarking.
Communication & escalation — severity-specific internal/executive/customer/status-page templates; technical (L1-L4) and business escalation matrices with time-based triggers.
SLA / error-budget tracking — SLI/SLO/SLA hierarchy, error budgets, burn-rate alerting, and breach handling.
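
The severity classification capability above can be sketched as a weighted score over the five listed dimensions. This is a minimal illustration, not the tool's actual algorithm: the weights, the SEV cut-offs, the 0.0 to 1.0 input scale, and the idea of reporting confidence as the share of weight covered by the supplied dimensions are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical weights: the source names these dimensions but not how they are weighted.
WEIGHTS = {
    "revenue": 0.30,
    "user_scope": 0.25,
    "data_security_risk": 0.20,
    "service_criticality": 0.15,
    "blast_radius": 0.10,
}

# Hypothetical cut-offs on the weighted score, checked from most to least severe.
THRESHOLDS = [(0.75, "SEV-1"), (0.50, "SEV-2"), (0.25, "SEV-3")]


@dataclass
class Classification:
    severity: str      # "SEV-1" .. "SEV-4"
    score: float       # weighted score in 0.0..1.0
    confidence: float  # share of total weight covered by the supplied dimensions


def classify(scores: dict[str, float]) -> Classification:
    """Classify an incident from per-dimension scores in the range 0.0..1.0.

    Dimensions may be omitted when they are not yet known; the score is then
    normalized over the supplied dimensions and confidence drops accordingly.
    """
    unknown = set(scores) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if not scores:
        raise ValueError("at least one dimension is required")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    covered = sum(WEIGHTS[name] for name in scores)
    score = sum(WEIGHTS[name] * value for name, value in scores.items()) / covered
    severity = next(
        (label for cutoff, label in THRESHOLDS if score >= cutoff), "SEV-4"
    )
    return Classification(severity, round(score, 3), round(covered, 3))
```

Calling `classify({"revenue": 0.6, "user_scope": 0.6})` early in an incident, before the other dimensions are assessed, would yield a provisional severity with reduced confidence, which can be re-scored as more information arrives.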

When to Use

Handling an active incident — classify severity, establish command, mitigate, communicate.
Running a post-incident review — reconstruct timeline, perform RCA, assign action items.
Managing escalation — apply technical and business escalation paths by severity and elapsed time.
Building or auditing response playbooks, comms templates, or SLA/error-budget policy.
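
When auditing an error-budget policy, it can help to check the arithmetic behind burn-rate alerts by hand. The sketch below uses the standard definitions (error budget = 1 - SLO; burn rate = observed error rate divided by the error budget). The function names, the 30-day (720-hour) SLO period, and the example paging threshold are assumptions for illustration, not values taken from this document.

```python
def error_budget(slo: float) -> float:
    """Fraction of requests allowed to fail, e.g. an SLO of 0.999 leaves 0.001."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return 1.0 - slo


def burn_rate(failed: int, total: int, slo: float) -> float:
    """How fast the budget is being spent relative to plan.

    1.0 means the budget would be used up exactly at the end of the SLO
    period; values above 1.0 mean it would run out early.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return (failed / total) / error_budget(slo)


def budget_consumed(rate: float, window_hours: float,
                    period_hours: float = 720.0) -> float:
    """Fraction of the whole period's budget spent by sustaining `rate`
    for `window_hours`. The 720-hour default is an assumed 30-day period."""
    return rate * window_hours / period_hours


def should_page(rate: float, threshold: float = 14.4) -> bool:
    """Hypothetical paging rule. A burn rate of 14.4 sustained for one hour
    spends 2% of a 30-day budget; tune the threshold to your own policy."""
    return rate >= threshold
```

For example, 10 failed requests out of 1,000 against a 99% SLO gives a burn rate of about 1.0, meaning the budget is being spent exactly on schedule; the same failure count against a 99.9% SLO gives a burn rate of about 10.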
