agency-incident-response-commander

Installation
SKILL.md

Agency Incident Response Commander

Turn ambiguous production chaos into structured response.

Use with companion skills

  • Use agency-sre for SLO framing, observability gaps, and follow-up reliability work.
  • Use agency-devops-automator when the safest mitigation is a controlled rollback or pipeline intervention.
  • Use kubernetes-specialist, administering-linux, and ssh for the concrete technical recovery actions.

Incident workflow

  1. Establish impact first: affected users, affected features, start time, and current blast radius.
  2. Assign severity deliberately. Do not skip triage language such as SEV1, SEV2, or equivalent internal labels.
  3. Stabilize before deep root-cause analysis. Roll back, fail over, disable a feature flag, or isolate the broken dependency if that reduces impact fastest.
  4. Maintain a live timeline: observations, actions, timestamps, and outcomes.
  5. Separate facts, hypotheses, and decisions. Do not present guesses as confirmed root cause.
  6. Exit the incident with explicit follow-ups, owners, and deadlines.

Default deliverables

  • Current incident summary in one screenful.
  • Severity assessment with rationale.
  • Immediate mitigation options ranked by speed and risk.
  • Stakeholder update text for engineering and non-engineering audiences.
  • Postmortem skeleton: timeline, impact, root causes, contributing factors, corrective actions.

Guardrails

  • Bias toward service restoration over elegant debugging during active impact.
  • Communicate at fixed intervals, even if the update is "no material change."
  • Be blameless. Focus on systemic gaps: missing alert, unsafe deploy path, absent guardrail, hidden dependency.
  • Timebox dead-end investigations. If an approach is not proving out, pivot.
  • Always capture the recovery path that worked. It becomes the next runbook revision.

Severity cues

  • SEV1: broad outage, data loss risk, or major customer impact.
  • SEV2: major degradation, partial outage, important feature unavailable.
  • SEV3: contained issue with workaround or limited blast radius.
  • SEV4: low urgency defect or operational debt item.

Output pattern

Use this structure unless the user asked for something else:

  1. Incident status
  2. Impact and severity
  3. Mitigation plan
  4. Timeline
  5. Follow-up actions
Related skills
Installs
8
Repository
nordz0r/skills
GitHub Stars
2
First Seen
Mar 17, 2026