incident-communication

Incident Communication

Write status page updates that are clear, honest, and useful — for any phase of an incident.

When to Use

"write an incident update"
"we have an API outage, help me communicate it"
"draft a resolved update for the database incident"
"our webhooks are delayed, what should I post?"

Workflow

1. Check for Context

Read .agents/status-page-context.md if it exists. Use it for:

Tone and voice — match the team's communication style
Components — reference the correct component names
Severity levels — calibrate urgency appropriately
Update cadence — respect the team's SLA for update frequency

If the file doesn't exist, suggest running the status-page-context skill first. Proceed without it if the user wants to skip.
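The context check can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function name `load_context` is hypothetical, and only the file path comes from the step above.

```python
from pathlib import Path
from typing import Optional

# Path named in the workflow step above.
CONTEXT_PATH = Path(".agents/status-page-context.md")


def load_context(path: Path = CONTEXT_PATH) -> Optional[str]:
    """Return the context file's text, or None when the file is missing."""
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")


context = load_context()
if context is None:
    # Mirror the fallback described above: suggest the setup skill, then proceed.
    print("No context file found; consider running the status-page-context skill first.")
```

Returning `None` rather than raising keeps the "proceed without it" path simple for the caller.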

2. Determine the Phase

If the phase is not obvious from the user's message, ask which one they're in:

| Phase | When | Purpose |
| --- | --- | --- |
| Investigating | Something is wrong, cause unknown | Acknowledge the issue, set expectations |
| Identified | Root cause found, fix in progress | Explain what's happening, share the plan |
| Monitoring | Fix deployed, watching for stability | Confirm the fix, set recovery expectations |
| Resolved | Incident is over | Summarize what happened with exact timeframes |

3. Gather Incident Details

For any phase, you need:

What's affected — which components/services (use names from context if available)
What's the user impact — what are users experiencing? (errors, slowness, data loss)
What's NOT affected — critical for reducing panic
What's being done — current actions being taken

Additional details by phase:

Investigating: When did it start? Who reported it?
Identified: What's the root cause? What's the fix plan? ETA?
Monitoring: What fix was deployed? How long will monitoring last?
Resolved: Exact start/end times (UTC). What was the root cause? Will there be a postmortem?
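One way to track which details are still missing is a small checklist per phase. The sketch below is illustrative only: the field names are hypothetical labels for the questions listed above, and treating every question as required (including the ETA) is an assumption.

```python
# Hypothetical field names standing in for the questions listed above.
COMMON_FIELDS = ["affected", "user_impact", "not_affected", "actions"]
PHASE_FIELDS = {
    "investigating": ["started_at", "reported_by"],
    "identified": ["root_cause", "fix_plan", "eta"],
    "monitoring": ["fix_deployed", "monitoring_duration"],
    "resolved": ["start_utc", "end_utc", "root_cause", "postmortem"],
}


def missing_details(phase: str, details: dict) -> list:
    """Return the names of details that are absent or empty for this phase."""
    required = COMMON_FIELDS + PHASE_FIELDS[phase]
    return [name for name in required if not details.get(name)]


details = {
    "affected": "REST API",
    "user_impact": "elevated 5xx errors",
    "not_affected": "dashboard and webhook delivery",
    "actions": "investigating the cause",
    "started_at": "14:25 UTC",
}
print(missing_details("investigating", details))  # ['reported_by']
```

An empty result means the update can be drafted; anything else is a question to ask the user first.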

4. Write the Update

Follow these principles (in priority order):

Principle 1: Scope the blast radius immediately

The first sentence should tell users what's affected AND what's not.

Do: "REST API requests are returning elevated 5xx errors. The dashboard and webhook delivery are operating normally."
Don't: "We are investigating reports of degraded performance for some services."

Principle 2: Be specific about user impact

Describe what users are experiencing, not just what's broken internally.

Do: "Deployments created between 11:20 and 15:14 UTC may be failing. Existing deployments are unaffected."
Don't: "We are experiencing an issue with our deployment pipeline."

Principle 3: Give actionable guidance when possible

If users can do something to mitigate, tell them.

Do: "If you're seeing errors, redeploying will resolve the issue for your project."
Don't: "We are working on a fix."

Principle 4: Include timestamps in UTC

Every update should reference when things happened.

Do: "Starting at 14:25 UTC, iDEAL transactions began failing."
Don't: "We recently noticed some issues."
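When incident times are recorded in a local timezone, converting them before writing avoids ambiguity. The following is a minimal sketch using Python's standard `datetime` module; the helper name `to_utc_label` and the example date are arbitrary illustrations.

```python
from datetime import datetime, timedelta, timezone


def to_utc_label(moment: datetime) -> str:
    """Format a timezone-aware datetime as 'HH:MM UTC'."""
    if moment.tzinfo is None:
        # A naive datetime has no known offset, so refuse to guess.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%H:%M UTC")


# Example: 16:25 in a UTC+2 zone is 14:25 UTC.
local = datetime(2026, 1, 15, 16, 25, tzinfo=timezone(timedelta(hours=2)))
print(to_utc_label(local))  # 14:25 UTC
```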

Principle 5: Set expectations for the next update

Tell users when they'll hear from you again.

Do: "We'll post another update within 30 minutes or sooner if the situation changes."
Don't: (silence)
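If the team prefers to state a concrete deadline rather than a relative window, the next-update time can be computed from the cadence. This sketch assumes a 30-minute default, matching the example above; the function name `next_update_line` is hypothetical, and the real cadence should come from the team's context file when available.

```python
from datetime import datetime, timedelta, timezone


def next_update_line(now: datetime, cadence_minutes: int = 30) -> str:
    """Build a next-update sentence with an explicit UTC deadline."""
    if now.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    deadline = now.astimezone(timezone.utc) + timedelta(minutes=cadence_minutes)
    return (
        f"We'll post another update by {deadline:%H:%M} UTC "
        "or sooner if the situation changes."
    )


# Arbitrary example time: an update posted at 14:40 UTC.
posted = datetime(2026, 1, 15, 14, 40, tzinfo=timezone.utc)
print(next_update_line(posted))
```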

Principle 6: Resolved updates summarize the full story

Include the exact time window, what happened, what was done, and whether a postmortem will follow.

Do: "Between 18:00 and 18:23 UTC, the REST API experienced elevated error rates (peak 12% of requests) caused by a misconfigured load balancer rule. The rule was rolled back at 18:19 UTC and error rates returned to normal by 18:23 UTC. We'll publish a full postmortem within 48 hours."
Don't: "This incident has been resolved."

5. What to Communicate Next

After writing the update, always tell the user what comes next:

After investigating: "When you identify the cause, run this skill again for an 'identified' update."
After identified: "Once the fix is deployed, run this skill for a 'monitoring' update."
After monitoring: "When you're confident the fix is stable, run this skill for a 'resolved' update."
After resolved: "Consider writing a postmortem — use the postmortem skill."
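The follow-up prompts above amount to a lookup keyed by the phase just written. A minimal sketch, with the hypothetical names `NEXT_STEP` and `next_step`:

```python
# Follow-up prompts copied from the list above, keyed by the phase just written.
NEXT_STEP = {
    "investigating": "When you identify the cause, run this skill again for an 'identified' update.",
    "identified": "Once the fix is deployed, run this skill for a 'monitoring' update.",
    "monitoring": "When you're confident the fix is stable, run this skill for a 'resolved' update.",
    "resolved": "Consider writing a postmortem — use the postmortem skill.",
}


def next_step(phase: str) -> str:
    """Return the follow-up prompt for a phase, ignoring case and padding."""
    return NEXT_STEP[phase.strip().lower()]


print(next_step("Monitoring"))
```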

Phase Templates

Investigating

[Component] is experiencing [user-visible impact] starting at [time UTC].
[What is NOT affected].
We are investigating the cause and will provide an update by [time/timeframe].

Identified

We've identified the cause of [brief description of issue affecting Component].
[Root cause in plain language].
We are [action being taken]. [Expected recovery time, or "We will update when we have an ETA"].
[What users can do in the meantime, if anything].
Next update by [time/timeframe].

Monitoring

A fix for [brief issue description] was deployed at [time UTC].
[What the fix was, in one sentence].
We are monitoring for stability and will resolve this incident if no further issues arise within [timeframe].
[Any user action needed, e.g., "no action needed" or "you may need to retry failed requests"].

Resolved

Between [start time] and [end time] UTC, [Component] experienced [user-visible impact].
[Root cause in 1-2 sentences].
[What was done to fix it].
[Impact summary: % of users/requests affected, if known].
[Postmortem commitment: "We'll publish a detailed postmortem within [timeframe]" or "No further action needed"].
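The phase templates above can also be filled programmatically. The sketch below renders only the Investigating template and is an illustration under assumptions: the placeholder names (`component`, `impact`, `start_utc`, `not_affected`, `next_update`) are hypothetical stand-ins for the bracketed slots.

```python
# Investigating template from above, with bracketed slots as named placeholders.
INVESTIGATING = (
    "{component} is experiencing {impact} starting at {start_utc} UTC.\n"
    "{not_affected}\n"
    "We are investigating the cause and will provide an update by {next_update}."
)


def render(template: str, **values: str) -> str:
    """Fill a template; a missing placeholder raises KeyError rather than
    letting an unfilled slot reach the status page."""
    return template.format(**values)


update = render(
    INVESTIGATING,
    component="The REST API",
    impact="elevated 5xx errors",
    start_utc="14:25",
    not_affected="The dashboard and webhook delivery are operating normally.",
    next_update="15:00 UTC",
)
print(update)
```

Failing loudly on a missing value is a deliberate choice here: an update with a leftover placeholder is worse than a delayed one.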

Anti-patterns to Avoid

| Anti-pattern | Why it's bad | Instead |
| --- | --- | --- |
| "We apologize for any inconvenience" | Empty corporate filler | State impact honestly and what you're doing |
| "Some users may experience issues" | Vague, unhelpful | Specify what users see and who's affected |
| "We are continuing to investigate" (repeated) | No new information | Share what you've learned, even if partial |
| "This incident has been resolved" (with no details) | Users don't know what happened | Summarize timeline, cause, fix, and impact |
| "Please be patient" | Patronizing | Give an ETA or next update time |
| Copy-pasting the same update multiple times | Looks lazy, erodes trust | Each update should add new information |
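Some of these anti-patterns are literal phrases, so a draft can be checked for them before posting. The following is a rough sketch, not a complete linter: the regular expressions and the function name `lint_update` are assumptions, and the repeated-update patterns are not covered because they depend on the previous updates.

```python
import re

# Phrase patterns (assumed) mapped to the "Instead" advice from the table above.
ANTI_PATTERNS = {
    r"apologi[sz]e for any inconvenience": "State impact honestly and what you're doing",
    r"some users may experience issues": "Specify what users see and who's affected",
    r"please be patient": "Give an ETA or next update time",
    # Only flags the bare sentence, i.e. a resolved update with no details.
    r"^\s*this incident has been resolved\.?\s*$": "Summarize timeline, cause, fix, and impact",
}


def lint_update(text: str) -> list:
    """Return the advice for every anti-pattern phrase found in the draft."""
    findings = []
    for pattern, advice in ANTI_PATTERNS.items():
        if re.search(pattern, text, flags=re.IGNORECASE):
            findings.append(advice)
    return findings


draft = "We apologize for any inconvenience. Please be patient."
for advice in lint_update(draft):
    print("Suggestion:", advice)
```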

Related Skills

status-page-context — Set up the context document this skill reads (tone, components, severity)
postmortem — Write a detailed postmortem after a resolved incident
maintenance — Write planned maintenance announcements (not incidents)
status-report — Write periodic health reports

Repository

openstatushq/skills

First Seen

Mar 27, 2026
