create-runbook

SKILL.md

Create Operational Runbook

Generate a runbook for operational procedures, incident response, or troubleshooting.

Process

1. Parse Arguments

Extract from user input:

  • Topic: Procedure name (required) - e.g., "database failover", "deploy to production"
  • Type: incident, operational, troubleshooting, emergency (default: operational)
  • Service: Specific service/system (optional)

2. Analyze Context

Based on the topic and type, gather relevant information:

For incident runbooks:

  • What alerts trigger this runbook?
  • What are common causes?
  • What metrics indicate the issue?
  • What are the resolution steps?

For operational runbooks:

  • What is the procedure?
  • What are the prerequisites?
  • What are the verification steps?
  • What could go wrong?

For troubleshooting runbooks:

  • What symptoms indicate the issue?
  • What diagnostic steps are needed?
  • What are common fixes?
  • When to escalate?

For emergency runbooks:

  • What constitutes the emergency?
  • What immediate actions are required?
  • Who needs to be notified?
  • What is the recovery process?

3. Load Skill and Generate

  1. Load the runbook-creation skill for templates
  2. Select appropriate template based on type argument
  3. Generate runbook with:
    • Comprehensive metadata
    • Step-by-step procedures
    • Decision trees where applicable
    • Troubleshooting sections
    • Escalation paths

4. Create File

Determine file location:

Priority order:
1. docs/runbooks/{type}/RB-{number}-{slug}.md
2. docs/operations/runbooks/RB-{number}-{slug}.md
3. runbooks/RB-{number}-{slug}.md

Numbering:

  • Find existing runbooks, increment number
  • Or use date-based: RB-2025-01-{sequence}

5. Populate Content

Generate content sections based on type:

All types include:

  • Metadata table (ID, category, service, owner, dates)
  • Overview (purpose, when to use, expected outcome)
  • Prerequisites
  • Main procedure with numbered steps
  • Troubleshooting section
  • Escalation path

Type-specific sections:

Type Additional Sections
incident Alert details, impact assessment, communication templates
operational Rollback procedure, verification checklist
troubleshooting Symptom/cause matrix, diagnostic commands
emergency Immediate actions, notification list, recovery steps

Output Content

Incident Runbook Structure

# Incident Runbook: {TOPIC}

| Property | Value |
|----------|-------|
| **ID** | RB-INC-{NUMBER} |
| **Alert** | [Alert name] |
| **Severity** | [SEV1/2/3/4] |
| **Service** | {SERVICE} |
| **Owner** | [Team] |
| **Last Updated** | {DATE} |

## Alert Details
[Alert trigger conditions]

## Immediate Actions (First 5 Minutes)
1. Acknowledge alert
2. Assess impact
3. Initial communication

## Diagnosis
[Decision tree and diagnostic steps]

## Resolution
[Step-by-step fix procedures]

## Verification
[How to confirm resolution]

## Communication
[Status update templates]

## Post-Incident
[Cleanup and follow-up tasks]

Operational Runbook Structure

# Runbook: {TOPIC}

| Property | Value |
|----------|-------|
| **ID** | RB-OPS-{NUMBER} |
| **Category** | Operational |
| **Service** | {SERVICE} |
| **Owner** | [Team] |
| **Last Updated** | {DATE} |
| **Estimated Duration** | [Time] |

## Overview
[Purpose and when to use]

## Prerequisites
[Access, tools, knowledge needed]

## Procedure

### Step 1: [Name]
[Detailed instructions with commands]

### Step 2: [Name]
[Detailed instructions]

## Verification
[How to confirm success]

## Rollback
[How to undo if needed]

## Troubleshooting
[Common issues and fixes]

Example Invocations

/create-runbook "database failover"
→ Creates operational runbook for database failover procedure

/create-runbook "high error rate" type=incident service="api-gateway"
→ Creates incident runbook for API gateway error rate alerts

/create-runbook "pod crash loop" type=troubleshooting service="order-service"
→ Creates troubleshooting guide for order service pod crashes

/create-runbook "security breach response" type=emergency
→ Creates emergency runbook for security incidents

Content Generation Guidelines

When generating runbook content:

Commands

  • Include actual, tested commands
  • Use environment variables for sensitive data
  • Add expected output examples

Decision Points

  • Use clear flowchart notation
  • Cover all branches (success and failure)
  • Include "when in doubt" guidance

Timing

  • Estimate time for each step
  • Note SLA implications
  • Include "if taking too long" escalation

Communication

  • Provide copy-paste templates
  • Include notification channels
  • Specify stakeholder expectations

Post-Creation Guidance

After creating the runbook:

  1. Fill in specifics - Replace placeholders with actual commands/URLs
  2. Validate commands - Test all commands in non-production
  3. Review with SME - Have subject matter expert verify
  4. Test execution - Do a dry run of the procedure
  5. Train team - Ensure operators know it exists
  6. Schedule review - Set calendar reminder for quarterly review

Quality Criteria

Generated runbook must:

  • Have unique identifier
  • Include all required metadata
  • Provide actionable step-by-step instructions
  • Include verification steps after each major action
  • Cover failure scenarios and rollback
  • Define escalation path and contacts
  • Be testable in non-production environment
Weekly Installs
1
GitHub Stars
38
First Seen
10 days ago
Installed on
amp1
cline1
opencode1
cursor1
kimi-cli1
codex1