runbook
SKILL.md
Runbook
Generate operational runbooks for services, procedures, or incident response. Investigates the codebase and infrastructure to produce accurate, actionable procedures.
When to Use
- Creating operational documentation for a service
- Documenting deployment, scaling, or maintenance procedures
- Building incident response playbooks
- Standardizing operational procedures across teams
Input
- Topic: Service name, operation type, or incident scenario
- Scope: deployment, scaling, failover, maintenance, troubleshooting
- Optional: Specific scenarios to cover
Investigation Strategy
Launch parallel investigation tracks to gather comprehensive information:
Track 1: Codebase Exploration
- Identify service entry points and configuration
- Find health check endpoints
- Map dependencies (databases, caches, external services)
- Locate logging and metrics instrumentation
- Find existing scripts or automation
Track 2: Infrastructure Analysis
- Review deployment manifests (Kubernetes, Terraform, etc.)
- Identify scaling configuration
- Map service dependencies
- Find monitoring and alerting setup
- Review backup and recovery procedures
Track 3: External Research
- Find operational best practices for the service type
- Research common failure modes
- Identify industry-standard procedures
Output
Generate the runbook document using the template at references/templates/runbook.md.
The runbook should include:
- Service overview and architecture
- Dependencies with failure impact
- Step-by-step procedures with actual commands
- Troubleshooting guides for common issues
- Escalation paths and contacts
Behavior
- Parse topic to identify service and operation scope
- Launch parallel investigation tracks
- Extract configuration, endpoints, and dependencies from codebase
- Identify common operations and failure modes
- Generate step-by-step procedures with actual commands
- Document troubleshooting steps and escalation paths
Constraints
- Accuracy: All commands must be verified against actual codebase/infrastructure
- Actionable: Every procedure must have concrete, executable steps
- Complete: Include prerequisites, verification, and rollback for each procedure
- Maintainable: Note dependencies that may change and require updates
Example
Input: "Generate runbook for the payment-service"
Investigation:
- Found deployment at k8s/payment-service/
- Found health endpoints: /health, /ready
- Dependencies: PostgreSQL (critical), Redis (cache), Stripe API
- Scaling: HPA configured, min 3, max 10 replicas
- Alerts: Prometheus rules in monitoring/
Generated Runbook: payment-service-runbook.md
## Overview
- Service: payment-service
- Owner: payments-team
- Criticality: P1
## Dependencies
| Dependency | Type | Criticality | Failure Impact |
|------------|------|-------------|----------------|
| PostgreSQL | Database | Critical | Full outage |
| Redis | Cache | High | Degraded latency |
| Stripe API | External | Critical | Payment failures |
## Procedures
### Deployment
1. Verify no active transactions
```bash
kubectl exec -it payment-service-0 -- curl localhost:8080/metrics | grep active_transactions
- Apply new deployment
kubectl apply -f k8s/payment-service/deployment.yaml - Monitor rollout
kubectl rollout status deployment/payment-service
Scaling
kubectl scale deployment payment-service --replicas=5
Troubleshooting
High Latency
Symptoms: p99 latency > 500ms Diagnosis:
kubectl top pods -l app=payment-service
kubectl logs -l app=payment-service --tail=100 | grep -i slow
Resolution: Check Redis connection, scale if CPU > 80%
Begin by identifying the service or operation to document and launching investigation tracks.
Weekly Installs
2
Repository
thoreinstein/agentsGitHub Stars
3
First Seen
14 days ago
Security Audits
Installed on
opencode2
gemini-cli2
codebuddy2
github-copilot2
codex2
kimi-cli2