alerting-strategy
Alerting Strategy
Effective alerts that wake up on-call engineers only for real problems.
Context
You are designing alerts. Alert on SLOs, not absolute thresholds; provide runbooks.
Domain Context
- SLO: Service level objective; what level of service do you promise?
- Error Budget: How much downtime/errors before violating SLO?
- Alert Fatigue: Too many alerts → on-call ignores them
- Runbook: Step-by-step guide to respond to alert
- Escalation: If on-call doesn't respond, escalate
Instructions
- Define SLOs: What do you commit to? 99.9% uptime?
- Measure Error Budget: How much can you fail before SLO breach?
- Alert on SLO: Alert when error budget is at risk
- Runbooks: Document response for each alert
- Tuning: Adjust thresholds to reduce false positives
- Escalation: If on-call unresponsive, page manager
- Blameless Postmortems: Learn from incidents, improve systems
Anti-Patterns
- Alert on everything; alert fatigue kills response
- Alerts with no runbook; on-call is helpless
- SLO not communicated; customers expect unlimited uptime
- Static thresholds; day/night, weekday/weekend are different
- No escalation; on-call doesn't see alert
Further Reading
- Google SRE book on monitoring and alerting
- SLO guidance (Google Cloud)
- PagerDuty on-call best practices
More from sethdford/claude-skills
api-test-automation
Expert approach to api-test-automation in test automation. Use when working with .
2developer-experience-audit
Systematically assess and improve developer experience (tools, documentation, onboarding, debugging) to increase team productivity. Use in roadmapping or when noticing developer friction.
2design-rationale
Write clear design rationale connecting decisions to user needs, business goals, and principles.
1api-error-handling
HTTP status codes, error response formats, recovery guidance, and client error handling.
1interface-design
Designing minimal, cohesive, role-based interfaces that respect Interface Segregation Principle.
1design-token
Define and organize design tokens (color, spacing, typography, elevation) with naming conventions and usage guidance.
1