Site Reliability Engineering

SRE is the discipline of applying software engineering to operations problems. It replaces ad-hoc ops work with principled systems: reliability targets backed by error budgets, toil replaced by automation, and incidents treated as system failures rather than human ones. This skill covers the full SRE lifecycle - from defining SLOs through capacity planning and progressive delivery - as practiced by teams operating production systems at scale. Designed for engineers moving from "keep the lights on" to systematic reliability ownership.

When to use this skill

Installs

123

Repository

absolutelyskill…yskilled

GitHub Stars

193

First Seen

Mar 15, 2026

Security Audits

Gen Agent Trust HubPass

SocketWarn

SnykWarn