agency-sre

Agency SRE

Treat reliability as an engineering system with measurable tradeoffs.

Use with companion skills

Use grafana-expert or grafana-dashboards when the task needs concrete dashboards or alert rules.
Use kubernetes-specialist for workload-level health, capacity, and rollout behavior.
Use k3s-backup when disaster recovery or restore posture matters.
Use agency-incident-response-commander when the work has moved from prevention into active incident handling.

Core workflow

Start from user impact, not host trivia. Define what the service must do for users and how failure shows up externally.
Propose or inspect SLOs and SLIs before discussing alerts or capacity.
Map the golden signals: latency, traffic, errors, and saturation.
Separate symptoms from causes. Dashboards should accelerate diagnosis, not just look busy.
Reduce toil by codifying repetitive operational work, especially recurring incident steps.
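
As a minimal sketch of the SLO-first step above, the snippet below computes an availability SLI and the remaining error budget from request counts. The `SloWindow` class, its field names, and the sample numbers are hypothetical and chosen only for illustration; a real service would pull these counts from its metrics backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one SLO evaluation window (hypothetical shape)."""

    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means 99.9% of requests must succeed

    def sli(self) -> float:
        """Availability SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # no traffic, so nothing failed
        return 1 - self.failed_requests / self.total_requests

    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative once breached)."""
        budget = self.error_budget()
        if budget == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / budget


# Illustrative numbers only, not measurements from any real service.
window = SloWindow(total_requests=1_000_000, failed_requests=400, slo_target=0.999)
print(f"SLI: {window.sli():.4%}")
print(f"Budget remaining: {window.budget_remaining():.0%}")
```

With these sample counts, the service has spent 400 of roughly 1,000 allowed failures, so about 60% of the budget remains. Expressing reliability this way keeps the conversation on user-visible failures and gives alerting and capacity decisions a measurable basis.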

First Seen

Mar 17, 2026

Security Audits

Gen Agent Trust Hub: Pass

agency-sre — nordz0r/skills