# Systems Health
Diagnose the development system. Measure stocks, flows, and feedback loops. Find what's sick and prescribe the cheapest fix.
## Process
- Collect data
- Measure stocks
- Measure flows
- Assess feedback loops
- Measure complexity signals
- Diagnose and prescribe
- Write `.tap/system-health.md`
## 1. Collect Data
Pull from available sources (skip any that aren't accessible):
- `git log --oneline --since="30 days ago"` → commit frequency
- `git log --oneline --since="90 days ago"` → longer trend
- `git shortlog -sn --no-merges --since="30 days ago"` → contributor activity
- `gh pr list --state all --limit 50` → PR lifecycle
- `gh pr list --state open` → current open PRs
- `gh run list --limit 20` → CI pass/fail rate
- `gh issue list --state all --label bug --limit 50` → bug lifecycle
- `gh issue list --state open` → current open issues
Also read if available:
- `.tap/tap-audit.md` → prior assessment context
- `.tap/system-health.md` → prior health snapshot for trend comparison
- Test runner output → test count, coverage
Time windows: Default to a 30-day snapshot with a 90-day trend. Use the `--since` flag on git commands and equivalent date filters on `gh` commands.
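The trend comparisons in later steps need a previous window of equal length. A minimal sketch of deriving both date ranges (the helper name is hypothetical, not part of the skill):

```python
from datetime import date, timedelta

def windows(today: date, days: int = 30):
    """Return (current, previous) since/until date pairs of equal length."""
    cur_start = today - timedelta(days=days)
    prev_start = today - timedelta(days=2 * days)
    return (cur_start, today), (prev_start, cur_start)

cur, prev = windows(date(2024, 6, 30))
# cur  → (date(2024, 5, 31), date(2024, 6, 30))
# prev → (date(2024, 5, 1),  date(2024, 5, 31))
```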
## 2. Measure Stocks
Stocks are things that accumulate. Measure current level + trend.
| Stock | How to measure | Healthy signal |
|---|---|---|
| Backlog | `gh issue list --state open` count | Stable or shrinking |
| Open PRs | `gh pr list --state open` count + age | < 5 open, oldest < 3 days |
| Open bugs | `gh issue list --label bug --state open` count | Stable or shrinking |
| Test count | Test runner `--list` or dry-run | Growing with codebase |
| Deploy count | `gh run list` with deploy workflow, or git tags | Weekly+ |
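One way to check the "oldest < 3 days" signal is to compute PR age from `gh pr list --state open --json createdAt` output. A sketch against a hypothetical two-PR sample:

```python
import json
from datetime import datetime, timezone

# Hypothetical sample of `gh pr list --state open --json createdAt` output.
sample = '[{"createdAt": "2024-06-27T09:00:00Z"}, {"createdAt": "2024-06-29T12:00:00Z"}]'

def oldest_pr_age_days(gh_json: str, now: datetime) -> float:
    """Age of the oldest open PR, in days."""
    created = [datetime.fromisoformat(pr["createdAt"].replace("Z", "+00:00"))
               for pr in json.loads(gh_json)]
    return (now - min(created)).total_seconds() / 86400

now = datetime(2024, 6, 30, 9, 0, tzinfo=timezone.utc)
print(oldest_pr_age_days(sample, now))  # 3.0 → at the 3-day threshold
```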
Trend indicators: Compare current 30-day window to previous 30-day window.
- ▲ growing (stock increasing)
- ▼ shrinking (stock decreasing)
- ─ stable (within 10% variance)
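The 10% variance rule above can be sketched as a small classifier (the handling of a zero previous count is an assumption):

```python
def trend(current: int, previous: int, tolerance: float = 0.10) -> str:
    """Classify a stock's trend between two equal windows: ▲ / ▼ / ─."""
    if previous == 0:
        return "▲" if current > 0 else "─"
    change = (current - previous) / previous
    if change > tolerance:
        return "▲"   # growing
    if change < -tolerance:
        return "▼"   # shrinking
    return "─"       # stable, within 10% variance

print(trend(12, 10))  # ▲  (+20% exceeds the 10% variance)
print(trend(10, 11))  # ─  (-9% is within variance)
```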
## 3. Measure Flows
Flows change stocks. Measure rate + balance.
| Flow | How to measure | What it tells you |
|---|---|---|
| Stories in | Issues created per week | Demand on the system |
| Stories out | PRs merged per week | Throughput |
| Cycle time | PR open → merge duration (median) | How fast work moves |
| Review time | PR open → first review (median) | Bottleneck indicator |
| Bug inflow | Bug issues created per week | Quality signal |
| Bug outflow | Bug issues closed per week | Fix rate |
| Deploy frequency | Deploys per week/month | Delivery cadence |
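Cycle time can be derived from `gh pr list --state merged --json createdAt,mergedAt`. A sketch over a hypothetical three-PR sample:

```python
import json
from datetime import datetime
from statistics import median

# Hypothetical slice of `gh pr list --state merged --json createdAt,mergedAt` output.
prs = '''[
  {"createdAt": "2024-06-20T10:00:00Z", "mergedAt": "2024-06-21T10:00:00Z"},
  {"createdAt": "2024-06-22T10:00:00Z", "mergedAt": "2024-06-25T10:00:00Z"},
  {"createdAt": "2024-06-26T10:00:00Z", "mergedAt": "2024-06-26T16:00:00Z"}
]'''

def median_cycle_time_days(gh_json: str) -> float:
    """Median PR open → merge duration, in days."""
    def parse(ts: str) -> datetime:
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))
    durations = [(parse(p["mergedAt"]) - parse(p["createdAt"])).total_seconds() / 86400
                 for p in json.loads(gh_json)]
    return median(durations)

print(median_cycle_time_days(prs))  # 1.0
```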
Balance check for each stock:
- Inflow > outflow → stock accumulates → system backing up
- Inflow < outflow → stock drains → system clearing
- Inflow ≈ outflow → stable
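The balance check can be sketched as follows; the 10% tolerance for "inflow ≈ outflow" is an assumption borrowed from the stock-trend rule, not something the skill prescribes:

```python
def balance(inflow: float, outflow: float, tolerance: float = 0.10) -> str:
    """Compare weekly inflow/outflow rates for a stock."""
    if inflow == 0 and outflow == 0:
        return "stable"
    ref = max(inflow, outflow)
    if (inflow - outflow) / ref > tolerance:
        return "backing up"   # stock accumulates
    if (outflow - inflow) / ref > tolerance:
        return "clearing"     # stock drains
    return "stable"

print(balance(inflow=8, outflow=5))  # backing up
```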
## 4. Assess Feedback Loops
Identify which loops are working and which are broken.
Balancing loops (self-correcting):
| Loop | Working | Broken |
|---|---|---|
| CI gate | CI fails → dev fixes → CI passes | CI fails → ignored, merged anyway |
| Code review | Review catches issues → dev fixes → quality maintained | Reviews rubber-stamped or stuck for days |
| Bug triage | Bug found → prioritized → fixed | Bugs accumulate, never triaged |
| Test failures | Test fails → investigate → fix code or test | Tests disabled, skipped, or ignored |
Reinforcing loops (amplifying):
| Loop | Working | Broken |
|---|---|---|
| Test coverage | Good tests → catch bugs → write more tests | No tests → bugs escape → "tests don't help" |
| Documentation | Good docs → agents work well → docs updated | Thin docs → agent rework → "docs don't help" |
| Small batches | Small PRs → fast review → more small PRs | Big PRs → slow review → bigger PRs |
Evidence-based assessment: Don't guess. Check the CI pass rate (from `gh run list`), review turnaround (from PR data), and bug trends (from issue data).
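CI pass rate can be computed from `gh run list --limit 20 --json conclusion`. A sketch over a hypothetical four-run sample; only finished runs with a success/failure conclusion are counted:

```python
import json

# Hypothetical slice of `gh run list --limit 20 --json conclusion` output.
runs = ('[{"conclusion": "success"}, {"conclusion": "failure"}, '
        '{"conclusion": "success"}, {"conclusion": "success"}]')

def ci_pass_rate(gh_json: str) -> float:
    """Fraction of finished CI runs that passed."""
    conclusions = [r["conclusion"] for r in json.loads(gh_json)]
    finished = [c for c in conclusions if c in ("success", "failure")]
    return sum(c == "success" for c in finished) / len(finished)

print(ci_pass_rate(runs))  # 0.75
```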
## 5. Measure Complexity Signals
Track whether the codebase is getting harder to work with over time. Based on Ousterhout's three symptoms of complexity, with all signals derived from git data.
| Signal | What to measure | How | Concern threshold |
|---|---|---|---|
| Change amplification | Median files per commit (30d vs 90d) | `git log --shortstat` | Trending up |
| Shotgun surgery | % of commits touching 5+ files across 3+ directories | `git log --shortstat` + `git show --name-only` | > 20% of commits |
| Cognitive load | Top 5 most-changed files in 30d; flag any that are also among the largest | `git log --name-only` + file sizes | Large files with high churn |
| Unknown unknowns | % of merged PRs where no test file was changed | `gh pr list --state merged` + diff per PR | Trending up |
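Change amplification (median files per commit) can be pulled from `git log --shortstat` output with a small parser. A sketch over a hypothetical three-commit excerpt:

```python
import re
from statistics import median

# Hypothetical excerpt of `git log --oneline --shortstat` output.
log = """\
abc123 fix parser
 1 file changed, 4 insertions(+)
def456 refactor config loading
 7 files changed, 120 insertions(+), 80 deletions(-)
a1b2c3 add tests
 3 files changed, 60 insertions(+)
"""

def median_files_per_commit(text: str) -> float:
    """Median of the 'N files changed' counts in shortstat output."""
    counts = [int(m.group(1)) for m in re.finditer(r"(\d+) files? changed", text)]
    return median(counts)

print(median_files_per_commit(log))  # 3
```

Run the same parser over the 30-day and 90-day windows and compare the medians to see whether the signal is trending up.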
Present as a compact table with one interpretation line summarizing the overall complexity trend (accumulating / stable / improving).
If complexity is accumulating, feed into Diagnosis with specific interventions: split a god file, add tests to a hot path, extract a module.
## 6. Diagnose and Prescribe
For each problem found, follow the pattern:
Diagnosis: [what's sick]
Evidence: [data that proves it]
Impact: [how it slows delivery or hurts quality]
Rx: [cheapest intervention]
Prioritize by: most impact for least effort.
Common diagnoses:
- Stocks accumulating → find the bottleneck flow
- Slow cycle time → usually review time or CI time
- Broken feedback loop → identify who/what stopped responding to the signal
- No feedback loop → suggest creating one (tests, CI gate, review process)
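The "most impact for least effort" prioritization can be sketched as a ratio sort over scored diagnoses (the items and 1-5 scores below are hypothetical):

```python
# Hypothetical diagnoses scored 1-5 for impact and effort.
diagnoses = [
    {"rx": "Add CI gate on main", "impact": 5, "effort": 2},
    {"rx": "Split god file", "impact": 4, "effort": 4},
    {"rx": "Enable auto-assign reviewers", "impact": 3, "effort": 1},
]

# Rank by impact per unit of effort, highest first.
ranked = sorted(diagnoses, key=lambda d: d["impact"] / d["effort"], reverse=True)
print([d["rx"] for d in ranked])
# ['Enable auto-assign reviewers', 'Add CI gate on main', 'Split god file']
```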
## 7. Write Output
Write to `.tap/system-health.md` using the template in `references/system-health-template.md`.
If a prior `.tap/system-health.md` exists, compare trends. Call out what improved and what worsened since the last measurement.
Human mode: Walk through findings. Start with the headline ("your system is healthy / has 2 problems / is backing up"). Show the data. Explain the diagnosis. Propose the cheapest fix. Ask: "Want to dig into any of these?"
Agent mode: Write `.tap/system-health.md` silently. If run as part of a `/retrospective`, feed findings into the retro.
## Boundaries
- Read-only — does NOT modify code, config, or process
- Does NOT assess code quality (that's CLAUDE.md / code review)
- Does NOT assess agent readiness (that's /tap-audit)
- Does NOT capture learnings (that's /retrospective)
- ONLY measures the system and diagnoses problems
- Data-driven — every claim backed by evidence from git/GitHub/CI