# TAP Audit
Assess how autonomous an agent can be in this repo right now. Produce a structured assessment at .tap/tap-audit.md.
This skill does NOT describe coding conventions — that's CLAUDE.md's job. This skill assesses the system: what's configured, what's missing, what's slowing delivery or letting bugs through.
## Process
- Check existing audit
- Scan the repo
- Assess each dimension
- Score readiness
- Identify leverage points
- Write `.tap/tap-audit.md`
- Seed `.tap/architecture.md`
- Present findings (human) or proceed to task (agent)
## 0. Check Existing Audit
If .tap/tap-audit.md exists, read it before doing anything else.
- Parse the `Last run:` date from the file.
- Run `git log --oneline --since="[date]"` to count commits since the last audit.
- Check if key config files changed since that date: `git diff --name-only HEAD@{[date]} -- .claude/ .mcp.json package.json .github/workflows/ CLAUDE.md`
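A minimal shell sketch of this freshness check, assuming the audit file stores a `Last run: YYYY-MM-DD` line and GNU date is available:

```bash
# Pull the last-run date out of the existing audit (field name and format assumed)
LAST_RUN=$(grep -m1 '^Last run:' .tap/tap-audit.md | sed 's/^Last run:[[:space:]]*//')

# Audit age in days (GNU date; macOS needs `date -j -f` instead)
AGE_DAYS=$(( ($(date +%s) - $(date -d "$LAST_RUN" +%s)) / 86400 ))
echo "$AGE_DAYS days since last audit"

# Commits since the last audit
git log --oneline --since="$LAST_RUN" | wc -l

# Key config files changed since that date (HEAD@{date} reads the reflog,
# so it only works if the reflog reaches back that far)
git diff --name-only "HEAD@{$LAST_RUN}" -- .claude/ .mcp.json package.json .github/workflows/ CLAUDE.md
```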
Decision tree:
| Condition | Action |
|---|---|
| `--force` flag | Full re-run (skip to step 1) |
| <30 days old, no key config changes | Summary mode: print score + top leverage points + audit age. Say "Audit is current ([N] days). Use --force to re-scan." STOP. |
| <30 days old, key config changed | Delta mode: re-assess only affected dimensions. Update .tap/tap-audit.md in-place with new Last run: date. Skip unchanged sections. |
| >=30 days old, significant repo activity (many commits, new contributors) | Recommend full re-run. Ask before proceeding. |
| >=30 days old, low activity (few commits, same contributors) | Summary mode: the audit is likely still accurate. Print score + leverage points. Say "Audit is [N] days old but repo activity is low. Use --force to re-scan." STOP. |
| Date missing or unparseable | Full re-run (skip to step 1) |
The audit file IS the cache. Don't re-scan what hasn't changed.
## 1. Scan the Repo
Read these files/locations (skip any that don't exist):
- `.claude/settings.json` → permissions
- `.claude/settings.local.json` → local overrides
- `.mcp.json` → MCP servers configured
- `CLAUDE.md` → coding instructions quality
- `AGENTS.md` → agent-specific boundaries
- `package.json` / `Cargo.toml` / `go.mod` / `requirements.txt` → stack + scripts
- `tsconfig.json` / `biome.json` / `.eslintrc` → tooling config
- `.github/workflows/` → CI/CD setup
- `.tap/` → existing project memory
- `vercel.json` / `fly.toml` / `Dockerfile` / `render.yaml` → deploy config
Run (if tools available):
- `git log --oneline -20` → recent activity
- `git shortlog -sn --no-merges --since="90 days ago"` → contributors
- `gh run list --limit 5` → recent CI runs
- Test runner dry-run to discover test count
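A rough shell pass over the same checklist (the file list mirrors the one above; trim it to the stack at hand):

```bash
# Which agent-relevant config files exist in this repo?
for f in .claude/settings.json .claude/settings.local.json .mcp.json CLAUDE.md AGENTS.md \
         package.json Cargo.toml go.mod requirements.txt tsconfig.json biome.json .eslintrc \
         .github/workflows vercel.json fly.toml Dockerfile render.yaml .tap; do
  [ -e "$f" ] && echo "found:   $f" || echo "missing: $f"
done

# Recent activity, contributors, and CI runs (skip whatever isn't installed)
git log --oneline -20
git shortlog -sn --no-merges --since="90 days ago"
command -v gh >/dev/null && gh run list --limit 5
```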
## 2. Assess Each Dimension

### Environments
Discover all available environments with URLs. Check package.json scripts, deploy configs, CI workflows, CLAUDE.md, README.
- Local: [command] → [url]
- Preview: [url pattern or "not configured"]
- Staging: [url or "not configured"]
- Production: [url or "not configured"]
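Two quick ways to surface candidates, assuming an npm-style project with `jq` on hand (script names and URL locations vary per repo):

```bash
# npm scripts often reveal the local dev, build, and deploy commands
jq -r '.scripts // {} | to_entries[] | "\(.key) -> \(.value)"' package.json 2>/dev/null

# Deploy configs and docs frequently contain preview/staging/production URLs
grep -rhoE 'https?://[^" )]+' vercel.json fly.toml render.yaml README.md CLAUDE.md 2>/dev/null | sort -u
```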
### Agent Harness Readiness
Assess six areas. Mark ✓ (available) or ✗ (missing/incomplete) for each item.
#### Documentation
- CLAUDE.md: exists? covers stack, conventions, run/test/deploy commands?
- AGENTS.md: exists? defines scope boundaries, escalation rules?
- ADRs: any architectural decisions documented?
#### MCP Servers (from .mcp.json)
- List each configured server and what it enables
- Flag missing ones based on stack (e.g., using Postgres but no DB MCP; web app but no chrome-devtools)
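To list what's configured, assuming the conventional `mcpServers` top-level key in `.mcp.json`:

```bash
# Configured MCP servers and what each one launches or connects to
jq -r '.mcpServers // {} | to_entries[] | "\(.key): \(.value.command // .value.url // "unknown")"' .mcp.json 2>/dev/null
```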
#### Skills
- What skills are available?
- What's missing for this stack? (e.g., Neon skill for Neon Postgres, Temporal skill for Temporal)
#### CLI Tools
- Verify: package manager, test runner, linter, build tool, deploy tool, DB CLI, infra CLI
- Flag tools the stack requires but agent can't access
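A simple availability check (the tool list below is illustrative; derive the real one from the stack):

```bash
# Are the CLIs this stack needs actually on the agent's PATH?
for t in pnpm node psql docker gh terraform flyctl; do
  command -v "$t" >/dev/null 2>&1 && echo "ok:      $t" || echo "missing: $t"
done
```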
#### Permissions (from .claude/settings.json + settings.local.json)
- What's explicitly allowed and denied?
- What's missing that blocks autonomous work?
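Assuming the standard `permissions.allow` / `permissions.deny` shape, the rules can be dumped with `jq`:

```bash
# Explicit allow/deny rules from project and local settings
for f in .claude/settings.json .claude/settings.local.json; do
  echo "== $f =="
  jq -r '.permissions.allow[]? | "allow: \(.)"' "$f" 2>/dev/null
  jq -r '.permissions.deny[]?  | "deny:  \(.)"' "$f" 2>/dev/null
done
```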
#### Test Infrastructure
- Test count and coverage if discoverable
- Types present: unit, integration, acceptance, e2e, browser
- Can the agent verify its own work?
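Test-count discovery depends on the runner; a few common dry-run forms (flags and output vary by version, so treat these as starting points):

```bash
npx jest --listTests 2>/dev/null | wc -l          # Jest: counts test files, not cases
pytest --collect-only -q 2>/dev/null | tail -1    # pytest: prints "N tests collected"
cargo test -- --list 2>/dev/null | wc -l          # Rust: one line per test, plus summary lines
go test ./... -list '.*' 2>/dev/null              # Go: lists test names per package
```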
#### Readiness Score
- FULL: Agent can implement, test (unit + browser), access DB, verify end-to-end. CLAUDE.md comprehensive. All necessary MCP servers and CLIs configured.
- PARTIAL: Agent can implement and run some tests. Missing some integrations. CLAUDE.md exists but has gaps.
- MINIMAL: Agent can read/write code but can't run tests, no MCP servers, thin or missing CLAUDE.md.
### Design Complexity
Quick spot check on how hard this codebase is to modify. Sample the 5-10 files changed most often recently (`git log --name-only --since="30 days ago"`) — these are what agents will touch most.
For each sampled file, check:
- File size — proxy for module depth. Large files doing too much = hard to understand, easy to break
- Import fanout — proxy for coupling. Many imports = many dependencies = change amplification risk
- Layer structure — pass-through wrappers, thin abstractions that add interface cost without hiding complexity
- Consistency — do similar things follow similar patterns, or does each file invent its own approach
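A rough pass at the sampling and the first two proxies; the sampled path and the import grep are illustrative, tuned for a TS/JS codebase:

```bash
# Files changed most often over the last 30 days (what agents will touch most)
git log --name-only --since="30 days ago" --pretty=format: \
  | grep -v '^$' | sort | uniq -c | sort -rn | head -10

# For a sampled file: size (module depth proxy) and import count (coupling proxy)
wc -l src/billing/invoice-service.ts              # hypothetical path
grep -c '^import ' src/billing/invoice-service.ts
```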
Rate overall: Easy / Moderate / Hard to modify.
- Easy: small focused modules, low coupling, consistent patterns
- Moderate: some large files or high coupling, but patterns are clear
- Hard: god files, high coupling, inconsistent patterns, pass-through layers
Specific design smells found here flow into the Approach Gaps section as actionable items.
### Feedback Loops
Discover the top 3 workflows in this repo — both automated and manual. Don't just check infrastructure (tests, CI, docs). The most valuable finding is a workflow humans do by hand that an agent could own.
Active discovery — run these scans (a shell sketch of a few follows the list):
- Binary assets without generators — scan for committed images, fonts, audio, video, PDFs. Check if corresponding generation scripts, Makefiles, or asset pipelines exist. If PNGs exist but no script produces them → manual workflow.
  - Find: `*.png`, `*.jpg`, `*.svg`, `*.gif`, `*.mp3`, `*.wav`, `*.pdf`, `*.ttf`, `*.otf`
  - Then: look for a Makefile, `generate-*.sh`, `scripts/`, an asset pipeline, or a build step that produces them
  - Missing generator = manual creation workflow
- Git history patterns — files that get re-committed with small changes repeatedly suggest a manual iteration loop (human tweaks, checks, tweaks again). Look for binary or config files with 5+ commits.
  - `git log --all --diff-filter=M --name-only --pretty=format: | sort | uniq -c | sort -rn | head -20`
  - Files with high re-commit counts and no associated test/script changes = manual iteration
- Human-in-the-loop scripts — scan shell scripts and docs for steps that require visual inspection, manual input, or judgment. Look for:
  - Scripts that open a window/browser and wait for a human to look
  - README/CLAUDE.md steps phrased as "then you...", "manually...", "visually check...", "inspect the output"
  - Scripts with `read`, `open`, or `sleep` (waiting for a human), or comments like "# check this looks right"
- Workflow descriptions in docs — read CLAUDE.md, README, and any contributing guides. Any multi-step process described in prose is a candidate for automation. Pay special attention to sequences like "first run X, then check Y, then run Z" — that's an unautomated pipeline.
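A few of these scans sketched in shell (globs, script names, and phrases are common conventions, not guarantees):

```bash
# Committed binary assets: candidate outputs of a manual creation workflow
git ls-files '*.png' '*.jpg' '*.svg' '*.gif' '*.mp3' '*.wav' '*.pdf' '*.ttf' '*.otf'

# Any generator for them? Look for Makefiles, scripts, or asset pipelines
git ls-files | grep -iE '(^|/)(makefile|scripts/|generate-)' | head -20

# Files re-committed most often: possible manual iteration loops
git log --all --diff-filter=M --name-only --pretty=format: \
  | grep -v '^$' | sort | uniq -c | sort -rn | head -20

# Scripts and docs that pause for a human
grep -rniE 'visually check|check this looks right|then you |manually' \
  --include='*.sh' --include='*.md' . 2>/dev/null | head -20
```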
For each workflow, assess:
| Element | What to look for |
|---|---|
| Generator | Can an agent produce the output? If not, what's missing — a skill, an MCP, a CLI tool, an API? |
| Evaluator | Can something other than the generator verify the output? (tests, lint, visual regression, Playwright, type checker) |
| Handoff | Can the agent context-reset and resume? (shaped docs, plans, .tap/ memory, clear commit history) |
| Grading criteria | Are quality expectations measurable, not vibes? (test suites, lint rules, acceptance criteria, design specs) |
Rate each workflow:
- Closed loop — all four elements present. Agent can iterate autonomously.
- Open loop — evaluator or grading criteria missing. Agent produces output but can't verify quality — human must inspect.
- No loop — no evaluator, no criteria. Agent guesses and hopes.
- Manual — human does this entirely by hand. No agent involvement yet.
For each non-closed workflow, prescribe a concrete automation path: a specific skill to create, MCP to wire up, hook to add, CLI tool to integrate, or external service to connect. Be specific — "add browser tests" is too vague; "create a sprite-generation skill that uses nano-banana-pro MCP to generate pixel art PNGs, renders them in-app via dev-check.sh, and validates dimensions/palette" is actionable.
### Approach Gaps
Don't repeat CLAUDE.md. Flag what's MISSING that causes agent rework:
- Test coverage gaps (which areas have no tests?)
- Missing ADRs (where do agents guess at architectural intent?)
- Undocumented patterns (inconsistencies agents will copy?)
- Design smells from the complexity spot check (god files to split, pass-through layers to collapse, inconsistent patterns to standardize)
### Process
- Branching strategy
- CI/CD pipeline (what runs, recent pass rate)
- Deploy mechanics (auto or manual, to which environments)
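CI pass rate over recent runs can be pulled with the `gh` CLI (requires `gh` to be authenticated; field names follow its `--json` output):

```bash
# Conclusion breakdown of the last 20 workflow runs (success / failure / cancelled)
gh run list --limit 20 --json conclusion | jq -r '.[].conclusion' | sort | uniq -c
```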
## 3. Identify Leverage Points
Goal: ship faster while maintaining the quality bar.
Find 3-5 leverage points. Each answers: what's slowing delivery OR letting defects through?
### N. [Short description] → [consequence]
- Symptom: [observable problem]
- Why it costs: [concrete impact on speed or quality]
- Fix: [cheapest intervention + estimated effort]
Prioritize by: cheapest fix that unblocks the most agent autonomy.
## 4. Write .tap/tap-audit.md
Create .tap/ directory if it doesn't exist. Write the assessment using the template in references/tap-audit-template.md.
## 5. Seed .tap/architecture.md
Always do this step. If .tap/architecture.md doesn't already exist, create it now.
Scan the codebase for deliberate architectural decisions — they're visible as:
- Consistent patterns across the codebase (same error handling everywhere)
- Config that implies decisions (Temporal config, ORM choice, auth provider)
- Package choices that constrain patterns (Result library, specific framework)
- Comments or docs explaining "why" something is done a certain way
Write each decision in compressed format: 2-4 lines max. Capture the principle behind the decision so agents can apply it to novel situations. See references/architecture-format.md for format and examples.
Do NOT create individual ADR files. Everything goes in one .tap/architecture.md — one file, ~50 lines, optimized for agent consumption.
If .tap/architecture.md already exists, review it against what you discovered and note any missing decisions in the Approach Gaps section of tap-audit.md.
## 6. Present Findings
Human mode (default):
Always open with the signature block:
`★ Audit View ────────────────────────────────────`
[repo name] — [readiness score]
├─ [top feedback loop finding]
├─ [#1 leverage point]
└─ [cheapest fix to start with]
`─────────────────────────────────────────────────`
Then:
- Summarize readiness score and what it means
- Highlight top 2-3 leverage points
- Propose the single cheapest fix to start with
- Ask if they want to address any leverage points now
Agent mode (invoked with --agent or in automated pipeline):
- Write .tap/tap-audit.md and .tap/architecture.md silently
- Log readiness score
- Proceed to assigned task
## Boundaries
- Does NOT describe the tech stack (CLAUDE.md's job)
- Does NOT set coding conventions (CLAUDE.md's job)
- Does NOT measure team dynamics like review turnaround (/systems-health's job)
- Does NOT modify any code or config — read-only assessment