Harness
Build the verification infrastructure that makes agent work trustworthy.
Principles
- Environment > instruction — the harness matters more than the prompt
- Mechanical enforcement > prose — hooks, CI, health checks, and scripts beat wishes
- Separate builder from judge — `harness` builds the rig, `verify` uses it
- Smallest useful harness first — add layers in order, stop when the repo becomes reliably verifiable
- Progressive disclosure — keep the core workflow here, load patterns on demand
Handoffs
- Need to review a diff, branch, or PR on real surfaces → use `verify`
- Need to update AGENTS.md, README.md, specs, or repo docs → use `docs`
The 7-Layer Stack
- Boot — single command starts the app
- Smoke — a fast proof the app is alive
- Interact — agent can exercise the real surface
- E2e — key user flows work end to end
- Enforce — hooks, CI gates, lint rules, or mechanical checks
- Observe — logs, health endpoints, traces, machine-readable signals
- Isolate — worktrees or containers do not collide
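As one illustration of the Smoke layer, the sketch below polls a liveness probe until it succeeds or a deadline passes. It is a minimal Python sketch, not something this skill ships: `http_alive` and `wait_until` are hypothetical names, and it assumes the app answers an HTTP GET with a 2xx status once it has booted.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_alive(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib raises HTTPError, an OSError subclass, for 4xx/5xx).
        return False


def wait_until(probe: Callable[[], bool], timeout_s: float = 30.0,
               interval_s: float = 0.5) -> bool:
    """Call probe repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A smoke script could run `wait_until(lambda: http_alive("http://localhost:3000/health"))` right after the boot command and exit non-zero on `False`. The port and path are placeholders, not values taken from this skill.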
Workflow
1. Audit
Grade the repo across these dimensions:
- bootable
- testable
- observable
- verifiable
For each, report:
- status: `pass` / `partial` / `fail`
- evidence: file or command
- gap: what is missing
Grade against references/grading.md. The lowest-scoring dimension sets the overall grade.
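The audit record and the "lowest dimension wins" rule can be sketched as plain data. This is a minimal Python sketch under stated assumptions: the real scale lives in references/grading.md, which is not reproduced here, so the sketch uses the `pass` / `partial` / `fail` status as a stand-in. `Status`, `DimensionResult`, and `overall_status` are hypothetical names, and the evidence strings are invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Status(IntEnum):
    """Ordered so that min() picks the weakest status."""
    FAIL = 0
    PARTIAL = 1
    PASS = 2


@dataclass(frozen=True)
class DimensionResult:
    name: str       # bootable, testable, observable, or verifiable
    status: Status
    evidence: str   # file or command that backs the status
    gap: str        # what is missing; empty when nothing is


def overall_status(results: List[DimensionResult]) -> Status:
    """The lowest dimension sets the overall result."""
    if not results:
        raise ValueError("an audit needs at least one dimension")
    return min(r.status for r in results)


# Illustrative audit; every evidence value below is made up.
audit = [
    DimensionResult("bootable", Status.PASS, "make dev", ""),
    DimensionResult("testable", Status.PARTIAL, "tests/", "no e2e flows"),
    DimensionResult("observable", Status.FAIL, "none found", "no health endpoint"),
    DimensionResult("verifiable", Status.PARTIAL, "ci.yml", "lint only"),
]
print(overall_status(audit).name)  # prints "FAIL"
```

Using the minimum rather than an average keeps one strong dimension from hiding a broken one.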
2. Setup
Build missing layers in this order:
Boot → Smoke → Interact → E2e → Enforce → Observe → Isolate
Each step should be independently useful. Stop once the repo is reliably verifiable; do not build a cathedral because you got excited.
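The build order above can be expressed as a small helper that always returns the earliest missing layer. This is a hedged sketch: `LAYER_ORDER` and `next_layer` are hypothetical names, and how a layer is detected as "built" is left to the audit.

```python
from typing import AbstractSet, Optional

# Order taken from the setup step: build the earliest missing layer first.
LAYER_ORDER = ["boot", "smoke", "interact", "e2e", "enforce", "observe", "isolate"]


def next_layer(built: AbstractSet[str]) -> Optional[str]:
    """Return the earliest layer not yet built, or None when all exist."""
    for layer in LAYER_ORDER:
        if layer not in built:
            return layer
    return None


print(next_layer({"boot", "enforce"}))  # prints "smoke"
```

Note that a repo with CI gates but no smoke check is still sent back to Smoke, which matches the rule that each step should be useful on its own before later layers are added.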
3. Improve
Tighten weak or flaky layers:
- remove mock-only confidence theater
- replace one-off checks with reusable scripts or hooks
- add logs and health signals agents can query
- make parallel work safe when agent collisions are real
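For the "health signals agents can query" item, one possible shape is a function that runs named checks and returns plain, JSON-serializable data. The sketch below is an assumption about what such a signal could look like, not a format this skill prescribes: `health_report`, the check names, and the `ok` / `degraded` vocabulary are all hypothetical.

```python
import json
from typing import Callable, Dict


def health_report(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run each named check and summarize the results as plain data."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:  # a crashing check is reported, not raised
            results[name] = f"error: {type(exc).__name__}"
    status = "ok" if all(v == "ok" for v in results.values()) else "degraded"
    return {"status": status, "checks": results}


# Illustrative checks; real ones would ping a database, queue, and so on.
print(json.dumps(health_report({"db": lambda: True, "queue": lambda: False})))
# prints {"status": "degraded", "checks": {"db": "ok", "queue": "fail"}}
```

Returning structured data instead of free-form log lines lets an agent branch on `status` without parsing prose.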
4. Hand Off
When the repo reaches C+ and can be judged honestly, hand off to verify.
If harness changes created doc drift, hand off to docs.
Anti-Patterns
- Mock-only tests — pass by construction, verify nothing
- Self-evaluation — builder grading its own work
- Docs-only fixes disguised as harness work
- Routine PR review here — that's `verify`
- Perfect harness upfront — iterate from real failure modes
Output
After harness work, report:
- grade before and after
- dimensions with evidence
- files changed
- remaining gaps ranked by impact
- verify readiness
- recommended next handoff: `verify`, `docs`, or human review
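The report fields listed above could be captured in a small structure so the output stays consistent between runs. This is a minimal sketch, assuming letter grades (suggested by the "C+" threshold in the workflow) and a numeric impact score for ranking gaps. `HarnessReport`, `Gap`, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List

ALLOWED_HANDOFFS = {"verify", "docs", "human review"}


@dataclass
class Gap:
    description: str
    impact: int  # assumed scale: a higher number means more impact


@dataclass
class HarnessReport:
    grade_before: str
    grade_after: str
    dimensions: Dict[str, str]   # dimension name -> status plus evidence
    files_changed: List[str]
    remaining_gaps: List[Gap]
    verify_ready: bool
    next_handoff: str

    def __post_init__(self) -> None:
        if self.next_handoff not in ALLOWED_HANDOFFS:
            raise ValueError(f"unknown handoff: {self.next_handoff!r}")

    def to_json(self) -> str:
        """Serialize the report with gaps ranked by impact, highest first."""
        data = asdict(self)
        data["remaining_gaps"] = sorted(
            data["remaining_gaps"], key=lambda g: g["impact"], reverse=True
        )
        return json.dumps(data, indent=2)
```

Rejecting an unknown `next_handoff` at construction time keeps a report from recommending a destination the workflow does not define.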
References
- references/grading.md — harness quality grading scale with mechanical criteria
- references/setup-patterns.md — boot, smoke, e2e, observability, and isolation patterns
- references/industry-examples.md — external patterns and justification for harness investment