Harness Engineering

Core Principle

The repo is the harness. Agent failures are harness failures. When an agent breaks a rule, fix the harness — not the agent.

Pillar	What It Solves
Context Engineering	Agent hallucinations, wrong tool usage, stale docs
Architectural Constraints	Boundary violations, silent regressions, ambiguous failures
Garbage Collection	Entropy accumulation, dead code, doc drift

When to Apply

Situation	Action
New repo setup for AI development	Apply all three pillars from day one
Agent repeatedly hallucinating tools/APIs	Pillar 1: add tool declarations to AGENTS.md
Agent crossing module boundaries	Pillar 2: add structural test enforcing boundary
Agent producing code that passes CI but breaks conventions	Pillar 2: convert convention to linter rule or structural test
Docs drifting from implementation	Pillar 1: add CI cross-link validation
Codebase growing, agent quality degrading	Pillar 3: schedule GC agent for dead code and unused exports
Agent ignoring AGENTS.md guidance	Check: is guidance generic advice or a specific failure lesson? Rewrite as failure ledger entry
Post-incident on agent-generated code	Add failure to AGENTS.md, add constraint to prevent recurrence

Pillar 1: Context Engineering

Goal: Make the repository a knowledge product that agents can consume without hallucination.

1.1 AGENTS.md as Failure Ledger

Every line in AGENTS.md should trace to a real failure, not generic best practice.

# Pattern
BAD:  "Follow clean code principles"
BAD:  "Use meaningful variable names"
GOOD: "Never import from packages/internal — agent imported shared/db directly on 2025-12-03, broke build"
GOOD: "Always use OrderService.create(), not Order.new — direct instantiation skips validation (incident #247)"

Failure ledger entry format:

rule: Never call PaymentGateway directly from controllers
context: Agent bypassed service layer, sent duplicate charges (2025-11-15)
fix: Use PaymentService.charge() which handles idempotency

When writing or updating AGENTS.md:

If the rule doesn't reference a specific failure or concrete constraint → cut it
If the rule says "should" → rewrite as "must" with consequence
If the rule has no enforcement mechanism → pair it with a Pillar 2 constraint

1.2 Tool Declaration Mandate

Undeclared tools don't exist for agents. Explicitly list available tools, commands, and scripts.

## Available Tools
- `bin/test` — Run test suite (prefer over raw pytest/rspec)
- `bin/lint` — Run linters with auto-fix
- `bin/db-reset` — Reset dev database
- `make deploy-staging` — Deploy to staging (requires approval)

## DO NOT USE
- `rm -rf` on any directory
- Direct database queries in production
- `curl` to external APIs without going through ApiClient

If an agent uses a tool not in this list → add it (if valid) or add it to DO NOT USE (if dangerous).

1.3 Docs as System of Record

RULE: docs/ is canonical truth
ENFORCEMENT: CI validates cross-links between docs/ and source code
MECHANISM:
  - Every public API must have a corresponding docs/ entry
  - CI script checks: for each @api-doc tag in source → matching file in docs/
  - Broken link = CI failure, not warning

Stale docs are worse than no docs — agents trust what they read.

1.4 Isolated Observability Per Task

Each agent task gets its own log context, not a shared monitoring stream.

PATTERN:
  - Assign task_id to each agent invocation
  - Route logs to task-specific output (file, log group, trace)
  - Post-task: review task log for failures, add to AGENTS.md if new pattern

Shared monitoring hides individual agent failures in noise.

Pillar 2: Architectural Constraints

Goal: Make violations impossible or immediately visible through deterministic enforcement.

2.1 Teaching Linter Errors

Failure messages must include remediation, not just violation.

# BAD linter output
ERROR: Import violation in src/api/handler.ts

# GOOD linter output
ERROR: Import violation in src/api/handler.ts
  ↳ Cannot import from 'packages/db' in 'src/api/'
  ↳ Use 'packages/db-client' instead (facade pattern)
  ↳ See: docs/architecture/data-access.md

Agents read error messages literally. A teaching error message prevents the same mistake on next attempt.

2.2 Structural Tests

Enforce architectural boundaries in CI, not in documentation.

# Pseudocode structural tests
test "no cross-boundary imports":
  For each FILE in src/api/**:
    IMPORTS = parse_imports(FILE)
    FORBIDDEN = ["packages/internal", "src/admin", "src/worker"]
    assert IMPORTS intersection FORBIDDEN == empty

test "dependency direction":
  LAYERS = [presentation, application, domain, infrastructure]
  For each LAYER in LAYERS:
    For each IMPORT in LAYER.imports:
      assert IMPORT.layer_index >= LAYER.index  # only import same or lower

test "API contract stability":
  CURRENT = parse_openapi("api/openapi.yml")
  PREVIOUS = parse_openapi("api/openapi.yml", ref="main")
  assert no_breaking_changes(CURRENT, PREVIOUS)

2.3 Numeric CI Gates

Every check must be binary pass/fail. Advisory warnings are invisible to agents.

Check Type	Gate Implementation
Test coverage	`coverage >= 80%` or fail
Bundle size	`size <= 250KB` or fail
Lint errors	`errors == 0` or fail
Type errors	`errors == 0` or fail
Security vulns	`critical == 0` or fail

Rule: If it matters, it's a gate. If it's a warning, agents will ignore it.

2.4 Dependency Layering

Make dependency direction explicit and enforced.

# .dependency-layers.yml (or equivalent config)
layers:
  - name: presentation
    paths: ["src/ui/**", "src/api/routes/**"]
    can_import: [application, domain]

  - name: application
    paths: ["src/services/**", "src/use-cases/**"]
    can_import: [domain, infrastructure]

  - name: domain
    paths: ["src/models/**", "src/entities/**"]
    can_import: []  # domain has no dependencies

  - name: infrastructure
    paths: ["src/db/**", "src/external/**"]
    can_import: [domain]

Without explicit layers, agents will create circular dependencies.

Pillar 3: Garbage Collection

Goal: Actively fight entropy rather than accumulating technical debt.

3.1 Background GC Agents

Schedule periodic scans that produce small, auto-mergeable PRs.

GC_TASKS:
  - dead_code: find unused exports, unreachable functions → remove
  - stale_docs: find docs/ entries with no matching source → flag
  - unused_deps: find package.json/Gemfile entries with no imports → remove
  - orphan_tests: find test files with no matching source file → flag
  - config_drift: diff .env.example vs actual config usage → reconcile

FREQUENCY: weekly for active repos, monthly for stable repos
OUTPUT: one small PR per GC task (not one mega-PR)
MERGE: auto-merge if CI passes, otherwise flag for review

3.2 Custom Verification Tools

Build repo-specific fast-feedback tools instead of relying on generic linters.

PRINCIPLE: fast feedback > comprehensive analysis

Examples:
  - bin/check-api-contracts → validates OpenAPI spec matches routes (5s)
  - bin/check-imports → validates dependency layers (2s)
  - bin/check-docs → validates doc cross-links (3s)

RULE: if a custom check takes > 30s, it's too slow for agent feedback loops

3.3 Feedback Loop Discipline

When an agent produces bad output, follow this diagnostic order:

1. Check AGENTS.md — is the constraint documented?
   If NO → add it (Pillar 1 fix)
   If YES → is it enforced in CI?

2. Check CI — does a gate catch this failure?
   If NO → add structural test or linter rule (Pillar 2 fix)
   If YES → is the error message teaching?

3. Check error message — does it include remediation?
   If NO → improve the error message (Pillar 2.1 fix)
   If YES → the harness is correct, investigate agent-specific issue

NEVER: Skip to "the agent is broken" without completing steps 1-3

Harness Assessment Checklist

Evaluate an existing repo's harness maturity:

Context Engineering

AGENTS.md exists at repo root
AGENTS.md entries reference specific failures, not generic advice
Available tools/scripts are explicitly listed
Forbidden operations are explicitly listed
docs/ has CI-enforced cross-links to source

Architectural Constraints

Linter errors include remediation instructions
At least one structural test enforces module boundaries
All CI checks are binary pass/fail (no advisory warnings)
Dependency direction is documented and enforced

Garbage Collection

Dead code removal happens on a schedule (not ad-hoc)
Stale docs are detected automatically
Custom verification tools exist for repo-specific invariants
Post-failure diagnosis follows harness-first order (3.3)

Scoring: Count passing items per pillar. Below 3/5 in any pillar = harness gap.

Cross-References

hierarchical-agents — AGENTS.md structure, generation process, JIT indexing
tdd-workflow — structural test implementation via red-green-refactor
structured-logging — observability patterns for task-level isolation
quality-gate agent — CI gate execution mechanics
references/patterns-catalog.md — topology templates, AGENTS.md patterns, structural test examples, GC agent templates

harness-engineering