# Validation-First Design

## Core Principle: If an Agent Cannot Validate It, It Will Not Be Met
Every spec requirement must include testable acceptance criteria that an agent can automatically verify. This is not optional — it is the foundation that makes SDD work.
Why? AI agents are non-deterministic. Without automated validation, there is no way to know whether an agent's output is correct. Validation gates turn "the agent generated some code" into "the agent generated code that provably meets the specification."
The validation-first rule applies at every level:
- Spec requirements must have testable acceptance criteria
- Plans must define which gates verify each task
- Implementation must pass all applicable gates before being considered complete
- Iterations must show measurable progress through gates
## The Validation Gate Sequence
Every implementation must pass through six ordered checkpoints. Each successive gate is more expensive to run, so catching failures early saves significant time.
### Gate 1: Compilation Check

What: The project compiles/transpiles without errors.

```
# Generic pattern — substitute your project's build command
{BUILD_COMMAND}
```
Why it matters: If the code does not build, nothing else can be validated. This is the cheapest possible check.
What it catches:
- Syntax errors
- Missing imports/dependencies
- Type errors (in typed languages)
- Configuration errors
Acceptance criteria pattern:
- [ ] `{BUILD_COMMAND}` completes with exit code 0
- [ ] No warnings related to {domain} (warnings in other domains are acceptable)
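The exit-code criterion above is easy to script. A minimal sketch, where `BUILD_CMD` is a stand-in for your project's real build command (`true` substitutes here so the sketch runs as-is):

```shell
# Gate 1 sketch: run the build and record pass/fail from its exit code.
# BUILD_CMD is a placeholder; 'true' stands in so the sketch runs anywhere.
BUILD_CMD="${BUILD_CMD:-true}"

if $BUILD_CMD; then
  gate1="PASS"
else
  gate1="FAIL"
fi
echo "Gate 1 (compilation): $gate1"
```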
### Gate 2: Isolated Unit Verification

What: Unit tests pass on all changed files.

```
# Generic pattern
{TEST_COMMAND}

# Or targeted at changed files
{TEST_COMMAND} --filter {changed-files}
```
Why it matters: Unit tests verify individual functions and modules in isolation. They are fast, deterministic, and catch logic errors.
What it catches:
- Incorrect function behavior
- Edge cases not handled
- Regression from changes to existing code
- Contract violations (wrong return types, missing fields)
Acceptance criteria pattern:
- [ ] All existing unit tests pass
- [ ] New unit tests cover all acceptance criteria for R{N}
- [ ] No test relies on external services or network access
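The targeted form above can be sketched as a loop over changed files. `TEST_CMD` and the file list are placeholders (`echo` stands in for a real runner, and in a real repo the list would come from `git diff --name-only`):

```shell
# Gate 2 sketch: run the test command against only the files changed in this
# iteration, stopping at the first failure.
TEST_CMD="${TEST_CMD:-echo}"              # placeholder for the real test runner
changed="src/auth.py src/session.py"      # hypothetical changed-file list

gate2="PASS"
for f in $changed; do
  $TEST_CMD "$f" || { gate2="FAIL"; break; }   # stop at the first failing file
done
echo "Gate 2 (unit tests on changed files): $gate2"
```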
### Gate 3: Cross-Component Integration

What: End-to-end and integration tests verify that components work together.

```
# Generic pattern
{TEST_COMMAND} --e2e

# Or with a specific test runner
{E2E_TEST_COMMAND}
```
Why it matters: Unit tests verify components in isolation. Integration tests verify they work together. Many bugs only appear at integration boundaries.
What it catches:
- API contract mismatches between components
- Data flow errors across module boundaries
- Authentication/authorization integration issues
- Database query errors with real (or realistic) data
Acceptance criteria pattern:
- [ ] User can complete {workflow} end-to-end
- [ ] API endpoint returns correct response for {scenario}
- [ ] Error propagation works correctly from {source} to {destination}
### Gate 4: Resource and Speed Benchmarks

What: Performance benchmarks pass defined thresholds.

```
# Generic pattern
{BENCHMARK_COMMAND}

# Or specific checks
{TEST_COMMAND} --performance
```
Why it matters: Functional correctness is necessary but not sufficient. Performance regression can make a feature unusable even if it produces correct output.
What it catches:
- Response time regression
- Memory leaks or excessive allocation
- CPU-intensive operations that block the main thread
- Database query performance degradation
Acceptance criteria pattern:
- [ ] API response time < {N}ms at p95 under {M} concurrent users
- [ ] Page load time < {N}s on simulated 3G connection
- [ ] Memory usage does not exceed {N}MB during {operation}
- [ ] No operation blocks the main thread for > {N}ms
Note: Not every task needs performance gates. Apply Gate 4 when:
- The spec explicitly defines performance requirements
- The change touches a known hot path
- The feature involves data processing at scale
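A threshold check like the p95 criterion above reduces to a numeric comparison once the benchmark harness emits a measurement. The numbers below are fabricated for illustration:

```shell
# Gate 4 sketch: compare a measured p95 latency against the spec's budget.
p95_ms=180          # hypothetical value produced by a benchmark harness
threshold_ms=250    # hypothetical spec budget: "response time < 250ms at p95"

if [ "$p95_ms" -lt "$threshold_ms" ]; then
  gate4="PASS"
else
  gate4="FAIL"
fi
echo "Gate 4 (performance): $gate4 (p95=${p95_ms}ms, budget=${threshold_ms}ms)"
```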
### Gate 5: Startup Smoke Test

What: The application starts successfully and basic smoke tests pass.

```
# Generic pattern — start the application
{START_COMMAND}

# Verify it is running
curl -f http://localhost:{PORT}/health

# Or run smoke tests
{SMOKE_TEST_COMMAND}
```
Why it matters: Code can build and pass all tests but fail to start. Launch verification catches configuration issues, missing environment variables, port conflicts, and startup race conditions.
What it catches:
- Missing environment variables or configuration
- Port conflicts or binding errors
- Startup initialization failures
- Missing runtime dependencies
- Database migration issues
Acceptance criteria pattern:
- [ ] Application starts with `{START_COMMAND}` and responds to health check
- [ ] Main screen/page renders without errors
- [ ] No error-level entries in application logs during startup
- [ ] Application shuts down gracefully on interrupt signal
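In practice the health check should poll with retries rather than run once, since an application needs a moment to bind its port. A sketch, where `HEALTH_CMD` is a placeholder (`true` stands in for something like `curl -sf http://localhost:{PORT}/health` so the sketch runs without a live server):

```shell
# Gate 5 sketch: poll the health endpoint with a bounded number of retries.
HEALTH_CMD="${HEALTH_CMD:-true}"   # placeholder for the real health check

gate5="FAIL"
for attempt in 1 2 3 4 5; do
  if $HEALTH_CMD; then gate5="PASS"; break; fi
  sleep 1   # give the app a moment before the next attempt
done
echo "Gate 5 (smoke test): $gate5"
```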
### Gate 6: Manual Audit
What: A human reviews the output for quality, design intent, and requirements that are difficult to automate.
Why it matters: Some things cannot be automated — UX quality, architectural elegance, naming consistency, documentation clarity. Gate 6 is where the human acts as the final quality filter.
What it catches:
- Subjective quality issues
- Architectural decisions that are technically correct but strategically wrong
- Over-engineering or under-engineering
- Security concerns that automated tools miss
- Requirements that were technically met but miss the spirit of the spec
How it works in practice:
- Agent completes Gates 1-5 and reports results
- Human reviews the implementation against spec intent
- Human provides feedback as issues or spec updates
- If feedback requires code changes, it enters the revision loop:
  - Update specs with the missing requirement
  - Re-run iteration loop
  - Verify the fix emerges from updated specs
Acceptance criteria pattern:
- [ ] Implementation reviewed by human for spec intent alignment
- [ ] No architectural concerns raised
- [ ] Code style consistent with project conventions
### Gate Summary Table
| Gate | Purpose | Command Pattern | Typical Duration | Automated? |
|---|---|---|---|---|
| 1. Compilation | Code compiles cleanly | {BUILD_COMMAND} | Seconds | Yes |
| 2. Unit Verification | Individual functions behave correctly | {TEST_COMMAND} | Seconds-Minutes | Yes |
| 3. Integration | Modules cooperate as expected | {E2E_TEST_COMMAND} | Minutes | Yes |
| 4. Benchmarks | Speed and resource use within budget | {BENCHMARK_COMMAND} | Minutes | Yes |
| 5. Smoke Test | Application boots and responds | {START_COMMAND} + health check | Seconds | Yes |
| 6. Manual Audit | Meets design intent and quality bar | Human inspection | Variable | No |
For the full validation gate reference with detailed examples, see `references/validation-gates.md`.
## Mapping Spec Requirements to Gates
Every spec requirement must map to at least one validation gate. When writing specs (see ck:cavekit-writing), each acceptance criterion should indicate which gate verifies it.
### Mapping Pattern

```
### R1: User Authentication

**Acceptance Criteria:**
- [ ] Valid credentials return session token — **Gate 2** (unit test)
- [ ] Invalid credentials return 401 error — **Gate 2** (unit test)
- [ ] Session token grants access to protected endpoints — **Gate 3** (integration)
- [ ] Login page renders within 2s — **Gate 4** (performance)
- [ ] Application starts with auth module loaded — **Gate 5** (launch)
```
### Unmapped Requirements Are Unvalidated
If a requirement cannot be mapped to any gate, it has one of two problems:
- The requirement is too vague — rewrite it with specific, testable criteria
- The validation infrastructure is missing — add the gate (e.g., if there are no E2E tests, Gate 3 does not exist yet)
Either way, an unmapped requirement will not be reliably met by an agent.
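The unmapped-requirement check itself can be mechanized with a line scan. The spec lines below are fabricated to mirror the Mapping Pattern; a real check would read the actual spec files:

```shell
# Sketch: flag acceptance criteria that name no gate.
spec=$(mktemp)
cat > "$spec" <<'EOF'
- [ ] Valid credentials return session token (Gate 2)
- [ ] Password reset flow feels user friendly
EOF

# Count criteria lines that mention no gate at all.
unmapped=$(grep -cv 'Gate [1-6]' "$spec")
echo "Unmapped (unvalidated) criteria: $unmapped"
rm -f "$spec"
```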
## Phase Gates Between Hunt Phases
Phase gates are mandatory verification checkpoints between Hunt phases. They ensure that the output of one phase is solid before the next phase builds on it.
### Phase Gate Definitions
| Transition | Gate Condition | How to Verify |
|---|---|---|
| Spec → Plan | All domains have specs with testable acceptance criteria | Review cavekit-overview.md; every R{N} has AC items |
| Plan → Implement | Plans reference specs, define sequence, include test strategies | Review plan files; every task maps to spec requirements |
| Implement → Iterate | Code builds (Gate 1), tests pass (Gate 2), impl tracking is current | Run {BUILD_COMMAND} and {TEST_COMMAND}; check impl tracking |
| Iterate → Monitor | Convergence detected: changes decreasing iteration-over-iteration | Compare diffs across last 3-5 iterations |
| Monitor → Spec | Gap found or new requirement identified | Gap analysis identifies unmet acceptance criteria |
### Phase Gate Enforcement
Phase gates are enforced by the iteration loop. When a prompt includes phase gate checks, the agent:
- Runs the gate check at the end of the phase
- Reports pass/fail status
- If the gate fails, the agent does not proceed to the next phase
- Instead, the agent iterates on the current phase until the gate passes
### Example: Implement → Iterate Gate

```
## Exit Criteria (Phase Gate)

Before reporting completion:

- [ ] `{BUILD_COMMAND}` succeeds with exit code 0
- [ ] `{TEST_COMMAND}` passes with no new failures
- [ ] All files created/modified are listed in impl tracking
- [ ] All dead ends encountered are documented
- [ ] Test health table is updated with current counts
```
## Merge Protocol
When working with agent teams (multiple agents dispatched via the Agent tool), the merge protocol ensures that integrating work from different agents does not break validation gates.
### The Protocol

```
Agent A completes work in its isolated branch
Agent B completes work in its isolated branch
Agent C completes work in its isolated branch

Merge sequence (one at a time):
1. Merge Agent A's branch → main
2. Run: {BUILD_COMMAND} → must pass
3. Run: {TEST_COMMAND} → must pass
4. Run: Launch verification → must pass
5. If all pass → proceed
6. If any fail → fix before merging next branch
7. Merge Agent B's branch → main
8. Run: {BUILD_COMMAND} → must pass
9. Run: {TEST_COMMAND} → must pass
10. ...repeat for each agent branch
```
### Why One at a Time?
Merging all agent branches simultaneously and then running tests makes it impossible to determine which merge caused a failure. Merging one at a time with validation between each merge pinpoints failures immediately.
### Merge Protocol Rules
- Merge one agent branch at a time — never batch-merge
- Run Gates 1-3 after each merge — build, unit tests, integration tests
- Run Gate 5 after all merges — launch verification on the fully integrated codebase
- Clean up after each merge: delete the merged branch (`git branch -D <branch>`)
- If a merge fails validation, fix it before proceeding to the next merge
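The rules above are a simple loop with a validation step between merges. A control-flow sketch: branch names are hypothetical, the git commands are echoed rather than executed, and `true` stands in for the build/test commands so the sketch runs as-is:

```shell
# Merge protocol sketch: one agent branch at a time, validating between merges.
BUILD_CMD="${BUILD_CMD:-true}"
TEST_CMD="${TEST_CMD:-true}"

merged=0
for branch in agent-a agent-b agent-c; do
  echo "+ git merge --no-ff $branch"       # shown, not executed, in this sketch
  if $BUILD_CMD && $TEST_CMD; then
    echo "+ git branch -D $branch"         # clean up the merged branch
    merged=$((merged + 1))
  else
    echo "Validation failed after $branch; fix before the next merge"
    break                                  # never stack a merge on a broken main
  fi
done
echo "$merged branch(es) merged with validation between each"
```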
## Completion Signals
Completion signals are specific strings that agents emit when all exit criteria for a task or phase are met. They enable automation to detect when an agent is done.
### How Completion Signals Work
1. The prompt defines the signal: "When ALL exit criteria are met, output exactly: `<all-tasks-complete>`"
2. The agent emits the signal after verifying all exit criteria
3. The iteration loop detects the signal and stops iterating
### Completion Signal Rules
- The signal must be a unique string that would not appear in normal output
- The agent must verify all exit criteria before emitting the signal
- The signal should be the last thing the agent outputs in a session
- If the agent cannot meet all criteria, it should not emit the signal and instead document what is blocking completion
### Example Prompt with Completion Signal

```
## Exit Criteria

Complete all of the following before emitting the completion signal:

- [ ] All T- tasks are DONE or BLOCKED with documented blockers
- [ ] `{BUILD_COMMAND}` succeeds
- [ ] `{TEST_COMMAND}` passes with no new failures
- [ ] Implementation tracking is updated
- [ ] All dead ends are documented

When ALL criteria above are met, output:
<all-tasks-complete>

If you cannot meet all criteria, document what is blocking
and do NOT output the completion signal.
```
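On the driver side, detection is an exact-line match against the agent's transcript. A sketch with a fabricated transcript:

```shell
# Driver-side sketch: detect the completion signal and decide whether to stop
# the iteration loop. The transcript text is fabricated for illustration.
transcript='All exit criteria verified.
<all-tasks-complete>'

if printf '%s\n' "$transcript" | grep -qx '<all-tasks-complete>'; then
  decision="stop"      # exact-line match found: stop iterating
else
  decision="iterate"   # no signal: run another iteration
fi
echo "Loop decision: $decision"
```

Matching the whole line (`grep -x`) rather than a substring reduces false positives from the agent merely quoting the signal mid-sentence.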
## Validation-First Design Patterns
### Pattern 1: Test Before Implement
Write or generate tests before implementing the feature. The test defines what "correct" means.
1. Read spec requirement R{N} acceptance criteria
2. Generate test cases that verify each criterion
3. Run tests → all fail (RED)
4. Implement the feature
5. Run tests → all pass (GREEN)
6. Refactor if needed
This is TDD-within-SDD. See `superpowers:test-driven-development` for the existing TDD skill.
### Pattern 2: Gate Cascade
Run gates in order. If an earlier gate fails, do not run later gates.
```
Gate 1 (Build) → FAIL → fix build errors → retry Gate 1
Gate 1 (Build) → PASS → Gate 2 (Unit Tests)
Gate 2 (Tests) → FAIL → fix failing tests → retry Gate 2
Gate 2 (Tests) → PASS → Gate 3 (Integration)
...
```
Earlier gates are cheaper. Fixing a build error costs seconds. Fixing an integration error costs minutes. Fix cheap problems first.
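The cascade can be sketched as a short-circuiting chain, so a failing gate prevents every more expensive gate from running. Each gate command is a placeholder (`true` stands in so the sketch runs as-is):

```shell
# Cascade sketch: run gates cheapest-first, stopping at the first failure.
run_gate() {                 # usage: run_gate <number> <command...>
  gate_num="$1"; shift
  if "$@"; then
    echo "Gate $gate_num: PASS"
  else
    echo "Gate $gate_num: FAIL -- stopping cascade"
    return 1
  fi
}

run_gate 1 true &&           # build        (placeholder for {BUILD_COMMAND})
run_gate 2 true &&           # unit tests   (placeholder for {TEST_COMMAND})
run_gate 3 true              # integration  (placeholder for {E2E_TEST_COMMAND})
cascade_status=$?
echo "Cascade exit status: $cascade_status"
```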
### Pattern 3: Progressive Gate Depth
Not every iteration needs all gates. Use progressive depth based on the phase:
| Phase | Required Gates | Optional Gates |
|---|---|---|
| Early Implement | 1 (Build), 2 (Unit) | — |
| Mid Implement | 1, 2, 3 (Integration) | 4 (Performance) |
| Late Implement | 1, 2, 3, 5 (Launch) | 4 |
| Pre-Release | All 1-6 | — |
### Pattern 4: Regression Prevention
When a gate that previously passed starts failing, treat it as a P0 issue:
- Stop forward progress — do not implement new features
- Identify the regression — which change caused the failure?
- Fix the regression — restore the passing state
- Add a test — ensure this specific regression cannot recur
- Backpropagate — if the regression reveals a spec gap, update the spec
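For the "identify the regression" step, `git bisect run` can replay the failing gate across recent history automatically. A sketch of the invocation shape: the refs are placeholders and the recipe is printed rather than executed here:

```shell
# Sketch: automate regression hunting by bisecting with the failing gate.
# Refs (HEAD, v1.4.0) and the gate commands are placeholders.
bisect_recipe='git bisect start
git bisect bad HEAD                # the gate fails here
git bisect good v1.4.0             # last ref known to pass the gate
git bisect run sh -c "{BUILD_COMMAND} && {TEST_COMMAND}"
git bisect reset'
printf '%s\n' "$bisect_recipe"
```

`git bisect run` treats a zero exit as "good" and nonzero as "bad", which matches the gate convention used throughout this document.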
## Integration with Other Skills
### With superpowers:verification-before-completion
The existing verification-before-completion skill provides a general framework for verifying work before marking it done. Validation-first design extends this with the specific 6-gate pipeline and phase gate system used in SDD.
How they work together:

- `superpowers:verification-before-completion` ensures the agent checks its work
- `ck:validation-first` defines exactly what checks to run and in what order
### With ck:cavekit-writing
Every spec requirement must have acceptance criteria that map to validation gates. The spec-writing skill defines how to write those criteria. Validation-first design defines how to verify them.
### With ck:impl-tracking
Validation results are recorded in the implementation tracking document's Test Health table. Gate failures become Issues. Gate-related dead ends are documented in the Dead Ends section.
### With ck:methodology
Validation gates operate continuously across all Hunt phases. Phase gates control transitions between phases. The iteration loop uses gate results as convergence signals.
## Summary
- If an agent cannot validate it, it will not be met — every requirement needs automated verification
- 6 gates in order: Compilation → Unit Verification → Integration → Benchmarks → Smoke Test → Manual Audit
- Earlier gates are cheaper — catch problems at the build stage, not at launch
- Every spec requirement maps to at least one gate — unmapped requirements are unvalidated
- Phase gates control Hunt transitions — do not proceed until the current phase passes its gate
- Merge one at a time — validate between each merge to pinpoint failures
- Completion signals enable automation — agents emit a specific string when all gates pass
- Regression is P0 — when a passing gate starts failing, stop and fix before proceeding