# Cavekit Methodology

## Core Principle: Specify Before Building
Always define what you want before telling agents how to build it. Go through a cavekit stage — never jump straight from raw requirements to implementation.
Cavekit is a methodology for building software with AI coding agents that puts kits at the center of the development process — code is derived from them, not the other way around. Whether starting from scratch or modernizing an existing system, the principle is the same:
- Greenfield projects: reference material → kits → code
- Rewrites: old code → kits → new code
In both cases, the kits become a living contract that agents consume to continuously build, validate, and refine the application.
## Why Kits Are the First-Class Citizen
| Property | Benefit |
|---|---|
| Structured | Organized as a navigable tree, enabling agents to load only what they need |
| Human-legible | Engineers can audit requirements at a higher level than code |
| Stack-independent | Decoupled from any single framework or language |
| Independently evolvable | Kits can be refined without touching implementation |
| Verifiable | Every requirement includes acceptance criteria agents can check |
**Key Insight:** Well-written kits with strong validation make your application reproducible — any agent can rebuild it from the kits alone. Think of it as continuous regeneration.
## The Scientific Method Analogy
LLMs are inherently non-deterministic — like running an experiment, each individual call may yield different results. But through the right methodology — clear hypotheses, controlled conditions, and repeated trials — we extract reliable, reproducible outcomes from a stochastic process.
Cavekit applies the scientific method to software construction — hypothesize, test, observe, refine.
| Layer | Analogy | What It Does |
|---|---|---|
| LLM calls | Individual experiments | Each run may produce different results; no single output is authoritative |
| Kits | Hypotheses | Define what you expect to observe — the predicted behavior |
| Validation gates | Controlled conditions | Ensure reproducibility by constraining what counts as a valid outcome |
| Convergence loops | Repeated trials | Build statistical confidence through successive passes |
| Implementation tracking | Lab notebook | Record what was tried, what worked, and what failed |
| Revision | Revising the hypothesis | When results contradict expectations, update the theory upstream |
The outcome: a disciplined, repeatable engineering process layered on top of probabilistic generation.
## The 5 Hunt Phases

The Hunt is the five-phase lifecycle: Draft, Architect, Build, Inspect, Monitor. Each phase has dedicated prompts that drive it.
| Phase | Input | Output | AI Role | Human Role |
|---|---|---|---|---|
| Draft | Source materials, domain knowledge, existing systems | Implementation-agnostic kits | Extract requirements, structure knowledge | Verify kits capture intent accurately |
| Architect | Kits + framework research | Framework-specific implementation plans | Design architecture, break down work, order steps | Approve architectural choices |
| Build | Plans + kits | Working code + tests + tracking docs | Write code, run tests, check against kits | Watch for drift and blockers |
| Inspect | Failed validations, gaps, manual fixes | Updated kits/plans via revision | Identify root causes, propagate fixes upstream | Evaluate outcomes, set priorities |
| Monitor | Running application, git history | Issues, anomalies, progress reports | Scan for regressions, surface metrics | Interpret reports, guide next steps |
### Phase Transitions
Each phase has gate conditions that must be met before moving to the next:
- Draft → Architect: All domains have kits with testable acceptance criteria. Human has reviewed for completeness.
- Architect → Build: Plans reference kits, define implementation sequence, and include test strategies. Architecture decisions validated.
- Build → Inspect: Code builds, tests pass at current coverage level, implementation tracking is up to date.
- Inspect → Monitor: Convergence detected (changes decreasing iteration-over-iteration). Remaining changes are trivial.
- Monitor → Draft (cycle): Gap found or new requirement identified. Revise kits and restart the cycle.
The Inspect phase is where the human serves as reviewer and decision-maker, not hands-on coder. You monitor the process, request changes as needed, and make systemic improvements to kits and prompts.
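The gate conditions above lend themselves to mechanical checks an agent can run before advancing. A minimal sketch in Python; the `PhaseState` fields and function names are illustrative assumptions, not part of Cavekit itself:

```python
from dataclasses import dataclass


@dataclass
class PhaseState:
    # Hypothetical snapshot of project state; real gates would derive
    # these flags from kits, test runs, and tracking documents.
    kits_have_acceptance_criteria: bool
    human_reviewed: bool
    tests_pass: bool
    tracking_up_to_date: bool


def gate_draft_to_architect(s: PhaseState) -> bool:
    # Draft -> Architect: domains have testable criteria and a human signed off.
    return s.kits_have_acceptance_criteria and s.human_reviewed


def gate_build_to_inspect(s: PhaseState) -> bool:
    # Build -> Inspect: code passes tests and tracking docs are current.
    return s.tests_pass and s.tracking_up_to_date
```

Encoding gates as predicates keeps the decision auditable: the agent can report exactly which flag blocked a transition instead of judging readiness informally.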
For the full Hunt phase reference, see `references/hunt-phases.md`.
## Decision Matrix: When to Use Cavekit

### Full Cavekit
Use when the project has significant scope, evolving requirements, or needs autonomous agent execution.
| Indicator | Threshold |
|---|---|
| Codebase size | 50+ source files |
| Requirements | Evolving, multi-domain |
| Agent coordination | Multi-agent or multi-prompt pipelines |
| Environment | Production, security-sensitive, brownfield |
| Team structure | Multi-team or cross-team |
| Execution mode | Long-running autonomous work (overnight, unattended) |
What you get: Full Hunt lifecycle, context directory with kits/plans/impl tracking, prompt pipeline, convergence loops, revision, validation gates.
### Lightweight Cavekit
Use when scope is moderate — too complex for ad-hoc but not worth a full pipeline.
| Indicator | Threshold |
|---|---|
| Codebase size | 5-50 files |
| Requirements | Mostly clear, focused |
| Agent coordination | Single agent, possibly with sub-agents |
| Execution mode | Interactive with occasional iteration loops |
What you do:

- Write a focused `context/kits/cavekit-task.md` capturing requirements
- Add a `context/plans/plan-task.md` sequencing the implementation
- Skip the full Hunt — just run an iteration loop against the plan
This is the "Cavekit floor" — most of the benefit without the overhead of a full multi-phase pipeline.
### Skip Cavekit
Use when the task is trivially small.
| Indicator | Threshold |
|---|---|
| Codebase size | Less than 5 files |
| Task type | One-off tools, simple bug fixes, exploratory prototypes |
| Implementation | Fits comfortably in one agent session without needing external references |
Heuristic: If the whole task fits in one context window with room to spare, full Cavekit adds more overhead than value.
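The sizing heuristic can be captured in a few lines. The function name and the handling of the fuzzy 50-file boundary are assumptions layered on the thresholds in the tables above:

```python
def cavekit_tier(source_file_count: int) -> str:
    """Map codebase size to a Cavekit tier using the thresholds above.

    Size is only one indicator; evolving requirements, multi-agent
    coordination, or unattended execution can bump a project up a tier.
    """
    if source_file_count < 5:
        return "skip"          # trivially small: one agent session suffices
    if source_file_count < 50:
        return "lightweight"   # one kit + one plan + an iteration loop
    return "full"              # full Hunt lifecycle and prompt pipeline
```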
### Growth Path
Start with lightweight Cavekit even if the project is small. If the scope expands, you already have the structure in place to scale up. It is much harder to retrofit kits onto a large codebase than to grow a cavekit directory from the beginning.
## The CI Pipeline Analogy
Cavekit mirrors a build pipeline — each stage transforms input into validated output, with feedback loops that propagate corrections upstream:
Traditional CI/CD:

```
Code → Build → Test → Deploy
```

Cavekit AI Pipeline:

```
Cavekit Change
  → Generate Plans (iteration loop)
  → Generate Implementation (iteration loop)
  → Validate (Tests + Review)
  → Human Audit (Monitor & Steer)
      → [Gap Found] → Revise → Cavekit Change (cycle repeats)
```
Every stage can run as an iteration loop — the same prompt executed repeatedly until output stabilizes. The iteration loop is what transforms nondeterministic LLM output into predictable, validated software.
## The Iteration Loop
The iteration loop is the fundamental execution unit in Cavekit. Execute the same prompt against the same codebase multiple times until the delta between runs approaches zero.
Mechanics:

1. Execute a prompt against the current codebase
2. The agent inspects git history and tracking documents to understand what has already been done
3. The agent applies changes and commits its progress
4. Return to step 1
Convergence signal: A shrinking volume of modifications across successive passes — the diff gets smaller each time until only cosmetic changes remain. You are looking for diminishing returns, not absolute zero.
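The loop and its convergence check can be sketched as a small driver. Here `run_prompt` is a stand-in for whatever harness executes the agent and returns the number of lines it changed; the `window` and `floor` thresholds are illustrative, not prescribed by Cavekit:

```python
def converged(diff_sizes, window=3, floor=5):
    # Diminishing returns: the last `window` passes each touched fewer
    # than `floor` lines. We look for a plateau, not absolute zero.
    recent = diff_sizes[-window:]
    return len(recent) == window and all(n < floor for n in recent)


def iteration_loop(run_prompt, max_passes=10):
    # run_prompt() executes the prompt once against the current codebase;
    # the agent reads git history and tracking docs, applies changes, and
    # commits. We record how large each pass's diff was.
    sizes = []
    for _ in range(max_passes):
        sizes.append(run_prompt())
        if converged(sizes):
            break
    return sizes
```

With simulated diff sizes of 120, 40, 12, 3, 2, 1 the driver stops after the sixth pass, once three consecutive diffs fall under the floor.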
If the diff is not shrinking between runs, the problem is upstream — fix the inputs (specs, validation, coordination), not the iteration count. Common causes:

- Kits are ambiguous (agents interpret them differently each time)
- Validation criteria are too loose (the agent has no way to confirm it got things right)
- Multiple agents are overwriting each other's work (ownership boundaries are unclear)
## Cross-References to Sub-Skills
Cavekit is composed of techniques that work together. This methodology skill is the index — each sub-skill below is self-contained but cross-references others.
### Foundation Skills

| Skill | Purpose | When to Use |
|---|---|---|
| `ck:cavekit-writing` | Write implementation-agnostic kits with testable acceptance criteria | Draft phase — always the first step |
| `ck:context-architecture` | Organize context for progressive disclosure | Project setup and ongoing maintenance |
| `ck:impl-tracking` | Track implementation progress, dead ends, test health | Build and Inspect phases |
| `ck:validation-first` | Design validation gates agents can execute | All phases — validation is continuous |
### Pipeline Skills

| Skill | Purpose | When to Use |
|---|---|---|
| `ck:prompt-pipeline` | Design numbered prompt pipelines for the Hunt | Setting up automation |
| `ck:revision` | Trace bugs back to kits and fix at the source | Inspect phase — after finding gaps |
| `cavekit:brownfield-adoption` | Adopt Cavekit on existing codebases | Starting Cavekit on legacy projects |
### Advanced Skills

| Skill | Purpose | When to Use |
|---|---|---|
| `ck:peer-review` | Use a second agent to challenge the first | Quality gates, architecture review |
| `cavekit:speculative-pipeline` | Stagger pipeline stages for parallelism | Optimizing long pipelines |
| `ck:convergence-monitoring` | Detect convergence vs ceiling | Monitoring iteration loops |
| `cavekit:documentation-inversion` | Turn documentation into agent-consumable skills | Library/module documentation |
## Integration with Existing Skills

Cavekit works with existing skills, not as a replacement:

| Existing Skill | Cavekit Integration |
|---|---|
| `superpowers:brainstorming` | Use during cavekit generation to explore requirements |
| `superpowers:writing-plans` | Use during plan generation for structured planning |
| `superpowers:test-driven-development` | TDD-within-Cavekit: cavekit acceptance criteria become failing tests |
| `superpowers:verification-before-completion` | Use for gate validation in every phase |
| `superpowers:executing-plans` | Use during implementation phase |
| `superpowers:dispatching-parallel-agents` | Use for agent team coordination |
## Quick Start

### For a New Project (Greenfield)
1. Set up the context directory:

   ```
   context/
   ├── refs/     # Source materials (PRDs, language specs, research)
   ├── kits/     # Implementation-agnostic kits
   ├── plans/    # Framework-specific implementation plans
   ├── impl/     # Living implementation tracking
   └── prompts/  # Hunt pipeline prompts
   ```

2. Write kits from your reference materials (see `ck:cavekit-writing`)
3. Generate plans from kits (see `ck:prompt-pipeline`)
4. Implement with validation gates (see `ck:validation-first`)
5. Track progress in implementation documents (see `ck:impl-tracking`)
6. Iterate — when gaps are found, revise kits (see `ck:revision`)
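Step 1 is easy to script. A sketch using only the standard library; the directory names come from the layout above, and `scaffold_context` is a hypothetical helper name:

```python
from pathlib import Path


def scaffold_context(root="context"):
    # Create the five context subdirectories; idempotent on re-run.
    subdirs = ["refs", "kits", "plans", "impl", "prompts"]
    created = []
    for name in subdirs:
        d = Path(root) / name
        d.mkdir(parents=True, exist_ok=True)
        created.append(d)
    return created
```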
### For an Existing Project (Brownfield)

1. Set up the context directory (same structure as above)
2. Designate the existing codebase as reference material
3. Generate kits from code (see `cavekit:brownfield-adoption`)
4. Validate kits match behavior — run tests against generated kits
5. Proceed with the normal Hunt — future changes flow through kits first
## Summary
Cavekit is not a tool — it is a methodology. The core loop is simple:
- Describe what you want (kits with testable criteria)
- Let agents build it (plans → implementation → validation)
- Fix the kits, not the code (revision)
- Repeat until converged (iteration loops)
Agents become more capable the more precisely you constrain them — clear kits, automated validation, and structured iteration loops let them operate with increasing autonomy. None of this eliminates the need for software engineers. Your judgment on architecture, your ability to write precise kits, and your instinct for what "done" looks like are the inputs that make the whole system function. Cavekit is a force multiplier: one engineer's clarity of thought, scaled across an entire implementation pipeline.