Iterative Fleet

A skill for reviewer-gated iterative loops of parallel `claude -p` or `codex exec` workers. Workers run once per iteration, a reviewer reads their output and writes a verdict, and a generated orchestrator decides whether to continue or stop — without ever killing or restarting workers. Both Claude and Codex providers are supported, set per-fleet or per-worker. See the dag-fleet SKILL.md for full codex provider documentation (model aliases, reasoning_effort, limitations).

When to use this skill

Reach for iterative-fleet when:

  • The work needs multiple rounds of refinement (not one-shot)
  • A reviewer/verifier must approve output before the work is done
  • You want operator-declared stop conditions (max iterations, LGTM count, cost cap)
  • Workers are long-running or have high bootstrap costs (no auto-restart — see CRITICAL section)

Use dag-fleet instead when:

  • Workers run once and are done (no iteration needed)
  • There is no reviewer quality gate

Use worktree-fleet instead when:

  • Tasks are fully independent with no shared state

CRITICAL: No auto-kill, no auto-restart

The orchestrator generated by this skill reads and decides only. It NEVER kills or restarts workers. Workers run to natural completion per iteration. This design comes from a $20 death spiral in experiment 001 where auto-restart on "stuck" workers caused cache rebuilds that cost more than the actual work. The reviewer is the quality gate. The operator is the kill switch.

fleet.json schema

{
  "fleet_name": "my-iterative-fleet",
  "type": "iterative",
  "config": {
    "max_concurrent": 3,
    "model": "sonnet",
    "fallback_model": "haiku",
    "max_iterations": 10,
    "cost_cap_usd": 10.0
  },
  "workers": [
    { "id": "builder-a", "type": "code-run", "task": "...", "max_budget_per_iter": 1.0 },
    { "id": "builder-b", "type": "code-run", "task": "...", "max_budget_per_iter": 1.0 },
    { "id": "reviewer",  "type": "reviewer", "task": "Review output, write verdict: lgtm | iterate | escalate",
      "depends_on": ["builder-a", "builder-b"] }
  ],
  "stop_when": {
    "reviewer_lgtm_count": 3,
    "max_iterations": 10,
    "cost_cap_usd": 10.0
  }
}

DAG ordering via depends_on

Each iteration is a DAG. Workers declare dependencies with depends_on:

{ "id": "reviewer", "type": "reviewer", "depends_on": ["builder-a", "builder-b"] }

At launch, only layer-0 workers (no deps) are spawned. The orchestrator spawns subsequent layers after their dependencies complete, then repeats the DAG on the next iteration.

iteration 1:  [builder-a, builder-b] → [reviewer] → verdict: iterate
iteration 2:  [builder-a, builder-b] → [reviewer] → verdict: lgtm → stop

Multi-layer DAGs work too (e.g., researcher → builder → reviewer = 3 layers). The shared lib/dag.sh provides topo-sort and dependency checking, reusable across all fleet types.

If no worker has depends_on, all workers launch in parallel (backward compatible).
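
The per-layer gating above reduces to a simple readiness check before each spawn. A minimal sketch in POSIX shell, assuming each finished worker leaves a `<id>.done` marker in the iteration directory (both the marker convention and the function name are illustrative, not the actual lib/dag.sh API):

```shell
# deps_done ITER_DIR DEP...
# Succeeds only when every listed dependency has finished, signalled
# here by a hypothetical "<id>.done" marker file in the iteration dir.
deps_done() {
  iter_dir=$1
  shift
  for dep in "$@"; do
    [ -f "$iter_dir/$dep.done" ] || return 1   # dependency still running
  done
  return 0
}
```

An orchestrator would spawn the reviewer only once `deps_done "$iter_dir" builder-a builder-b` succeeds, then repeat the same check on the next iteration.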

Iteration directory structure

$FLEET_ROOT/
  fleet.json
  iterations/
    1/
      builder-a.log
      builder-b.log
      review.md          # reviewer writes verdict here
    2/
      ...
  workers/
    builder-a/
      prompt.md
      session.jsonl
    ...
  orchestrator.sh        # generated — reads logs, decides iterate/pause/stop
  .paused                # exists when paused (touch to pause, rm to resume)
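
The `.paused` flag needs no signalling machinery: the orchestrator can simply poll for the file between iterations. A minimal sketch (function name and poll interval are illustrative):

```shell
# Block between iterations while $1/.paused exists; resume.sh simply
# deletes the flag. Workers mid-iteration are never interrupted.
wait_if_paused() {
  while [ -f "$1/.paused" ]; do
    sleep 5   # polling is cheap next to a worker iteration
  done
}
```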

Reviewer interface

The reviewer worker reads iterations/<N>/*.log and writes iterations/<N>/review.md. The review.md MUST contain one of:

  • verdict: lgtm — output is approved; counts toward the stop condition
  • verdict: iterate — needs another round
  • verdict: escalate — needs human attention, orchestrator pauses

Reviewer prompt requirements

Every reviewer prompt.md MUST include these instructions verbatim (with paths adjusted). Without them, the reviewer won't know where to write the verdict and the orchestrator defaults to iterate, wasting an iteration.

## Writing your verdict

1. Determine the current iteration number: list the `iterations/` directory and find the
   highest-numbered subdirectory that does NOT yet contain a `review.md`.
2. Write your verdict to `iterations/<N>/review.md` (relative to your working directory).
   **Never use absolute paths.**
3. The file MUST contain a line exactly like one of:
   - `verdict: lgtm`
   - `verdict: iterate`
   - `verdict: escalate`
4. Below the verdict line, list **actionable fix instructions** per worker — not just what's
   wrong, but exactly where and how to fix it (file path, function name, what to change).
   The builder sees this feedback on the next iteration, so vague issues waste a cycle.

Example `iterations/1/review.md`:

verdict: iterate

builder-a

  1. src/parser.py:parse_input() — no try/except around JSON decode. Wrap lines 45-48 in try/except json.JSONDecodeError, return None on failure.
  2. src/parser.py:validate_schema() — missing required field "timestamp" in schema dict at line 72. Add "timestamp": {"type": "string", "required": True}.

builder-b

  1. src/utils.py:format_output() is defined but not in __all__ or exported in __init__.py. Add to src/__init__.py line 5: from .utils import format_output.
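
Because anything malformed defaults to iterate, the orchestrator's verdict parse can stay strict and small. A sketch of how the generated orchestrator.sh might read iteration N (the real generated script may differ):

```shell
# read_verdict FLEET_ROOT N
# Prints the verdict for iteration N; a missing or malformed review.md
# is treated as "iterate", matching the default described above.
read_verdict() {
  review="$1/iterations/$2/review.md"
  verdict=$(grep -m1 '^verdict:' "$review" 2>/dev/null | awk '{print $2}')
  case "$verdict" in
    lgtm|escalate) echo "$verdict" ;;
    *)             echo iterate ;;
  esac
}
```

Note that "looks mostly good" with no `verdict:` line parses as iterate, never lgtm.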

Available scripts

  • launch.sh <fleet-root> [--dry-run]: parse fleet.json, generate orchestrator.sh, spawn workers + orchestrator in tmux
  • status.sh <fleet-root>: show iteration count, reviewer verdict history, per-worker status, cost
  • pause.sh <fleet-root>: touch .paused; the orchestrator stops at the next iteration boundary
  • resume.sh <fleet-root>: remove .paused; the orchestrator continues
  • kill.sh <fleet-root> all [--force]: hard stop; kill the tmux session, sweep orphans, unregister

Launch procedure

  1. Create $FLEET_ROOT/fleet.json; the workers list must include exactly one worker of type: "reviewer", with depends_on pointing at the builder workers
  2. Create $FLEET_ROOT/workers/<id>/prompt.md for each worker
  3. Run bash ${CLAUDE_SKILL_DIR}/scripts/launch.sh $FLEET_ROOT
  4. ALWAYS tell the user the exact status command so they can monitor manually:
    bash ${CLAUDE_SKILL_DIR}/scripts/status.sh $FLEET_ROOT
    
    This is mandatory after every launch. The user must be able to check status without asking you.
  5. Pause if needed: bash ${CLAUDE_SKILL_DIR}/scripts/pause.sh $FLEET_ROOT
  6. Hard stop: bash ${CLAUDE_SKILL_DIR}/scripts/kill.sh $FLEET_ROOT all
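
The "exactly one reviewer" rule in step 1 is easy to break by hand, so a pre-launch sanity check is cheap insurance. A naive sketch that greps a generated example file (grep is a stand-in for real JSON parsing, so it assumes the one-worker-per-line formatting shown in the schema above):

```shell
# Illustrative fleet.json stand-in, written just so the check below runs.
cat > fleet.json <<'EOF'
{ "workers": [
  { "id": "builder-a", "type": "code-run" },
  { "id": "reviewer",  "type": "reviewer" }
] }
EOF

# Fail fast unless exactly one worker has "type": "reviewer".
reviewer_count=$(grep -c '"type": "reviewer"' fleet.json)
if [ "$reviewer_count" -ne 1 ]; then
  echo "fleet.json must declare exactly one reviewer (found $reviewer_count)" >&2
  exit 1
fi
```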

Rationalizations to reject

  • "The worker has been running for 10 minutes with no output — it must be stuck, I should pause it"
    Rebuttal: Long thinking blocks look like silence. The orchestrator waits for result events, not timeouts. This is exactly the failure mode that caused the $20 death spiral in experiment 001. Do not intervene.
  • "The reviewer said 'looks mostly good' — I'll count that as LGTM"
    Rebuttal: Only verdict: lgtm counts. "Mostly good" is iterate. If the reviewer is ambiguous, the verdict is iterate. Do not interpret generously.
  • "I should kill this worker and restart it with a better prompt"
    Rebuttal: The orchestrator NEVER kills workers. Workers run to natural completion. If you need a different prompt, pause the fleet, edit prompt.md, and let the next iteration pick it up.
  • "The cost is getting high — I'll reduce max_iterations mid-run"
    Rebuttal: Stop conditions are baked into orchestrator.sh at generation time. To change them, kill the fleet, regenerate with the new fleet.json, and relaunch. Do not edit orchestrator.sh directly.
  • "I can skip the reviewer for this simple task"
    Rebuttal: If the task doesn't need a reviewer, use dag-fleet (one-shot) or worktree-fleet (independent). Iterative-fleet without a reviewer is a runaway loop.

Decision tree: which fleet?

1. Tasks independent (no shared files/state)?  YES → worktree-fleet
2. Need iteration with reviewer quality gate?  YES → iterative-fleet (this skill)
3. One-shot DAG with dependencies?             YES → dag-fleet
4. None of the above?                          → open multiple Claude Code sessions

