Iterate - Autonomous Improvement Loop
Inspired by Karpathy's autoresearch: constrain scope, clarify success with one mechanical metric, loop autonomously. The agent modifies code, measures the result, keeps improvements, discards regressions, and repeats - indefinitely or for N iterations.
The power is in the constraint. One metric. One scope. One loop. Git as memory.
Preflight
Before the loop starts, do the work that makes the loop effective. Don't skip steps - this discipline is what separates a productive overnight run from a flailing one.
1. Collect Config
Five inputs. If provided inline, extract and proceed. If any are missing, ask once using AskUserQuestion with all missing fields batched together.
| Field | Required | What it is | Example |
|---|---|---|---|
| Goal | Yes | What you're improving, in plain language | "Increase test coverage to 90%" |
| Scope | Yes | File globs the agent may modify | src/**/*.ts |
| Verify | Yes | Shell command that outputs the metric (a number) | npm test -- --coverage \| grep "All files" |
| Direction | Yes | Is higher or lower better? | higher / lower |
| Guard | No | Command that must always pass (prevents regressions) | npm run typecheck |
Bounded mode: If the user includes Iterations: N, run exactly N iterations then stop with a summary. Otherwise, loop forever until interrupted.
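The config collection above can be sketched as a small parser. This is an illustration only: the `IterateConfig` class and the helper names are hypothetical, and it assumes each field arrives as one `Key: value` line, as in the inline usage examples.

```python
from dataclasses import dataclass
from typing import Optional

# The four required fields from the table; Guard and Iterations are optional.
REQUIRED = ("goal", "scope", "verify", "direction")


@dataclass
class IterateConfig:  # hypothetical container, not part of the skill itself
    goal: str
    scope: list[str]
    verify: str
    direction: str                     # "higher" or "lower"
    guard: Optional[str] = None
    iterations: Optional[int] = None   # None means unbounded


def parse_fields(text: str) -> dict[str, str]:
    """Collect 'Key: value' lines into a dict with lowercase keys."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and value.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def missing_fields(fields: dict[str, str]) -> list[str]:
    """Required fields the user still has to supply (ask for all at once)."""
    return [name for name in REQUIRED if name not in fields]


def build_config(fields: dict[str, str]) -> IterateConfig:
    missing = missing_fields(fields)
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    direction = fields["direction"].lower()
    if direction not in ("higher", "lower"):
        raise ValueError("direction must be 'higher' or 'lower'")
    return IterateConfig(
        goal=fields["goal"],
        scope=[g.strip() for g in fields["scope"].split(",") if g.strip()],
        verify=fields["verify"],
        direction=direction,
        guard=fields.get("guard"),
        iterations=int(fields["iterations"]) if "iterations" in fields else None,
    )
```

Batching the result of `missing_fields` into a single question matches the "ask once" rule: the user is interrupted at most one time before the loop starts.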
2. Plan
Read all in-scope files. Understand the codebase before touching anything.
- What's the current state? What's already been tried?
- What are the likely improvement vectors? Rank them.
- What are the risks? What could break?
- Form a rough strategy for the first 5-10 iterations.
3. Permissions
Check that allowed-tools cover what the loop needs. The verify and guard commands must run without permission prompts - a blocked tool at 3am kills the whole run.
- Dry-run the verify command. If it gets blocked, note which Bash(command:*) pattern is needed.
- Dry-run the guard command (if set). Same check.
- If permissions are missing, suggest specific wildcard additions for .claude/settings.local.json and ask the user to approve before starting. Reference /setperms for a full setup.
4. Tasks
Create a TaskList to track progress across iterations. This provides structure the user can check without reading the full results log.
TaskCreate: "Establish baseline" (status: in_progress)
TaskCreate: "Iteration loop - [goal]" (status: pending)
TaskCreate: "Final summary and cleanup" (status: pending)
Update task status as the loop progresses. Mark the iteration task as in_progress when the loop starts, completed when it ends.
5. Tests and Verification
Before the first iteration, make sure verification actually works:
- Run the verify command on the current state. If it fails or produces no parseable number, fix this first.
- Run the guard command (if set). If it fails on the current state, the codebase has pre-existing issues - flag to the user.
- If tests don't exist yet for the scope, consider writing them as iteration 0. Good tests make the loop more effective.
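"Parseable number" can be made concrete with a small extractor. The sketch below is one possible rule, not the skill's defined behavior: it assumes the metric is the first number in the verify output, and `extract_metric` is a hypothetical name.

```python
import re
from typing import Optional

# First integer or decimal in the text, with an optional sign.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def extract_metric(output: str) -> Optional[float]:
    """Return the first number in the verify output, or None if there is none."""
    match = _NUMBER.search(output)
    return float(match.group()) if match else None
```

Under this rule a line such as `p95: 120ms` would be read as 95, not 120, so it is safer for the verify command itself to narrow its output to the metric alone (for example with `awk`), as the inline usage example does.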
6. Baseline
Record the starting point:
- Run verify command, extract the metric - this is iteration 0
- Create results.tsv with the header and baseline row
- Update the baseline task to completed
- Confirm setup to the user, then begin the loop
Goal: Increase test coverage to 90%
Scope: src/**/*.ts
Verify: npm test -- --coverage | grep "All files"
Direction: higher
Guard: npm run typecheck
Baseline: 72.3%
Mode: unbounded
Tasks: 3 created
Permissions: verified (all commands pre-approved)
Starting iteration loop.
The Loop
LOOP (forever, or N times):
1. REVIEW git log --oneline -10 + read results.tsv tail
Know what worked, what failed, what's untried.
2. IDEATE Pick ONE change. Write a one-sentence description
BEFORE touching any code. Consult git history -
don't repeat discarded approaches.
3. MODIFY Make ONE atomic change to in-scope files only.
Small, focused, explainable.
4. COMMIT git add <specific files> (never git add -A)
git commit -m "experiment: <description>"
Commit BEFORE verification. Enables clean rollback.
5. VERIFY Run the verify command. Extract the metric.
If guard is set and metric improved, run guard too.
6. DECIDE
Improved + guard passes (or no guard) -> KEEP
Improved + guard fails -> REVERT (git revert HEAD --no-edit)
Same or worse -> REVERT
Crashed -> attempt fix (max 3 tries), else REVERT
7. LOG Append row to results.tsv
8. REPEAT Go to 1. Print a one-line status every 5 iterations.
NEVER ask "should I continue?" - just keep going.
If bounded and iteration N reached, print summary and stop.
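The DECIDE step reduces to a small pure function. A minimal sketch, assuming a crash is represented as a missing metric (`None`) and that `guard_passed` is `None` when no guard is configured; the function names are illustrative, and the retry-up-to-3-times handling for crashes is left to the caller.

```python
from typing import Optional


def improved(metric: float, best: float, direction: str) -> bool:
    """Strict improvement only: an equal metric does not count."""
    return metric > best if direction == "higher" else metric < best


def decide(
    metric: Optional[float],
    best: float,
    direction: str,
    guard_passed: Optional[bool] = None,
) -> str:
    """Map one iteration's outcome to a results.tsv status."""
    if metric is None:            # verify crashed or printed no number
        return "crash"
    if not improved(metric, best, direction):
        return "discard"          # same or worse -> revert
    if guard_passed is False:     # improved, but the guard failed -> revert
        return "discard"
    return "keep"                 # improved, guard passed or not configured
```

Keeping the decision free of side effects makes it easy to check against the table in step 6 before wiring it to git.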
Rollback
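Always use git revert HEAD --no-edit (preserves the experiment in history - the agent can learn from it). If the revert conflicts, abort it with git revert --abort, then fall back to git reset --hard HEAD~1. The reset removes the experiment commit from history, so the results.tsv row becomes the only record of that attempt.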
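A rollback helper might look like the following. It is a sketch under stated assumptions: the `rollback` and `run_git` names are hypothetical, and the git runner is passed in as a parameter so the control flow can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes git arguments and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run a git command in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def rollback(run: Runner = run_git) -> str:
    """Undo the last experiment commit; prefer revert, fall back to reset."""
    if run(["revert", "HEAD", "--no-edit"]) == 0:
        return "reverted"   # experiment stays visible in history
    # A conflicted revert leaves the repository mid-operation: abort it first.
    run(["revert", "--abort"])
    if run(["reset", "--hard", "HEAD~1"]) != 0:
        raise RuntimeError("rollback failed: revert and reset both errored")
    return "reset"          # experiment commit is gone from history
```

Returning which path was taken lets the caller note in the log description when an experiment no longer appears in git history.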
When Stuck (5+ consecutive discards)
- Re-read ALL in-scope files from scratch
- Re-read the original goal
- Review entire results.tsv for patterns
- Try combining two previously successful changes
- Try the opposite of what hasn't been working
- Try something radical - architectural changes, different algorithms
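Detecting the stuck condition only needs the status column of results.tsv. A minimal sketch, assuming that only `discard` rows extend the streak (whether a `crash` should also count is a choice the text leaves open, so it is a parameter here); the function names are illustrative.

```python
from typing import Iterable, Sequence


def consecutive_discards(
    statuses: Sequence[str],
    failing: Iterable[str] = ("discard",),
) -> int:
    """Count how many of the most recent iterations ended in a failing status."""
    failing_set = set(failing)
    count = 0
    for status in reversed(statuses):
        if status not in failing_set:
            break
        count += 1
    return count


def is_stuck(statuses: Sequence[str], threshold: int = 5) -> bool:
    """True once the trailing streak reaches the threshold (5 in the text)."""
    return consecutive_discards(statuses) >= threshold
```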
Rules
- One change per iteration. Atomic. If it breaks, you know exactly why.
- Mechanical verification only. No "looks good." The number decides.
- Git is memory. Commit before verify. Revert on failure. Read git log before ideating. Failed experiments stay visible in history via revert commits.
- Simpler wins. Equal metric + less code = keep. Tiny improvement + ugly complexity = discard. Removing code for equal results is a win.
- Never stop. Unbounded loops run until interrupted. Never ask permission to continue. The user may be asleep.
- Read before write. Understand full context before each modification.
- Scope is sacred. Only modify files matching the scope globs. Never touch verify/guard targets, test fixtures, or config outside scope.
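The scope rule can be enforced mechanically before each commit. The sketch below is an assumption about glob semantics, not the skill's own matcher: `**/` matches zero or more directories, `*` and `?` stay within one path segment, and both helper names are hypothetical.

```python
import re
from typing import Iterable


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a scope glob such as src/**/*.ts into an anchored regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")      # anything within one segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")       # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def in_scope(path: str, globs: Iterable[str]) -> bool:
    """True if the path matches at least one scope glob."""
    return any(glob_to_regex(g).match(path) for g in globs)
```

Running every modified path through `in_scope` before `git add` turns "scope is sacred" from a convention into a check that can fail the iteration.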
Results Log
Tab-separated file: results.tsv
iteration commit metric status description
0 a1b2c3d 72.3 baseline initial state
1 b2c3d4e 74.1 keep add edge case tests for auth module
2 - 73.8 discard refactor test helpers (broke coverage)
3 c3d4e5f 75.0 keep add missing null checks in user service
4 - 0.0 crash switched to vitest (import errors)
Status values: baseline, keep, discard, crash
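Writing and reading the log is straightforward with a tab-delimited CSV writer. A minimal sketch using file-like objects so it works on an open results.tsv or an in-memory buffer; the helper names are illustrative, and writing `-` for a missing commit follows the sample rows above.

```python
import csv
import io
from typing import Optional, TextIO

HEADER = ["iteration", "commit", "metric", "status", "description"]


def write_header(stream: TextIO) -> None:
    """Write the column header; call once when creating the log."""
    csv.writer(stream, delimiter="\t", lineterminator="\n").writerow(HEADER)


def append_row(
    stream: TextIO,
    iteration: int,
    commit: Optional[str],
    metric: float,
    status: str,
    description: str,
) -> None:
    """Append one iteration; a missing commit hash is logged as '-'."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow([iteration, commit or "-", metric, status, description])


def read_rows(stream: TextIO) -> list[dict[str, str]]:
    """Read the whole log back as dicts keyed by the header names."""
    return list(csv.DictReader(stream, delimiter="\t"))
```

For a real run, open the file with `open("results.tsv", "a", newline="")` when appending so the csv module controls line endings.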
Progress Output
Every 5 iterations, print a brief status:
Iteration 15: metric 81.2 (baseline 72.3, +8.9) | 6 keeps, 8 discards, 1 crash
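The status line above can be produced by a small formatter. This is a sketch with hypothetical function names; it assumes one decimal place for the metric, as in the sample.

```python
def _count(n: int, singular: str, plural: str) -> str:
    """Render '1 crash' / '2 crashes' style counts."""
    return f"{n} {singular if n == 1 else plural}"


def status_line(
    iteration: int,
    metric: float,
    baseline: float,
    keeps: int,
    discards: int,
    crashes: int,
) -> str:
    """Build the one-line progress report printed every 5 iterations."""
    delta = metric - baseline
    counts = ", ".join(
        [
            _count(keeps, "keep", "keeps"),
            _count(discards, "discard", "discards"),
            _count(crashes, "crash", "crashes"),
        ]
    )
    return (
        f"Iteration {iteration}: metric {metric:.1f} "
        f"(baseline {baseline:.1f}, {delta:+.1f}) | {counts}"
    )
```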
When a bounded loop completes:
=== Iterate Complete (25/25) ===
Baseline: 72.3 -> Final: 88.7 (+16.4)
Keeps: 12 | Discards: 11 | Crashes: 2
Best iteration: #18 - add integration tests for payment flow (+3.2)
Adapting to Any Domain
The pattern is universal. Change the five inputs, not the loop.
| Domain | Goal | Verify | Direction |
|---|---|---|---|
| Test coverage | Coverage to 90% | npm test -- --coverage | higher |
| Bundle size | Below 200KB | npm run build && stat -f%z dist/main.js | lower |
| Performance | Faster API response | npm run bench \| grep p95 | lower |
| ML training | Lower validation loss | uv run train.py && grep val_bpb run.log | lower |
| Lint errors | Zero warnings | npm run lint 2>&1 \| grep -c warning | lower |
| Lighthouse | Score above 95 | npx lighthouse --output=json \| jq .score | higher |
| Code quality | Reduce complexity | npx complexity-report \| grep average | lower |
Guard: Preventing Regressions
The guard is an optional safety net - a command that must always pass regardless of what the main metric does.
- Verify answers: "Did the metric improve?"
- Guard answers: "Did anything else break?"
If the metric improves but the guard fails, the change is reverted. The agent should note WHY the guard failed and adapt future attempts accordingly.
Common guards: npm test, tsc --noEmit, cargo check, pytest, go vet
Usage Examples
Inline config (all fields provided)
/iterate
Goal: Increase test coverage from 72% to 90%
Scope: src/**/*.ts, src/**/*.test.ts
Verify: npm test -- --coverage | grep "All files" | awk '{print $10}'
Direction: higher
Guard: tsc --noEmit
Iterations: 30
Minimal (triggers interactive setup)
/iterate
Goal: Make the API faster
Agent scans codebase for tooling, suggests scope/verify/direction, asks once, then goes.
Unbounded overnight run
/iterate
Goal: Reduce bundle size below 150KB
Scope: src/**/*.ts, webpack.config.js
Verify: npm run build 2>&1 | grep "main.js" | awk '{print $2}'
Direction: lower
Agent runs indefinitely. User interrupts in the morning. Results are in results.tsv and git history.