self-improve
Self-Improvement Orchestrator
Autonomous loop controller for evolutionary code improvement. Manages the full lifecycle: setup, research, planning, execution, tournament selection, history recording, and stop-condition evaluation.
When to Use
- You want to iteratively improve a codebase toward a measurable benchmark goal
- Optimization tasks: performance, bundle size, test coverage, accuracy
- Code quality improvement with measurable metrics
When NOT to Use
- No measurable benchmark available
- One-shot fix or feature request → use /omg-autopilot
- Manual, interactive coding → use /ralph
Autonomous Execution Policy
NEVER stop or pause to ask the user during the improvement loop. Once the gate check passes and the loop begins, run fully autonomously until a stop condition is met.
- Do not ask for confirmation between iterations
- On agent failure: retry once, then skip and continue
- On all plans rejected: log it, continue to next iteration
- The only things that stop the loop are the stop conditions in Step 11
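The failure rule above (retry once, then skip and continue) could be sketched roughly as follows. `run_agent_with_retry` and its `task` callable are hypothetical names used only for illustration; how the orchestrator actually invokes agents is not specified here.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_agent_with_retry(task: Callable[[], T], max_attempts: int = 2) -> Optional[T]:
    """Run a hypothetical agent task; retry once on failure, then give up.

    Returns the task result, or None to signal "skip this agent and continue".
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:  # any agent failure counts as a failed attempt
            print(f"agent attempt {attempt}/{max_attempts} failed: {exc}")
    # Both attempts failed: the caller skips this agent without asking the user.
    return None
```

Returning `None` instead of raising keeps the loop running, which matches the policy of never pausing for confirmation.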
State Tracking
All state lives under .omc/self-improve/:
.omc/self-improve/
├── config/
│ ├── settings.json # agents, benchmark, thresholds, sealed_files
│ ├── goal.md # Improvement objective + target metric
│ ├── harness.md # Guardrail rules (H001/H002/H003)
│ └── idea.md # User experiment ideas
├── state/
│ ├── agent-settings.json # iterations, best_score, status, counters
│ ├── iteration_state.json # Within-iteration progress (resumability)
│ ├── research_briefs/ # Research output per round
│ ├── iteration_history/ # Full history per round
│ ├── merge_reports/ # Tournament results
│ └── plan_archive/ # Archived plans (permanent)
├── plans/ # Active plans (current round)
└── tracking/
  ├── raw_data.json # All candidate scores
  ├── baseline.json # Initial benchmark score
  └── events.json # Config changes
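A minimal sketch of creating this layout, assuming the directories are simply made on demand. `init_state_tree` is a hypothetical helper name; only the directory names come from the tree above, and the JSON and Markdown files are left for later steps to write.

```python
from pathlib import Path

# Directories from the layout above, relative to .omc/self-improve/.
STATE_DIRS = [
    "config",
    "state/research_briefs",
    "state/iteration_history",
    "state/merge_reports",
    "state/plan_archive",
    "plans",
    "tracking",
]


def init_state_tree(repo_root: Path) -> Path:
    """Create the state directories under repo_root; safe to call repeatedly."""
    base = repo_root / ".omc" / "self-improve"
    for rel in STATE_DIRS:
        (base / rel).mkdir(parents=True, exist_ok=True)
    return base
```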
Agent Mapping
| Step | Role | Agent | Purpose |
|---|---|---|---|
| Research | Codebase analysis | @explore + @architect | Hypothesis generation |
| Planning | Hypothesis → plan | @planner | Structured plan per agent |
| Architecture Review | 6-point review | @architect | Advisory review |
| Critic Review | Harness enforcement | @critic | Approve/reject plans |
| Execution | Implement + benchmark | @executor | Implement plan faithfully |
| Git Operations | Merge/tag/PR | @git-master | Atomic merge operations |
Setup Phase
- Check if target repo path exists. If not configured, ask user.
- Create .omc/self-improve/ directory structure.
- Read agent-settings.json. Check setup flags.
- Trust confirmation (mandatory):
  - Display target repo path, ask user to confirm benchmark execution.
  - If declined: abort.
  - Record consent: trust_confirmed: true
- If goal not set → Run Socratic interview (Objective, Metric, Target, Scope). Write to goal.md.
- If benchmark not set → Survey repo, create/wrap benchmark, validate 3x, record baseline.
- If harness not set → Confirm default harness rules (H001/H002/H003).
- Gate: All settings + trust must be true.
- Create improvement branch: improve/{goal_slug} from target branch.
- Write initial state via omg_write_state.
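The gate and branch naming above might look roughly like this. Only `trust_confirmed` is named in this document; the flag names `goal_set`, `benchmark_set`, and `harness_set`, and the slug rule, are assumptions made for the sketch.

```python
import re

# trust_confirmed comes from the setup steps; the other flag names are hypothetical.
REQUIRED_FLAGS = ("goal_set", "benchmark_set", "harness_set", "trust_confirmed")


def missing_setup_flags(settings: dict) -> list[str]:
    """Return the flags that are not exactly True; an empty list means the gate passes."""
    return [name for name in REQUIRED_FLAGS if settings.get(name) is not True]


def improvement_branch(goal: str) -> str:
    """Build improve/{goal_slug} using an assumed lowercase, hyphenated slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", goal.lower()).strip("-")
    return f"improve/{slug}"
```

Comparing against `True` with `is not True` means a missing or truthy-but-not-boolean flag still blocks the loop, which is the conservative reading of "must be true".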
Improvement Loop
Gate: All settings must be true. Execute continuously without stopping.
Step 0 — Stale Worktree Cleanup
Remove orphaned worktrees from prior iterations.
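One way to find orphaned worktrees is to parse `git worktree list --porcelain` and build removal commands for entries under a known directory. This sketch assumes executor worktrees live under a single root path (`worktree_root` is hypothetical), and it only builds the commands rather than running them.

```python
def parse_worktree_list(porcelain: str) -> list[dict]:
    """Parse `git worktree list --porcelain` output into one dict per worktree."""
    records, current = [], {}
    for line in porcelain.splitlines():
        if not line.strip():  # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value  # attributes without a value get ""
    if current:
        records.append(current)
    return records


def cleanup_commands(porcelain: str, worktree_root: str) -> list[list[str]]:
    """Build git commands that remove worktrees under worktree_root, then prune."""
    cmds = [
        ["git", "worktree", "remove", "--force", rec["worktree"]]
        for rec in parse_worktree_list(porcelain)
        if rec.get("worktree", "").startswith(worktree_root)
    ]
    cmds.append(["git", "worktree", "prune"])  # drop stale administrative entries
    return cmds
```

Each command list can be passed to `subprocess.run(cmd, check=True)` from inside the target repo.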
Step 1 — Refresh State
Update state to reset TTL.
Step 2 — Check Stop Request
If state is cleared or status is user_stopped: exit gracefully.
Step 3 — Check User Ideas
Read idea.md. If non-empty, pass to planners.
Step 4 — Research
Spawn @explore + @architect to analyze codebase and generate hypotheses based on goal, history, and prior briefs.
Step 5 — Plan
Spawn N @planner agents in parallel (N = number_of_agents). Each produces a plan with one testable hypothesis, approach_family tag, and history_reference.
Step 6 — Review
For each plan:
- 6a. Architecture Review: @architect with 6-point checklist (testability, novelty, scope, target files, implementation clarity, expected outcome). Advisory only.
- 6b. Critic Review: @critic with harness rules (H001: one hypothesis, H002: no approach_family streak ≥3, H003: intra-round diversity). Sets critic_approved: true/false.
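The H002 and H003 checks could be approximated as below. The exact semantics live in harness.md and are not reproduced here, so this sketch assumes that H002 rejects a plan whose approach_family would extend a run to three or more consecutive rounds, and that H003 rejects a round in which two or more plans all share one family. Both function names are hypothetical.

```python
def violates_h002(history_families: list[str], candidate_family: str, max_streak: int = 3) -> bool:
    """True if the candidate would create a same-family streak of max_streak or more.

    history_families is assumed to hold one family tag per past round, oldest first.
    """
    streak = 1  # the candidate itself
    for family in reversed(history_families):
        if family != candidate_family:
            break
        streak += 1
    return streak >= max_streak


def violates_h003(round_families: list[str]) -> bool:
    """True if a round with two or more plans uses only one approach family."""
    return len(round_families) > 1 and len(set(round_families)) == 1
```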
Step 7 — Execute
For each approved plan, spawn @executor in parallel. Each executor works in a git worktree, implements the plan, runs validation, and benchmarks.
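A rough sketch of the fan-out, assuming each plan is a dict carrying `critic_approved` and that `run_one` is a hypothetical callable standing in for a full @executor run (create worktree, implement, validate, benchmark). `git worktree add -b` is the standard git command for creating a worktree on a new branch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def worktree_add_command(path: str, branch: str, base: str) -> list[str]:
    """Build the git command that creates an isolated worktree on a new branch."""
    return ["git", "worktree", "add", "-b", branch, path, base]


def run_executors(plans: list[dict], run_one: Callable[[dict], dict]) -> list[dict]:
    """Run run_one for every critic-approved plan in parallel; results keep plan order."""
    approved = [plan for plan in plans if plan.get("critic_approved") is True]
    if not approved:
        return []  # all plans rejected: the caller logs it and moves on
    with ThreadPoolExecutor(max_workers=len(approved)) as pool:
        return list(pool.map(run_one, approved))
```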
Step 8 — Tournament Selection
- Collect results, filter to status: "success"
- Rank by benchmark_score (respecting direction)
- For each candidate (best first):
  - No-regression check vs best_score
  - Merge via @git-master with --no-ff
  - Re-benchmark on merged state
  - If confirmed: accept winner, break
  - If regression: revert merge, try next
  - If conflict: abort merge, try next
- Archive non-winner branches
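The selection steps above can be sketched as a single loop. The git and benchmark operations are injected as hypothetical callables (`try_merge`, `rebenchmark`, `revert`) so the control flow stands alone. This sketch also assumes a score equal to best_score counts as "no regression".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    branch: str
    status: str
    benchmark_score: float


def is_regression(score: float, best: float, higher_is_better: bool) -> bool:
    """A score is a regression only if it is strictly worse than the best so far."""
    return score < best if higher_is_better else score > best


def select_winner(
    candidates: list[Candidate],
    best_score: float,
    higher_is_better: bool,
    try_merge: Callable[[Candidate], bool],  # False means conflict (merge already aborted)
    rebenchmark: Callable[[], float],        # benchmark of the merged state
    revert: Callable[[Candidate], None],     # undo the merge commit
) -> Optional[Candidate]:
    successful = [c for c in candidates if c.status == "success"]
    ranked = sorted(successful, key=lambda c: c.benchmark_score, reverse=higher_is_better)
    for cand in ranked:
        if is_regression(cand.benchmark_score, best_score, higher_is_better):
            continue  # fails the pre-merge no-regression check
        if not try_merge(cand):
            continue  # conflict: try the next candidate
        if is_regression(rebenchmark(), best_score, higher_is_better):
            revert(cand)  # merged state got worse: roll back and try next
            continue
        return cand  # confirmed winner
    return None  # no candidate accepted this round
```

Re-benchmarking after the merge matters because a candidate measured in its own worktree can behave differently once combined with the current improvement branch.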
Step 9 — Record & Visualize
Write iteration history, update agent-settings (scores, plateau count, circuit breaker), append tracking data.
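The counter update might look like the sketch below. How the plateau and circuit-breaker counters are defined is not spelled out in this document, so the sketch assumes that the plateau count increments on every round that does not strictly improve best_score, and that the circuit breaker counts consecutive rounds with no accepted candidate. `update_counters` is a hypothetical name; the field names come from agent-settings.json as referenced in Step 11.

```python
from typing import Optional


def update_counters(state: dict, round_score: Optional[float], higher_is_better: bool) -> dict:
    """Return a new state dict after one round.

    round_score is the accepted winner's merged score, or None if nothing was accepted.
    """
    new = dict(state)
    new["iterations"] = state.get("iterations", 0) + 1
    best = state.get("best_score")
    improved = round_score is not None and (
        best is None
        or (round_score > best if higher_is_better else round_score < best)
    )
    if improved:
        new["best_score"] = round_score
        new["plateau_consecutive_count"] = 0
    else:
        new["plateau_consecutive_count"] = state.get("plateau_consecutive_count", 0) + 1
    # Assumed rule: the breaker resets whenever any candidate is accepted.
    if round_score is None:
        new["circuit_breaker_count"] = state.get("circuit_breaker_count", 0) + 1
    else:
        new["circuit_breaker_count"] = 0
    return new
```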
Step 10 — Cleanup
Remove worktrees, update iteration state to completed.
Step 11 — Stop Condition Check
| Condition | Check |
|---|---|
| User stop | status == "user_stopped" |
| Target reached | best_score meets/exceeds target_value |
| Plateau | plateau_consecutive_count >= plateau_window |
| Max iterations | iterations >= max_iterations |
| Circuit breaker | circuit_breaker_count >= circuit_breaker_threshold |
If NO stop condition: immediately go back to Step 1.
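The table above maps directly onto a small check function. This is a sketch, assuming state and thresholds are plain dicts; the `higher_is_better` key and the returned reason strings are hypothetical, while the other field names are taken from the table.

```python
from typing import Optional


def stop_reason(state: dict, config: dict) -> Optional[str]:
    """Return the first stop condition that holds, or None to keep looping."""
    if state.get("status") == "user_stopped":
        return "user_stop"
    best, target = state.get("best_score"), config.get("target_value")
    if best is not None and target is not None:
        higher = config.get("higher_is_better", True)  # assumed direction key
        if (best >= target) if higher else (best <= target):
            return "target_reached"
    if state.get("plateau_consecutive_count", 0) >= config["plateau_window"]:
        return "plateau"
    if state.get("iterations", 0) >= config["max_iterations"]:
        return "max_iterations"
    if state.get("circuit_breaker_count", 0) >= config["circuit_breaker_threshold"]:
        return "circuit_breaker"
    return None
```

A `None` result corresponds to going straight back to Step 1.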
Resumability
On invocation:
- Always run Step 0 (stale worktree cleanup)
- Check agent-settings.json:
  - user_stopped: ask to resume
  - running: crashed, so resume automatically
  - idle: fresh start
- Check iteration_state.json: resume from last step if in-progress
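The resume decision reduces to a small lookup on the saved status. In this sketch the returned action strings are hypothetical labels, and an unknown status raises rather than guessing.

```python
def resume_action(status: str, iteration_in_progress: bool) -> str:
    """Decide what to do on invocation, after stale worktree cleanup has run."""
    if status == "user_stopped":
        return "ask_user"  # the only case that prompts the user
    if status == "running":
        # A leftover "running" status means the previous run crashed.
        return "resume_from_last_step" if iteration_in_progress else "resume_next_iteration"
    if status == "idle":
        return "fresh_start"
    raise ValueError(f"unknown status: {status!r}")
```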
Completion
- Update final status
- Print summary (status, iterations, best score, baseline, improvement %)
- Run /cancel for clean state cleanup
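The improvement % in the summary is not defined in this document. One plausible formula, offered as an assumption, is the change relative to the baseline, with the sign flipped for lower-is-better metrics so that a positive number always means "better".

```python
from typing import Optional


def improvement_percent(baseline: float, best: float, higher_is_better: bool = True) -> Optional[float]:
    """Relative improvement over the baseline in percent; None if the baseline is 0."""
    if baseline == 0:
        return None  # relative change is undefined for a zero baseline
    delta = best - baseline if higher_is_better else baseline - best
    return 100.0 * delta / abs(baseline)
```

For example, a bundle size going from 200 to 150 with a lower-is-better metric gives a 25% improvement.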
Approach Family Taxonomy
Every plan must be tagged with exactly one:
| Tag | Description |
|---|---|
| architecture | Model/component structure changes |
| training_config | Optimizer, LR, scheduler, batch size |
| data | Data loading, augmentation, preprocessing |
| infrastructure | Mixed precision, distributed training |
| optimization | Algorithmic/numerical optimizations |
| testing | Evaluation methodology changes |
| documentation | Documentation-only changes |
| other | Does not fit above |