Evolutionary Archive Management
The evolutionary archive is the persistent memory of the HyperAgents improvement process. It stores every generation's code changes, fitness scores, and lineage relationships.
Archive Structure
```
.hyperagents/
├── archive.jsonl        # Append-only generation log
├── config.json          # Evolution configuration
├── next_parent.json     # Pre-computed next parent selection
├── gen_initial/         # Baseline generation
│   ├── metadata.json    # Generation metadata
│   └── <domain>_eval/   # Evaluation results
│       ├── report.json      # Aggregate scores
│       └── predictions.csv  # Per-item predictions
├── gen_0/
│   ├── metadata.json
│   ├── agent_output/
│   │   ├── model_patch.diff  # The code diff
│   │   └── meta_agent_chat_history.md
│   └── <domain>_eval/
│       ├── report.json
│       └── predictions.csv
├── gen_1/
│   └── ...
└── gen_N/
    └── ...
```
archive.jsonl Format
Each line is a self-contained JSON snapshot of the archive state after adding a new generation:
{"current_genid": 3, "archive": ["initial", 0, 1, 2, 3]}
This append-only format enables:
- Recovery from crashes (last complete line is the truth)
- Historical analysis (see how the archive grew)
- Atomic updates (each line is a complete snapshot)
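A crash-safe reader for this log can be sketched in a few lines (illustrative Python; `load_archive_state` is a hypothetical helper, not HyperAgents code):

```python
import json

def load_archive_state(path):
    """Return the last complete snapshot in an append-only archive log.

    A truncated final line (e.g. from a crash mid-write) is skipped,
    so the last parseable line is treated as the source of truth.
    """
    state = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                state = json.loads(line)
            except json.JSONDecodeError:
                break  # truncated tail: keep the previous snapshot
    return state
```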
metadata.json Format
Each generation's metadata records its full context:
```
{
  "gen_output_dir": ".hyperagents/gen_3",
  "current_genid": 3,
  "parent_genid": 1,
  "prev_patch_files": [".hyperagents/gen_1/agent_output/model_patch.diff"],
  "curr_patch_files": [".hyperagents/gen_3/agent_output/model_patch.diff"],
  "parent_agent_success": true,
  "optimize_option": "only_agent",
  "can_select_next_parent": true,
  "run_eval": true,
  "run_full_eval": true,
  "valid_parent": true
}
```
Key Operations
Adding a Generation
1. Create the `gen_<id>/` directory
2. Run the meta-agent; save the diff and chat history to `agent_output/`
3. Run evaluation; save results to `<domain>_eval/`
4. Write `metadata.json` with all context
5. Append to `archive.jsonl` with the updated archive list
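Steps 4 and 5 can be sketched together (a minimal illustration; `append_generation` is a hypothetical helper that skips the meta-agent and evaluation steps and writes only a subset of the metadata fields):

```python
import json
import os

def append_generation(root, genid, parent_genid, archive):
    """Record a new generation: write its metadata.json, then append
    a complete archive snapshot as one line of archive.jsonl.
    Hypothetical helper; real metadata carries more fields."""
    gen_dir = os.path.join(root, f"gen_{genid}")
    os.makedirs(os.path.join(gen_dir, "agent_output"), exist_ok=True)
    metadata = {
        "gen_output_dir": gen_dir,
        "current_genid": genid,
        "parent_genid": parent_genid,
    }
    with open(os.path.join(gen_dir, "metadata.json"), "w") as f:
        json.dump(metadata, f, indent=2)
    # Append-only: each line is a self-contained snapshot.
    snapshot = {"current_genid": genid, "archive": archive + [genid]}
    with open(os.path.join(root, "archive.jsonl"), "a") as f:
        f.write(json.dumps(snapshot) + "\n")
    return snapshot
```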
Reconstructing a Generation
To get the full codebase state at any generation:
1. Start from the root commit
2. Apply all diffs in the lineage chain: initial -> parent -> ... -> target
3. The lineage is stored in `metadata.json` as `prev_patch_files` + `curr_patch_files`
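Collecting the ordered patch list for a reconstruction might look like this (a sketch; `lineage_patches` is a hypothetical helper, and it assumes the baseline generation's id is the string `"initial"`, as in the archive.jsonl example above):

```python
import json
import os

def lineage_patches(root, genid):
    """Collect the patch files needed to rebuild generation `genid`,
    ordered from oldest ancestor to target. Walks parent_genid links
    back to the "initial" baseline, then concatenates each ancestor's
    curr_patch_files."""
    chain = []
    current = genid
    while current != "initial":
        with open(os.path.join(root, f"gen_{current}", "metadata.json")) as f:
            meta = json.load(f)
        chain.append(meta)
        current = meta["parent_genid"]
    patches = []
    for meta in reversed(chain):  # apply oldest ancestor first
        patches.extend(meta["curr_patch_files"])
    return patches
```

The returned list can then be applied in order (for example with `git apply`) to rebuild the working tree at that generation.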
Querying the Archive
- Best generation: Sort by fitness, take max
- Lineage of N: Follow `parent_genid` links from N to initial
- Valid parents: Filter where `valid_parent: true`
- Improvement rate: Compare each generation to its parent
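Three of these queries reduce to simple operations over in-memory maps (illustrative helpers, assuming scores and parent links have already been read from each generation's report.json and metadata.json):

```python
def best_generation(scores):
    """Genid with the highest fitness. `scores` maps genid -> fitness."""
    return max(scores, key=scores.get)

def lineage(parents, genid):
    """Chain of ancestors from `genid` back to the baseline.
    `parents` maps genid -> parent_genid."""
    chain = [genid]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain

def improvement_rate(scores, parents):
    """Fraction of generations that scored above their parent."""
    wins = [scores[g] > scores[p] for g, p in parents.items()]
    return sum(wins) / len(wins)
```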
Fitness Score Retrieval
Scores live in `gen_<id>/<domain>_eval/report.json`. The score key varies by domain:
- `overall_accuracy` — classification domains
- `average_progress` — game environments
- `average_fitness` — control tasks
- `accuracy_score` — code editing
- `points_percentage` — proof grading
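Retrieval can be sketched as a lookup table from domain to score key (the domain names used as dictionary keys here are illustrative placeholders; only the key names come from the list above):

```python
import json
import os

# Score key per domain; domain names are placeholders for this sketch.
SCORE_KEYS = {
    "classification": "overall_accuracy",
    "game": "average_progress",
    "control": "average_fitness",
    "code_editing": "accuracy_score",
    "proof_grading": "points_percentage",
}

def fitness_score(root, genid, domain):
    """Read a generation's fitness from gen_<id>/<domain>_eval/report.json,
    selecting the domain-appropriate score key."""
    path = os.path.join(root, f"gen_{genid}", f"{domain}_eval", "report.json")
    with open(path) as f:
        return json.load(f)[SCORE_KEYS[domain]]
```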
Ensemble Scoring
When ensemble optimization is enabled, additional scores are stored at:
gen_<id>/report_ensemble_<domain>_<split>.json
The max score type returns the better of agent and ensemble scores.
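The rule can be stated in a few lines (a sketch; `max_score` is a hypothetical helper, and representing a missing ensemble run as `None` is an assumption):

```python
def max_score(agent_score, ensemble_score):
    """'max' score type: the better of the agent and ensemble scores.
    A missing ensemble report (None) falls back to the agent score."""
    if ensemble_score is None:
        return agent_score
    return max(agent_score, ensemble_score)
```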
Examples
These scenarios illustrate when this skill activates and what it does.
Scenario 1: Inspecting the lineage of the best generation
Trigger: User asks "Show me how generation 7 evolved -- what was its lineage?"
Action: The skill reads gen_7/metadata.json to find parent_genid, then follows the chain of parent_genid links back to gen_initial. For each ancestor, it reads the report.json to retrieve fitness scores and the model_patch.diff to summarize what changed. It presents the full lineage as a chain (e.g., initial -> gen_1 -> gen_4 -> gen_7) with fitness scores at each step.
Scenario 2: Recovering from a corrupted archive
Trigger: User reports "The evolve command crashed mid-run and now /hyperagents:status shows an error."
Action: The skill reads archive.jsonl and identifies the last complete JSON line as the reliable state. If the final line is truncated, it trims the file to the last valid line. It then cross-checks that every genid referenced in the archive has a corresponding gen_<id>/metadata.json directory, flagging any orphaned or missing entries.
Scenario 3: Querying the archive for improvement patterns
Trigger: User asks "Which types of changes led to the biggest fitness improvements?"
Action: The skill iterates through all generations in the archive, comparing each generation's fitness to its parent's fitness. For each positive delta, it reads the corresponding model_patch.diff and categorizes the change (prompt edit, logic refactor, tool change, etc.). It summarizes the results as a ranked list of change types by average fitness improvement.
Safety Rules
- Never modify archive.jsonl in place — only append
- Never delete gen_initial/ — it's the baseline
- Always write metadata.json atomically — write to temp file, then rename
- Back up before pruning — copy archive.jsonl before removing generations
- Validate lineage integrity — every genid's parent must exist in the archive
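The atomic-write rule above can be implemented with the standard temp-file-then-rename pattern (a sketch using `os.replace`, which renames atomically on the same filesystem):

```python
import json
import os
import tempfile

def write_metadata_atomic(gen_dir, metadata):
    """Write metadata.json atomically: dump to a temp file in the same
    directory, fsync it, then rename over the target so readers never
    observe a half-written file."""
    fd, tmp_path = tempfile.mkstemp(dir=gen_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(metadata, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, os.path.join(gen_dir, "metadata.json"))
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up the temp file on failure
        raise
```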