# Exploration Optimizer
## Discovery Phase
Ask for:
- The target exploration skill or agent to optimize.
- The eval set to use, or whether to generate one from the current architecture.
- The iteration budget.
- Whether auto-apply of winning variants is allowed.
- Which metrics matter most for this loop: routing quality, artifact usefulness, handoff stability, re-entry quality, or human intervention burden.
- Whether post-run survey data exists and should be included in the decision.
## Recap
Confirm:
- target component
- eval source
- loop budget
- chosen scoring dimensions
- whether survey data is available
- whether auto-apply is enabled
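These confirmed settings map naturally onto a small configuration record. A minimal sketch in Python; the `LoopConfig` name and its fields are illustrative assumptions, not the actual interface of the shipped scripts:

```python
from dataclasses import dataclass, field

# Hypothetical container for the recap answers; field names are
# assumptions for illustration, not the real execute.py interface.
@dataclass
class LoopConfig:
    target: str                    # path to the skill/agent under optimization
    eval_source: str               # existing eval set, or "generate" to build one
    iterations: int = 3            # loop budget
    metrics: list[str] = field(default_factory=lambda: ["routing quality"])
    use_survey_data: bool = False  # include post-run survey signal?
    auto_apply: bool = False       # apply winning variants without confirmation?
```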
## Execution
This skill implements autoresearch-style optimization for the exploration-cycle system. It uses a baseline-first iteration loop to improve skill prompts and logic.
Usage:
```bash
python3 .agents/skills/exploration-optimizer/scripts/execute.py \
  --target ${plugins}/skills/user-story-capture/SKILL.md \
  --eval-script .agents/skills/skill-improvement-eval/scripts/eval_runner.py \
  --goal "Improve Gherkin block accuracy" \
  --iterations 3
```
For a concrete, target-specific playbook, use `references/spec-kitty-skill-optimizer-program.md` when optimizing the Spec-Kitty agent/workflow files themselves.
## Iteration Loop
The `execute.py` script follows a disciplined loop (sketched in code after this list):
- Change one dominant variable per iteration.
- Re-run evaluations.
- Mark the attempt as `keep` or `discard`.
- If the run crashes or times out, log the failure and continue from the last known good state.
- Never let a subjective preference override a clear regression in the tracked metrics.
- Use survey feedback as a quality signal, not an excuse to ignore the baseline-first method.
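A minimal sketch of that baseline-first loop; `run_evals`, `apply_variant`, and `revert` are hypothetical helpers standing in for whatever `execute.py` actually calls, and scoring is simplified to a single number:

```python
import logging

def optimization_loop(variants, run_evals, apply_variant, revert, iterations=3):
    """Baseline-first loop: score the unmodified target first, then try one
    dominant variable change per iteration and keep only clear wins.
    All helper callables here are illustrative assumptions."""
    baseline = run_evals()             # score the current state before any change
    best = baseline
    for variant in variants[:iterations]:
        apply_variant(variant)         # change one dominant variable
        try:
            score = run_evals()        # re-run the evaluations
        except Exception as exc:       # crash or timeout: log, recover, continue
            logging.warning("run failed (%s); reverting to last good state", exc)
            revert()
            continue
        if score > best:               # keep: clear improvement over baseline
            best = score
        else:                          # discard: regression or no gain
            revert()
    return best
```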
## Suggested Metrics
- routing quality
- artifact usefulness
- handoff stability
- re-entry usefulness
- human intervention burden
- unnecessary agent invocation rate
- post-run survey composite score
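When several of these dimensions are tracked at once, they can be reduced to a single comparison value with a weighted composite. A minimal sketch; the weights and the 0-1 normalization are assumptions, not values the skill prescribes:

```python
# Hypothetical weights; adjust to the dimensions chosen during Discovery.
WEIGHTS = {
    "routing_quality": 0.3,
    "artifact_usefulness": 0.25,
    "handoff_stability": 0.2,
    "reentry_usefulness": 0.15,
    "survey_composite": 0.1,
}

def composite_score(metrics: dict[str, float]) -> float:
    """Weighted average over whichever tracked metrics are present.
    Assumes each metric is already normalized to the 0-1 range."""
    used = {k: w for k, w in WEIGHTS.items() if k in metrics}
    total = sum(used.values())
    return sum(metrics[k] * w for k, w in used.items()) / total if total else 0.0
```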
## Output
Always conclude execution with a Source Transparency Declaration that explicitly lists what was queried, to preserve user trust:

Sources Checked: [list]
Sources Unavailable: [list]
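A minimal helper that renders the declaration in that shape; the function name and signature are illustrative assumptions, not part of the shipped scripts:

```python
def source_transparency_declaration(checked: list[str], unavailable: list[str]) -> str:
    """Render the closing declaration; everything beyond the two
    labeled lists the skill requires is a formatting assumption."""
    return (
        "Sources Checked: [" + ", ".join(checked) + "]\n"
        "Sources Unavailable: [" + ", ".join(unavailable) + "]"
    )

# Example with hypothetical sources:
print(source_transparency_declaration(
    ["evals/experiments", "post-run survey data"],
    ["live routing logs"],
))
```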
## Next Actions
- Use `./scripts/benchmarking/run_loop.py --results-dir evals/experiments` for repeatable improvement loops.
- Suggest the user run `audit-plugin` to verify the generated artifacts.