Autoresearch

Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.

Tools

  • init_experiment — configure session (name, metric, unit, direction). Call again to re-initialize with a new baseline when the optimization target changes.
  • run_experiment — runs the command, times it, and captures output.
  • log_experiment — records the result. A keep auto-commits; discard/crash/checks_failed reverts with `git checkout -- .`. Always include a secondary metrics dict. Dashboard: ctrl+x.

Setup

  1. Ask (or infer): Goal, Command, Metric (+ direction), Files in scope, Constraints.
  2. git checkout -b autoresearch/<goal>-<date>
  3. Read the source files. Understand the workload deeply before writing anything.
  4. Write autoresearch.md and autoresearch.sh (see below). Commit both.
  5. init_experiment → run baseline → log_experiment → start looping immediately.
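The branch and commit steps above can be sketched as a shell session. This runs in a throwaway repo so it is self-contained; the goal slug, identity, and file contents are hypothetical placeholders:

```shell
#!/bin/bash
# Sketch of setup steps 2 and 4, in a throwaway repo so it runs anywhere.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "agent@example.com"   # throwaway identity for the sketch
git config user.name "autoresearch"
goal="speed-up-parser"                      # hypothetical goal slug
git checkout -q -b "autoresearch/${goal}-$(date +%Y-%m-%d)"
printf '# Autoresearch: %s\n' "$goal" > autoresearch.md
printf '#!/bin/bash\nset -euo pipefail\n' > autoresearch.sh
chmod +x autoresearch.sh
git add autoresearch.md autoresearch.sh
git commit -q -m "autoresearch: initial setup for $goal"
git branch --show-current                   # prints the experiment branch name
```

The date suffix keeps branch names unique if the same goal is attempted on different days.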

autoresearch.md

This is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively. Invest time making it excellent.

```markdown
# Autoresearch: <goal>

## Objective
<Specific description of what we're optimizing and the workload.>

## Metrics
- **Primary**: <name> (<unit>, lower/higher is better)
- **Secondary**: <name>, <name>, ...

## How to Run
`./autoresearch.sh` — outputs `METRIC name=number` lines.

## Files in Scope
<Every file the agent may modify, with a brief note on what it does.>

## Off Limits
<What must NOT be touched.>

## Constraints
<Hard rules: tests must pass, no new deps, etc.>

## What's Been Tried
<Update this section as experiments accumulate. Note key wins, dead ends,
and architectural insights so the agent doesn't repeat failed approaches.>
```

Update autoresearch.md periodically — especially the "What's Been Tried" section — so resuming agents have full context.

autoresearch.sh

Bash script (set -euo pipefail) that: pre-checks fast (syntax errors in <1s), runs the benchmark, outputs METRIC name=number lines. Keep it fast — every second is multiplied by hundreds of runs. Update it during the loop as needed.
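A minimal sketch of such a script, assuming GNU `date` (for `%N` nanoseconds) and using `sleep` as a stand-in for the real benchmark command:

```shell
#!/bin/bash
set -euo pipefail

# Fast pre-check: fail in well under a second on syntax errors before
# paying for the full benchmark. Swap in py_compile, tsc --noEmit, etc.
# for the files actually in scope. (src/main.sh is hypothetical.)
[ -f src/main.sh ] && bash -n src/main.sh

# Time the workload. %N needs GNU date; `sleep 0.2` stands in for the
# real benchmark command.
start=$(date +%s%N)
sleep 0.2
end=$(date +%s%N)

# Emit machine-readable metric lines for run_experiment to parse.
echo "METRIC wall_ms=$(( (end - start) / 1000000 ))"
```

The `METRIC name=number` lines are the only contract with the harness; everything else the script prints is free-form.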

autoresearch.checks.sh (optional)

Bash script (set -euo pipefail) for backpressure/correctness checks: tests, types, lint, etc. Only create this file when the user's constraints require correctness validation (e.g., "tests must pass", "types must check").

When this file exists:

  • Runs automatically after every passing benchmark in run_experiment.
  • If checks fail, run_experiment reports it clearly — log as checks_failed.
  • Its execution time does NOT affect the primary metric.
  • You cannot keep a result when checks have failed.
  • Has a separate timeout (default 300s, configurable via checks_timeout_seconds).

When this file does not exist, everything behaves exactly as before — no changes to the loop.

Keep output minimal. Only the last 80 lines of checks output are fed back to the agent on failure. Suppress verbose progress/success output and let only errors through. This keeps context lean and helps the agent pinpoint what broke.

```shell
#!/bin/bash
set -euo pipefail
# Example: run tests and typecheck — suppress success output, only show errors
pnpm test --run --reporter=dot 2>&1 | tail -50
# Capture typecheck output; on failure, print only the error lines and fail
# the check. (A plain `... | grep -i error || true` would mask the failure.)
out=$(pnpm typecheck 2>&1) || { printf '%s\n' "$out" | grep -i error; exit 1; }
```

Loop Rules

LOOP FOREVER. Never ask "should I continue?" — the user expects autonomous work.

  • Primary metric is king. Improved → keep. Worse/equal → discard. Secondary metrics rarely affect this.
  • Simpler is better. Removing code for equal perf = keep. Ugly complexity for tiny gain = probably discard.
  • Don't thrash. Repeatedly reverting the same idea? Try something structurally different.
  • Crashes: fix if trivial, otherwise log and move on. Don't over-invest.
  • Think longer when stuck. Re-read source files, study the profiling data, reason about what the CPU is actually doing. The best ideas come from deep understanding, not from trying random variations.
  • Resuming: if autoresearch.md exists, read it + git log, continue looping.

NEVER STOP. The user may be away for hours. Keep going until interrupted.

Ideas Backlog

When you discover complex but promising optimizations that you won't pursue right now, append them as bullets to autoresearch.ideas.md. Don't let good ideas get lost.

On resume (context limit, crash), check autoresearch.ideas.md — prune stale/tried entries, experiment with the rest. When all paths are exhausted, delete the file and write a final summary.

User Messages During Experiments

If the user sends a message while an experiment is running, finish the current run_experiment + log_experiment cycle first, then incorporate their feedback in the next iteration. Don't abandon a running experiment.
