perf-theory-tester
Test performance hypotheses using controlled, single-variable experiments.
Follow docs/perf-requirements.md as the canonical contract.
Required Steps
- Confirm baseline is clean.
- Apply a single change tied to the hypothesis.
- Run 2+ validation passes.
- Revert to baseline before the next experiment.
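The required steps above can be sketched as a small driver loop. This is a minimal sketch, not the skill's actual implementation: the four hooks (`baseline_clean`, `apply_change`, `run_benchmark`, `revert`) are hypothetical placeholders you would wire to your own git and benchmark tooling.

```python
import statistics

def run_experiment(baseline_clean, apply_change, run_benchmark, revert, passes=2):
    """One controlled experiment: clean baseline, a single change,
    2+ sequential validation passes, then revert to baseline."""
    if not baseline_clean():
        raise RuntimeError("baseline is not clean; aborting")
    apply_change()
    try:
        # Sequential passes only -- never run benchmarks in parallel.
        results = [run_benchmark() for _ in range(max(passes, 2))]
    finally:
        revert()  # always restore the baseline before the next experiment
    return statistics.mean(results), results

# Toy usage with stub hooks standing in for real git/benchmark commands.
mean, runs = run_experiment(
    baseline_clean=lambda: True,
    apply_change=lambda: None,
    run_benchmark=lambda: 100.0,
    revert=lambda: None,
)
print(mean)        # 100.0
print(len(runs))   # 2
```

The `finally` block is the important design choice: the revert runs even when a benchmark pass fails, so a crashed experiment cannot contaminate the next one's baseline.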
Output Format
hypothesis: <id>
change: <summary>
delta: <metrics>
verdict: accept|reject|inconclusive
evidence:
- command: <benchmark command>
- files: <changed files>
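A filled-in record might look like the following. Every value here is invented for illustration (the hypothesis id, metrics, command, and file are not from any real run):

```yaml
hypothesis: H42
change: replace linear scan with hash lookup in parse_headers
delta: p50 latency 12.4ms -> 9.1ms (-26%), p99 unchanged
verdict: accept
evidence:
  - command: ./bench.sh --duration 60 parse_headers
  - files: src/parser.c
```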
Constraints
- One change per experiment.
- No parallel benchmarks.
- Record evidence for each run.
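The no-parallel-benchmarks constraint can be enforced mechanically with an exclusive file lock taken before any run starts. A minimal sketch, assuming a POSIX system; the lock path is a hypothetical convention, not part of the contract:

```python
import fcntl
import sys

LOCK_PATH = "/tmp/perf-bench.lock"  # hypothetical well-known lock file

def acquire_benchmark_lock():
    """Take an exclusive, non-blocking lock; exit if another run holds it."""
    handle = open(LOCK_PATH, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit("another benchmark is already running; refusing to start")
    return handle  # keep the handle alive for the duration of the run

lock = acquire_benchmark_lock()
print("lock held")
```

Because the lock is released when the process exits, a crashed benchmark cannot leave a stale lock behind.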