perf-benchmarker
Run sequential benchmarks with strict duration rules.
Follow docs/perf-requirements.md as the canonical contract.
Parse Arguments
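// Split on any whitespace so repeated spaces or tabs do not produce empty tokens.
const args = '$ARGUMENTS'.split(/\s+/).filter(Boolean);
// The first non-numeric token is taken as the benchmark command (only that one token is captured).
const command = args.find(a => !/^\d+$/.test(a)) || '';
// The first all-digit token is the duration in seconds; the default is 60.
const duration = parseInt(args.find(a => /^\d+$/.test(a)) || '60', 10);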
Required Rules
- Benchmarks MUST run sequentially (never parallel).
- Minimum duration: 60s per run (30s only for binary search).
- Warmup: 10s minimum before measurement.
- Re-run any benchmark whose results look anomalous.
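The duration and ordering rules above can be checked mechanically before a run starts. The following is a minimal sketch, not part of the skill itself: the names validateRun and runSequentially, the binarySearch flag, and the caller-supplied runOne function are all hypothetical and exist only for illustration.

```javascript
// Hypothetical helpers; names and option shapes are assumptions, not part of the skill.
const MIN_DURATION_S = 60;       // standard run
const MIN_BINARY_SEARCH_S = 30;  // allowed only in the binary-search phase
const MIN_WARMUP_S = 10;         // warmup before measurement

// Returns a list of rule violations (empty when the run is acceptable).
function validateRun({ duration, warmup, binarySearch = false }) {
  const errors = [];
  const minDuration = binarySearch ? MIN_BINARY_SEARCH_S : MIN_DURATION_S;
  if (!Number.isInteger(duration) || duration < minDuration) {
    errors.push(`duration must be >= ${minDuration}s (got ${duration})`);
  }
  if (!Number.isInteger(warmup) || warmup < MIN_WARMUP_S) {
    errors.push(`warmup must be >= ${MIN_WARMUP_S}s (got ${warmup})`);
  }
  return errors;
}

// Runs benchmarks strictly one at a time; `runOne` is a caller-supplied async function.
async function runSequentially(runs, runOne) {
  const results = [];
  for (const run of runs) {
    const errors = validateRun(run);
    if (errors.length > 0) throw new Error(errors.join('; '));
    results.push(await runOne(run)); // wait for completion before starting the next run
  }
  return results;
}
```

Awaiting each run inside the loop, rather than collecting promises and using Promise.all, is what keeps the runs from overlapping.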
Output Format
command: <benchmark command>
duration: <seconds>
warmup: <seconds>
results: <metrics summary>
notes: <anomalies or reruns>
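As an illustration of the five-field report above, a small formatter might look like the sketch below. The function name formatReport and the default of 'none' for an empty notes field are assumptions made for this example; the skill itself only specifies the field names and their order.

```javascript
// Hypothetical formatter; the function name and the 'none' default are assumptions.
function formatReport({ command, duration, warmup, results, notes = 'none' }) {
  // Emit the fields in the same order as the output format.
  return [
    `command: ${command}`,
    `duration: ${duration}`,
    `warmup: ${warmup}`,
    `results: ${results}`,
    `notes: ${notes}`,
  ].join('\n');
}
```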
Output Contract
Benchmarks MUST emit a JSON metrics block between markers:
PERF_METRICS_START
{"scenarios":{"low":{"latency_ms":120},"high":{"latency_ms":450}}}
PERF_METRICS_END
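A consumer of this contract needs to pull the JSON out of the surrounding benchmark output. The sketch below shows one way to do that with plain string search; the function name extractMetrics and the choice to throw on missing markers are assumptions for illustration, not behavior defined by the skill.

```javascript
const START = 'PERF_METRICS_START';
const END = 'PERF_METRICS_END';

// Hypothetical helper: returns the parsed metrics object found between the markers.
function extractMetrics(output) {
  const start = output.indexOf(START);
  const end = start === -1 ? -1 : output.indexOf(END, start + START.length);
  if (start === -1 || end === -1) {
    throw new Error('metrics markers not found in benchmark output');
  }
  const json = output.slice(start + START.length, end).trim();
  return JSON.parse(json); // throws SyntaxError if the block is not valid JSON
}

// Usage with the example block from the contract, preceded by unrelated log output.
const sample = [
  'warming up...',
  'PERF_METRICS_START',
  '{"scenarios":{"low":{"latency_ms":120},"high":{"latency_ms":450}}}',
  'PERF_METRICS_END',
].join('\n');
const metrics = extractMetrics(sample);
console.log(metrics.scenarios.high.latency_ms); // prints 450
```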
Constraints
- No runs shorter than 60s outside the binary-search phase.
- Do not change code while benchmarking.