terminal-bench-loop
Terminal-Bench Loop
A repeatable operating skill for driving one Terminal-Bench problem to a passing smoke through Paperclip, with explicit issue topology, bounded runs, board-gated product fixes, and worktree continuity.
This skill is operational and diagnostic, not engineering. It coordinates issues, artifacts, and approvals around a Terminal-Bench loop. It does not authorize code changes: every accepted product fix lands as a separate implementation child issue after a board confirmation.
Canonical execution model: read doc/execution-semantics.md before starting a loop or moving any loop issue. Every loop issue must rest in a state the doc allows:
- terminal (done/cancelled)
- explicitly live (active run / queued wake)
- explicitly waiting (in_review with participant/interaction/approval)
- explicit recovery/blocker (blocked with blockedByIssueIds and a named owner)
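As a rough illustration of the resting-state rule above, the check can be sketched as a small classifier. This is only a sketch under assumptions: the issue is modelled as a plain dictionary, and the field names `activeRun`, `queuedWake`, `participant`, `interaction`, `approval`, and `owner` are hypothetical stand-ins. Only the status values and `blockedByIssueIds` come from the text, and doc/execution-semantics.md remains the authority on what is allowed.

```python
from typing import Any, Mapping, Optional

# Status values named in the text as terminal.
TERMINAL_STATUSES = {"done", "cancelled"}


def resting_state(issue: Mapping[str, Any]) -> Optional[str]:
    """Return the resting-state category of an issue, or None if it fits none.

    All field names except `status` and `blockedByIssueIds` are hypothetical.
    """
    status = issue.get("status")

    # Terminal: the issue is finished or abandoned.
    if status in TERMINAL_STATUSES:
        return "terminal"

    # Explicitly live: a run is active or a wake is queued.
    if issue.get("activeRun") or issue.get("queuedWake"):
        return "live"

    # Explicitly waiting: in review, with someone or something to wait on.
    waiting_on = ("participant", "interaction", "approval")
    if status == "in_review" and any(issue.get(key) for key in waiting_on):
        return "waiting"

    # Explicit recovery/blocker: blocked, with blockers listed and a named owner.
    if status == "blocked" and issue.get("blockedByIssueIds") and issue.get("owner"):
        return "recovery"

    # Anything else is not an allowed resting state.
    return None
```

A `None` result means the issue should not be left where it is. For example, a `blocked` issue with no `blockedByIssueIds` or no owner fits none of the four categories.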
When to use
Trigger on an assignment whose title or body matches any of:
- "run Terminal-Bench in a loop", "loop <task-name> through Paperclip"
- "drive Terminal-Bench fix-git", "iterate on Terminal-Bench until it passes"
- "Terminal-Bench smoke loop", "bench loop", "smoke loop on <task-name>"
- An attached link to a Terminal-Bench loop parent issue, plus a request to do another iteration
Also use when the user hands you an existing top-level loop issue and asks for the next iteration, diagnosis, or rerun.
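The phrase-based triggers above could be approximated by a first-pass filter like the one below. It is a sketch, not part of the skill: the function name `looks_like_loop_request` is hypothetical, and treating `<task-name>` as a single whitespace-free token is an assumption. It covers only the quoted phrases. The attached-link case and the handed-over loop issue still need judgment.

```python
import re

# Quoted trigger phrases from the list, lowercased. "<task-name>" is
# approximated by \S+ (an assumption: task names contain no whitespace).
TRIGGER_PATTERNS = [
    r"run terminal-bench in a loop",
    r"loop \S+ through paperclip",
    r"drive terminal-bench fix-git",
    r"iterate on terminal-bench until it passes",
    r"terminal-bench smoke loop",
    r"\bbench loop\b",
    r"smoke loop on \S+",
]

_TRIGGER_RE = re.compile("|".join(TRIGGER_PATTERNS), re.IGNORECASE)


def looks_like_loop_request(title: str, body: str = "") -> bool:
    """Return True if the title or body contains one of the trigger phrases."""
    return bool(_TRIGGER_RE.search(f"{title}\n{body}"))
```

For instance, `looks_like_loop_request("Please loop fix-git through Paperclip")` matches the second pattern, while an unrelated title such as "Update README" matches none.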