Terminal-Bench Loop

A repeatable operating skill for driving one Terminal-Bench problem to a passing smoke through Paperclip, with explicit issue topology, bounded runs, board-gated product fixes, and worktree continuity.

This skill is operational and diagnostic, not engineering. It coordinates issues, artifacts, and approvals around a Terminal-Bench loop. It does not authorize code changes: every accepted product fix lands as a separate implementation child issue after board confirmation.

Canonical execution model: read doc/execution-semantics.md before starting a loop or moving any loop issue. Every loop issue must rest in a state the doc allows: terminal (done/cancelled), explicitly live (active run / queued wake), explicitly waiting (in_review with participant/interaction/approval), or explicit recovery/blocker (blocked with blockedByIssueIds and a named owner).
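
The resting-state rule above can be sketched as a small check. This is a minimal illustration, not the Paperclip data model: the `LoopIssue` class and every field except `blockedByIssueIds` are hypothetical names chosen for the example, and doc/execution-semantics.md remains the authority on what counts as an allowed state.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoopIssue:
    # Hypothetical shape for illustration; only the status names and
    # blockedByIssueIds are taken from the text above.
    status: str
    has_active_run: bool = False
    has_queued_wake: bool = False
    waiting_on: Optional[str] = None  # "participant", "interaction", or "approval"
    blockedByIssueIds: List[str] = field(default_factory=list)
    blocker_owner: Optional[str] = None


def resting_state(issue: LoopIssue) -> Optional[str]:
    """Return the allowed resting-state category, or None if the issue is in limbo."""
    if issue.status in ("done", "cancelled"):
        return "terminal"
    if issue.has_active_run or issue.has_queued_wake:
        return "live"
    if issue.status == "in_review" and issue.waiting_on in (
        "participant",
        "interaction",
        "approval",
    ):
        return "waiting"
    # A blocker only counts when it names both the blocking issues and an owner.
    if issue.status == "blocked" and issue.blockedByIssueIds and issue.blocker_owner:
        return "blocked"
    return None
```

An issue for which `resting_state` returns `None` (for example, `in_review` with nobody to wait on, or `blocked` without a named owner) is the kind of ambiguous state the rule is meant to rule out.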

When to use

Trigger on an assignment whose title or body matches any of:

"run Terminal-Bench in a loop", "loop <task-name> through Paperclip"
"drive Terminal-Bench fix-git", "iterate on Terminal-Bench until it passes"
"Terminal-Bench smoke loop", "bench loop", "smoke loop on <task-name>"
An attached link to a Terminal-Bench loop parent issue, plus a request to do another iteration

Also use when the user hands you an existing top-level loop issue and asks for the next iteration, diagnosis, or rerun.
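
As a rough aid, the phrase triggers listed above could be approximated with a case-insensitive pattern check. This is only a sketch under stated assumptions: `matches_trigger` and `TRIGGER_PATTERNS` are hypothetical names, `<task-name>` is treated as any run of non-space characters, and the link-based trigger and the "existing top-level loop issue" case still need judgment rather than string matching.

```python
import re

# Phrases copied from the trigger list above, lowercased;
# <task-name> is replaced by a non-whitespace wildcard.
TRIGGER_PATTERNS = [
    r"run terminal-bench in a loop",
    r"loop \S+ through paperclip",
    r"drive terminal-bench fix-git",
    r"iterate on terminal-bench until it passes",
    r"terminal-bench smoke loop",
    r"\bbench loop\b",
    r"smoke loop on \S+",
]


def matches_trigger(text: str) -> bool:
    """Return True if the title or body contains one of the phrase triggers."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in TRIGGER_PATTERNS)
```

For instance, `matches_trigger("Loop fix-git through Paperclip")` returns `True`, while a title with none of the listed phrases returns `False`.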
