long-run

Long Run Harness

Orchestrates multi-day execution of complex tasks through a milestone pipeline. Each milestone passes through plan-crafting → run-plan → review-work with checkpoints between milestones for recovery from interruptions.

Core Principle

Long-running execution must be resumable, auditable, and fail-safe. Every state transition is persisted to disk before the next action begins. If execution stops for any reason — rate limit, crash, user pause, context loss — it can resume from the last checkpoint without repeating completed work.

Hard Gates

  1. Milestones must exist before execution. Either from milestone-planning skill or user-provided. Never generate milestones inline during execution.
  2. State file must be updated before and after every milestone. No in-memory-only state. If it's not on disk, it didn't happen.
  3. Each milestone must complete the full pipeline. plan-crafting → run-plan → review-work. No shortcuts. No skipping review-work "because it looked fine."
  4. Failed milestones block dependents. If M2 depends on M1 and M1 fails review, M2 does not start. Period.
  5. User confirmation required at gate points. Before starting a new milestone phase (planning, execution, review), check if the user wants to continue, pause, or abort.
  6. Never modify completed milestones. Once a milestone passes review-work, its files are locked. If a later milestone needs changes to earlier work, that is a new milestone.
  7. Checkpoint after every milestone completion. Write a checkpoint file recording what was done, test results, and review verdict before proceeding.
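
Hard Gate 2 depends on state writes that survive a crash mid-write. A minimal sketch of one way to do that, assuming a POSIX-style filesystem; the function name `write_state_atomically` is hypothetical and not part of the skill:

```python
import os
import tempfile
from pathlib import Path


def write_state_atomically(state_path: Path, content: str) -> None:
    """Write content so a crash never leaves a half-written state file."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it
    # over the target. The rename is atomic on POSIX filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, state_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Writing the temporary file in the same directory matters: a rename across filesystems is not atomic.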

When To Use

  • After milestone-planning has produced a milestone DAG
  • When the user says "long run", "start long run", "execute milestones", or "run all milestones"
  • When resuming a previously paused long run session

When NOT To Use

  • When milestones don't exist yet (use milestone-planning first)
  • When there's only one milestone (use plan-crafting + run-plan directly)
  • For quick tasks that don't warrant multi-phase execution

Input

  1. Harness state directory path — e.g., docs/engineering-discipline/harness/<session-slug>/
  2. The directory must contain state.md and milestones/*.md files

If no state directory exists, ask the user if they want to run milestone-planning first.

Process

Phase 1: Load and Validate State

  1. Read state.md from the harness directory
  2. Read all milestone files from milestones/
  3. Validate:
    • All milestones referenced in state.md have corresponding files
    • Dependency DAG is valid (no cycles, topological sort possible)
    • No milestone is in an invalid state (e.g., "executing" without a plan file)
  4. Determine current position:
    • Which milestones are completed?
    • Which milestones are ready to start (all dependencies met)?
    • Is this a fresh start or a resume?
  5. Present status to the user:
## Long Run Status: [Session Name]

**Progress:** N/M milestones completed
**Current phase:** [planning M3 | executing M3 | reviewing M3 | ready to start M3]
**Next up:** [M3, M4 (parallel)]

Completed: M1 ✓, M2 ✓
In progress: M3 (executing)
Pending: M4, M5
  6. Ask user to confirm: continue, pause, or abort.
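
The DAG validation in step 3 and the "ready to start" question in step 4 can be sketched with Python's standard `graphlib`. The dictionary shapes (`deps` mapping a milestone id to its prerequisites, `status` mapping an id to its status string) are assumptions for illustration; the skill does not prescribe a parsed representation of state.md.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(deps: dict[str, set[str]]) -> list[str]:
    """Return milestone ids in a valid execution order, or raise ValueError."""
    # Every referenced dependency must itself be a known milestone.
    unknown = {d for ds in deps.values() for d in ds} - set(deps)
    if unknown:
        raise ValueError(f"dependencies without milestone files: {sorted(unknown)}")
    try:
        # static_order() yields prerequisites before their dependents.
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


def ready_milestones(deps: dict[str, set[str]], status: dict[str, str]) -> list[str]:
    """Pending milestones whose dependencies are all completed."""
    return sorted(
        m for m, ds in deps.items()
        if status[m] == "pending" and all(status[d] == "completed" for d in ds)
    )
```

More than one id in the result of `ready_milestones` means those milestones are candidates for parallel execution, subject to the file-conflict check.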

Phase 2: Milestone Execution Loop

For each milestone in topological order:

┌─────────────────────────────────────┐
│         Milestone Pipeline          │
│                                     │
│  ┌──────────┐    ┌─────────┐        │
│  │  Plan    │───→│  Run    │        │
│  │ Crafting │    │  Plan   │        │
│  └──────────┘    └────┬────┘        │
│                       │             │
│                  ┌────▼────┐        │
│                  │ Review  │        │
│                  │  Work   │        │
│                  └────┬────┘        │
│                       │             │
│              ┌────────▼────────┐    │
│              │   PASS?         │    │
│              │ Yes → checkpoint│    │
│              │ No  → retry     │    │
│              └─────────────────┘    │
└─────────────────────────────────────┘

Step 2-1: Gate Check

Before starting a milestone:

  1. Verify all dependency milestones have status completed
  2. Verify no file conflicts with in-progress parallel milestones
  3. Update state.md: set milestone status to planning
  4. Update execution log with timestamp
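
The two verifications in the gate check can be expressed as one function that returns the reasons a milestone may not start. This is a sketch under assumptions: `scope` (milestone id to the set of files it affects) and the function name are hypothetical, and the in-progress statuses are taken from the status values used elsewhere in this document.

```python
IN_PROGRESS = {"planning", "executing", "validating"}


def gate_check(
    milestone: str,
    deps: dict[str, set[str]],
    status: dict[str, str],
    scope: dict[str, set[str]],
) -> list[str]:
    """Return reasons the milestone may not start; an empty list means go."""
    problems = []
    # 1. Every dependency must be completed.
    for dep in sorted(deps[milestone]):
        if status[dep] != "completed":
            problems.append(f"dependency {dep} is {status[dep]}")
    # 2. No file overlap with milestones that are currently in progress.
    for other, files in scope.items():
        if other != milestone and status[other] in IN_PROGRESS:
            overlap = scope[milestone] & files
            if overlap:
                problems.append(f"file conflict with {other}: {sorted(overlap)}")
    return problems
```

Only when the list is empty would the harness write `planning` to state.md and append to the execution log.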

Step 2-2: Plan Crafting Phase

  1. Compose a Context Brief from the milestone definition:
    • Goal → from milestone file
    • Scope → files affected from milestone file
    • Success Criteria → from milestone file
    • Constraints → inherited from the parent problem + completed milestone context
    • Completed milestone context contract: From each completed predecessor, include ONLY:
      • Files created/modified (from checkpoint's "Files Changed" list)
      • Interface contracts established (function signatures, API shapes, type definitions)
      • Success criteria that were verified as met
    • Do NOT include: execution logs, review documents, worker/validator output, or full checkpoint contents
    • Note: Context Briefs composed from milestone definitions omit the Complexity Assessment section, since routing has already been determined by the milestone-planning phase. The brief goes directly to plan-crafting without re-routing.
  2. Invoke the plan-crafting skill pattern:
    • Create a plan document at docs/engineering-discipline/plans/YYYY-MM-DD-<milestone-name>.md
    • The plan must satisfy all milestone success criteria
    • The plan must not modify files outside the milestone's scope
  3. Update state.md: record plan file path for this milestone
  4. User gate: Present the plan and ask for approval before execution
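
The context contract in step 1 is an allow-list: anything not named is dropped. A minimal sketch of that filtering, assuming checkpoints and milestone definitions have already been parsed into dictionaries; all key names here are hypothetical.

```python
# Only these predecessor fields may enter a Context Brief (assumed key names).
ALLOWED_PREDECESSOR_KEYS = ("files_changed", "interface_contracts", "verified_criteria")


def predecessor_context(checkpoint: dict) -> dict:
    """Keep only the fields the context contract allows."""
    return {k: checkpoint[k] for k in ALLOWED_PREDECESSOR_KEYS if k in checkpoint}


def compose_context_brief(
    milestone: dict,
    parent_constraints: list[str],
    predecessors: dict[str, dict],
) -> dict:
    """Build a brief from the milestone file plus filtered predecessor context."""
    return {
        "goal": milestone["goal"],
        "scope": milestone["files"],
        "success_criteria": milestone["success_criteria"],
        "constraints": list(parent_constraints),
        "completed_context": {
            mid: predecessor_context(cp) for mid, cp in predecessors.items()
        },
    }
```

An allow-list is used rather than a deny-list so that new checkpoint fields stay out of the brief unless they are added on purpose.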

Step 2-3: Run Plan Phase

  1. Update state.md: set milestone status to executing, increment Attempts counter by 1
  2. Execute the plan using the run-plan skill pattern:
    • Worker-validator loop for each task
    • Parallel execution for independent tasks
    • Information-isolated validators
  3. If run-plan reports failure after 3 retries on any task:
    • Update state.md: set milestone status to failed
    • Record failure details in execution log
    • Stop and report to user. Do not proceed to dependent milestones.
  4. If all tasks complete: proceed to review phase

Step 2-4: Review Work Phase

  1. Update state.md: set milestone status to validating
  2. Invoke the review-work skill pattern:
    • Information-isolated review against the plan document
    • Binary PASS/FAIL verdict
  3. If PASS:
    • Update state.md: set milestone status to completed
    • Write checkpoint file (see Step 2-6: Checkpoint) once the Step 2-5 integration check passes
    • Update execution log
    • Proceed to next milestone
  4. If FAIL:
    • Record review findings in execution log
    • Retry decision (based on Attempts counter in state.md, which persists across crashes):
      • If Attempts == 1: return to Step 2-3 with review feedback (re-execute same plan)
      • If Attempts == 2: return to Step 2-2 (re-plan with review feedback as constraint)
      • If Attempts >= 3: set status to failed, stop, report to user
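
The retry decision is a pure function of the persisted Attempts counter, which is why it survives crashes. A small sketch, with hypothetical return labels standing in for the three outcomes:

```python
def next_step_after_fail(attempts: int) -> str:
    """Map the Attempts counter from state.md to the action after a FAIL verdict."""
    if attempts < 1:
        # Step 2-3 increments the counter before execution, so a review
        # can never fail with zero attempts recorded.
        raise ValueError("a review cannot fail before the first attempt")
    if attempts == 1:
        return "re-execute"  # back to Step 2-3 with review feedback
    if attempts == 2:
        return "re-plan"     # back to Step 2-2 with feedback as a constraint
    return "failed"          # stop and report to the user
```

Because Step 2-3 increments the counter on entry, the sequence for one milestone is at most: execute, re-execute, re-plan and execute, then fail.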

Step 2-5: Cross-Milestone Integration Check

After a milestone passes review-work but before writing the checkpoint, verify that the milestone's output integrates correctly with all previously completed milestones:

  1. Run the project's highest-level verification (from state.md's Verification Strategy or rediscover using plan-crafting's Verification Discovery order)
  2. Check cross-milestone interfaces: If the completed milestone defines or consumes interfaces from predecessor milestones, verify they are compatible (function signatures match, API contracts hold, types align)

If integration check passes: Proceed to checkpoint.

If integration check fails — Cross-Milestone Failure Response:

The milestone passed its own review-work (internal correctness) but breaks integration with other milestones. This is a boundary problem.

  1. Diagnose (attempt 1):

    • Read the failure output
    • Identify which interface boundary or interaction is broken
    • Determine if the fix belongs to the current milestone or requires a corrective milestone
    • If fixable within current milestone scope: dispatch a targeted fix worker → re-run review-work → re-run integration check
    • If the fix is outside current milestone scope: proceed to escalation
  2. Diagnose (attempt 2):

    • If the first fix didn't resolve it, re-analyze
    • Apply a second targeted fix
    • Re-run integration check
  3. Escalate to user (after 2 failed attempts):

    • Report: which milestones are involved, what integration boundary failed, what fixes were tried
    • Options: add corrective milestone, rollback to checkpoint, accept and continue (user acknowledges the integration gap)
    • Log the user's decision in state.md execution log
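
The failure response above is a bounded loop: at most two targeted fixes, then escalation. A sketch under assumptions: `check` stands for the integration check, and each entry in `fixes` stands for "dispatch a targeted fix and re-run review-work"; both are hypothetical callables, and the out-of-scope shortcut to escalation is left out for brevity.

```python
from typing import Callable


def integrate_with_fixes(
    check: Callable[[], bool],
    fixes: list[Callable[[], None]],
    max_attempts: int = 2,
) -> str:
    """Run the integration check, applying at most max_attempts fixes."""
    if check():
        return "checkpoint"
    for fix in fixes[:max_attempts]:
        fix()        # targeted fix, including the re-run of review-work
        if check():  # re-run the integration check after every fix
            return "checkpoint"
    return "escalate"  # report to the user with what was tried
```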

Step 2-6: Checkpoint

After a milestone passes review and the cross-milestone integration check (Step 2-5):

Write checkpoints/M<N>-checkpoint.md:

# Checkpoint: M<N> — [Milestone Name]

**Completed:** YYYY-MM-DD HH:MM
**Duration:** [time from planning start to review pass]
**Attempts:** [number of plan-execute-review cycles]

## Plan File
`docs/engineering-discipline/plans/YYYY-MM-DD-<name>.md`

## Review File
`docs/engineering-discipline/reviews/YYYY-MM-DD-<name>-review.md`

## Test Results
[Full test suite status at checkpoint time]

## Files Changed
[List of files created/modified in this milestone]

## State After Milestone
[Brief description of system state — what works now that didn't before]
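
Because later Context Briefs read the "Files Changed" list back out of a checkpoint, it helps if the file can be parsed mechanically. A minimal sketch that splits a checkpoint on its `## ` headings; the function names are hypothetical, and it assumes the file list is written one path per line, optionally as a bullet.

```python
def parse_sections(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the text beneath it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def files_changed(markdown: str) -> list[str]:
    """Return the paths listed under 'Files Changed', without bullet markers."""
    body = parse_sections(markdown).get("Files Changed", "")
    return [ln.lstrip("-* ").strip() for ln in body.splitlines() if ln.strip()]
```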

Phase 3: Parallel Milestone Execution

When multiple milestones have all dependencies satisfied and no file conflicts:

  1. Identify parallelizable milestone group
  2. Run plan-crafting for ALL parallel milestones first (sequentially — plans are lightweight)
  3. Present ALL plans together for batch approval: "Milestones M3 and M4 can run in parallel. Here are both plans. Approve each individually."
  4. User approves or rejects each plan independently. Only approved milestones proceed to execution. Rejected milestones return to Step 2-2 while approved ones execute.
  5. Dispatch the pipeline of each approved milestone concurrently:
    • Each milestone runs run-plan → review-work (plan already approved in step 4)
    • Each runs in a worktree (isolation: "worktree") to prevent file conflicts
    • After all of them complete and pass review, merge worktrees back
  6. If any milestone fails: handle it independently (the others can continue if they do not depend on it)

Worktree merge protocol:

  1. Both milestones pass review in their respective worktrees
  2. Check for file conflicts between worktree changes
  3. If no conflicts: merge sequentially (M_lower first, then M_higher)
  4. If conflicts detected: stop, report to user, request manual resolution
  5. After merge: run full test suite on merged result
  6. If tests fail: stop, report to user
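
Steps 2 through 4 of the merge protocol amount to a pairwise overlap check followed by a numeric sort. A sketch assuming milestone ids have the form `M<N>` and that the changed files of each worktree are already known as sets; `plan_merge` is a hypothetical name.

```python
import re
from itertools import combinations


def milestone_number(mid: str) -> int:
    """Extract N from an id of the form 'M<N>'."""
    match = re.fullmatch(r"M(\d+)", mid)
    if match is None:
        raise ValueError(f"unexpected milestone id: {mid!r}")
    return int(match.group(1))


def plan_merge(changes: dict[str, set[str]]) -> list[str]:
    """Return the merge order, or raise if two worktrees touched the same file."""
    ordered = sorted(changes, key=milestone_number)  # lower number merges first
    for a, b in combinations(ordered, 2):
        overlap = changes[a] & changes[b]
        if overlap:
            # Stop and hand over to the user for manual resolution.
            raise RuntimeError(f"{a} and {b} both changed: {sorted(overlap)}")
    return ordered
```

Sorting by the numeric part rather than the string keeps M3 ahead of M10.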

Phase 4: Completion

After all milestones are completed (including the Integration Verification Milestone from milestone-planning):

  1. Update state.md: set overall status to completing
  2. Final E2E Gate: Run the project's highest-level verification one final time on the fully integrated codebase
  3. Run full test suite for regression check
  4. If Final E2E Gate fails:
    • Diagnose: identify which milestone's output is the likely cause
    • Create a corrective milestone via Mid-Execution Correction procedure
    • Execute corrective milestone through the full pipeline (plan-crafting → run-plan → review-work)
    • Re-run E2E Gate after correction
    • If 2 corrective attempts fail: escalate to user with full diagnosis
  5. If Final E2E Gate passes: Update state.md: set overall status to completed
  6. Generate completion summary:
# Long Run Complete: [Session Name]

**Started:** YYYY-MM-DD
**Completed:** YYYY-MM-DD
**Total milestones:** N
**Total attempts:** [sum of all milestone attempts]

## Milestone Summary

| Milestone | Status | Attempts | Duration |
|-----------|--------|----------|----------|
| M1: [name] | ✓ completed | 1 | 2h |
| M2: [name] | ✓ completed | 2 | 4h |
| ...

## Final Test Suite
[PASS/FAIL — N passed, M failed]

## Files Changed (Total)
[Aggregated list across all milestones]
  7. Present the summary to the user and suggest the simplify skill for a final code quality pass

Recovery Protocol

When resuming a paused or interrupted session:

  1. Read state.md to determine last known state
  2. For each milestone, determine recovery action:
| Last Status | Recovery Action |
|-------------|-----------------|
| pending | Start normally |
| planning | Restart plan-crafting (plan file may be incomplete) |
| executing | Check run-plan progress; resume or restart |
| validating | Restart review-work (review may be incomplete) |
| completed | Skip (already checkpointed) |
| failed | Present failure to user; ask whether to retry or skip (see Skip Rules below) |
| skipped | Skip (user previously chose to skip this milestone) |
  3. For executing milestones: check if tasks in the plan have checkboxes marked. Resume from the first unchecked task.
  4. Read the Attempts counter from state.md to determine retry budget remaining. Do not reset the counter on resume: it persists across crashes to prevent infinite retry loops.
  5. Present recovery plan to user before proceeding.
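
The recovery table and the checkbox rule lend themselves to a lookup plus a small scan of the plan file. A sketch assuming plan tasks are written as markdown checkboxes (`- [ ]` and `- [x]`), which this document implies but does not specify; the names are hypothetical.

```python
import re

# Last persisted status -> recovery action, following the table above.
RECOVERY_ACTIONS = {
    "pending": "start normally",
    "planning": "restart plan-crafting",
    "executing": "check run-plan progress; resume or restart",
    "validating": "restart review-work",
    "completed": "skip",
    "failed": "ask user: retry or skip",
    "skipped": "skip",
}

TASK_LINE = re.compile(r"^\s*[-*] \[([ xX])\] (.+)$")


def recovery_action(last_status: str) -> str:
    """Look up the recovery action; unknown statuses raise KeyError."""
    return RECOVERY_ACTIONS[last_status]


def first_unchecked_task(plan_text: str):
    """Return the text of the first unchecked task, or None if all are done."""
    for line in plan_text.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None
```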

Mid-Execution Correction

If execution reveals that a completed milestone's output is incorrect or a new milestone is needed:

  1. Pause execution — do not continue with dependent milestones
  2. Log the discovery in state.md execution log: what was found, which milestone triggered the discovery
  3. User decision required: present the situation and options:
    • Add corrective milestone: Create a new milestone definition (the user writes the goal and success criteria, or re-run milestone-planning for just the new scope). Insert it into the DAG with appropriate dependencies. Resume execution from the new milestone.
    • Re-plan from a checkpoint: Roll back to a completed milestone's checkpoint, mark subsequent milestones as pending, reset their Attempts to 0, and restart from that point.
    • Abort: Set overall status to failed and stop.
  4. New milestones follow the same pipeline — plan-crafting → run-plan → review-work. No shortcuts even for "quick fixes."
  5. Completed milestones are never modified (Hard Gate #6 still applies). The corrective milestone produces new files or overwrites with a full plan cycle.

Skip Rules

When a user chooses to skip a failed milestone:

  1. Set milestone status to skipped in state.md
  2. Log the skip event with user's reason in execution log
  3. Dependents of a skipped milestone are also blocked by default — same as failed. The DAG contract is: dependents run only after prerequisites are completed.
  4. The user may explicitly unblock a dependent by acknowledging the missing prerequisite: "Proceed with M4 despite M2 being skipped." Log this override in the execution log.
  5. If the user unblocks a dependent, add a note to that milestone's Context Brief during plan-crafting: "Prerequisite M2 was skipped. The following outputs are missing: [list from M2's success criteria]."

Skipped milestones cannot be un-skipped. If the user wants to attempt the milestone later, create a new milestone with the same goal.

Duration Guard

If a single milestone's total active time (from planning start to review completion) becomes excessive:

  1. Soft limit: If a milestone has stayed in planning or executing status for a disproportionately large share of the overall work, pause and report to user: "Milestone M3 has been in progress for an extended period. Continue, re-scope, or abort?"
  2. Hard limit on attempts: The 3-attempt limit (Step 2-4) bounds retry loops. But if even a single attempt's plan-crafting generates more than 15 tasks, pause and report: "This milestone's plan has N tasks and may be too large for a single milestone. Consider splitting."
  3. Purpose: Prevent a single runaway milestone from consuming the entire execution budget or running indefinitely on flaky tests.

Context Window Management

Long-running sessions will hit context window limits. Claude Code automatically compresses old messages (context collapse). The harness must be designed to survive this:

  1. Never rely on conversation memory for state. All state lives in state.md and milestone files on disk. If the context is compressed, the harness re-reads state files — no information is lost.
  2. Each milestone is a fresh context boundary. When starting a new milestone's plan-crafting, the worker subagent starts with a clean context. It receives only the milestone definition and completed predecessor context (see the completed milestone context contract in Step 2-2), not the full conversation history.
  3. Checkpoint files are the source of truth. If context is lost mid-milestone, recovery reads the checkpoint files, not compressed conversation summaries.
  4. Avoid accumulating large inline state. Do not build up a running summary of all milestones in the conversation. Instead, reference state.md and checkpoint files by path.

Rate Limit Handling

Long-running sessions will encounter rate limits. Claude Code has built-in retry with exponential backoff (up to 10 retries, 5-minute max backoff). The harness should work with this, not against it:

  1. Let Claude Code handle transient rate limits. Short 429/529 errors are retried automatically with backoff. Do not preemptively save state on every API error.
  2. Save state on persistent rate limits. If a rate limit persists beyond the automatic retry window (you'll see repeated "rate limit" messages), record current state to disk immediately.
  3. Log the rate limit event in execution log with timestamp.
  4. Report to user: "Rate limit hit. State saved. Resume with long-run when ready."
  5. Do NOT add manual retry loops on top of Claude Code's built-in retry, because this causes retry amplification.
  6. Background agent bail: Claude Code's background agents (like reviewer subagents) bail immediately on 529 overload errors instead of retrying. This is why Phase 2.5 reviewer failure handling exists — reviewer failures are often transient rate limits, not permanent errors.

Anti-Patterns

| Anti-Pattern | Why It Fails |
|--------------|--------------|
| Generating milestones inline instead of using milestone-planning | Milestones lack adversarial review; poor decomposition |
| Skipping review-work for "simple" milestones | Undetected defects compound across milestones |
| Continuing after a milestone fails | Dependent milestones build on broken foundation |
| Not updating state.md between phases | Crash loses progress; cannot resume |
| Modifying completed milestone files | Breaks checkpoint invariant; invalidates reviews |
| Running parallel milestones without worktree isolation | File conflicts corrupt both milestones |
| Auto-retrying on rate limit | Wastes quota; user may prefer to wait |
| Skipping user gates between milestones | User loses control of multi-day execution |
| Merging worktrees without conflict check | Silent data loss if files overlap |
| Skipping cross-milestone integration check | Milestones pass independently but break each other at boundaries |
| Retrying E2E failures indefinitely without user escalation | 2-attempt limit exists to avoid budget waste on misdiagnosed problems |

Minimal Checklist

  • State directory exists with valid state.md and milestone files
  • Dependency DAG validated (no cycles)
  • Current position determined (fresh start or resume)
  • User confirmed continuation at session start
  • Each milestone goes through plan-crafting → run-plan → review-work
  • state.md updated before and after every phase transition
  • Checkpoint written after every successful milestone
  • Failed milestones block dependents
  • Parallel milestones use worktree isolation
  • Cross-milestone integration check passes after each milestone
  • Final E2E Gate passes at completion
  • Full test suite passes at completion

Transition

After long run completion:

  • For final code quality pass → simplify skill
  • If issues found in completion testing → systematic-debugging skill
  • If user wants to extend with more milestones → milestone-planning skill

This skill itself does not invoke the next skill. It reports completion and lets the user decide the next step.

More from tmdgusya/engineering-discipline

First Seen
Apr 1, 2026