skills/henryqw/skills/gh-autopilot

gh-autopilot

SKILL.md

GH Autopilot

Use this skill to operate a deterministic Copilot review loop on one PR. The user must explicitly choose the starting stage. The skill must begin there and keep looping until a terminal condition is reached.

This skill is stateful and persists artifacts under .context/gh-autopilot/.

Required Start Stage

Require the user to provide one of the following stage values:

  • 1 (create_pr): PR not created yet. Create/select PR first.
  • 2 (monitor_review): PR exists and Copilot is reviewing (or expected soon).
  • 3 (address_comments): Copilot comments already exist and must be addressed now.

If start stage is missing, ask once and wait. Do not guess.

Terminal end conditions for the loop:

  • completed_no_comments (success)
  • timeout with reason stage2_max_wait_reached (Stage 2 overall wait budget exhausted)

Timing Contract

  • Initial wait: 300 seconds.
  • Poll interval after initial wait: 45 seconds.
  • Keep polling after 10 minutes.
  • Stop each cycle wait at 40 minutes (2400 seconds) and mark cycle timeout.
  • Stage 2 overall max wait defaults to 12 hours (43200 seconds) unless overridden.
  • On cycle timeout, immediately retry Stage 2 wait. Do not stop manually while still inside the Stage 2 max-wait budget.
  • Stop entire loop when Copilot summary says generated no comments.

Autopilot Persistence Contract

Autopilot is a persistent control loop. Once started, it must keep operating with timed polling and deterministic transitions until one of the following happens:

  • terminal success: completed_no_comments and drain guard passes
  • terminal timeout: timeout with reason stage2_max_wait_reached
  • explicit blocker: auth failure, state corruption, or PR mismatch that cannot be auto-recovered

Idle waiting is not a stop condition. A single cycle timeout is not a stop condition.

GH Command Reference Contract

When this skill uses gh commands, treat gh-cli as the command source of truth for command shape and flags.

  • Validate auth flows with gh-cli guidance (gh auth status, gh auth login).
  • Validate PR resolution/edit patterns with gh-cli guidance (gh pr view, gh pr edit --add-reviewer/--remove-reviewer, --json, --jq).
  • Validate GraphQL/API invocation patterns with gh-cli guidance (gh api graphql).

Primary Engine

Use scripts/run_autopilot_loop.py as the control-plane entrypoint.

Commands

  • init: initialize state for one PR.
  • run-cycle: wait for new Copilot review and export cycle artifacts.
  • run-stage2-loop: run Stage 2 with automatic cycle-timeout retries until action/terminal/Stage 2 max-wait limit.
  • finalize-cycle: mark current cycle addressed and re-request Copilot.
  • status: print current state.
  • assert-drained: fail if any address-required cycle is still pending.
  • simulate-fsm: deterministic dry-run of event-driven status transitions.

The engine is event-driven: state transitions are applied from explicit events (for example cycle_timeout, cycle_needs_address, finalize_with_reviewer_request) instead of ad-hoc status rewrites. Every JSON command result includes exactly one canonical resume_command to continue from the current state.

State File

Default: .context/gh-autopilot/state.json

Important fields:

  • status
  • cycle
  • last_processed_review_id
  • pending_review_id
  • pr

Event Log

Default: .context/gh-autopilot/events.jsonl

Each line is a normalized JSON event:

  • schema_version: event schema version.
  • timestamp: event time in UTC ISO8601.
  • event_type: normalized snake_case event name.
  • payload: event payload object.

Context Workspace Files

Use .context/gh-autopilot/ as the durable workspace for autonomy and recovery.

  • context.md: single source of truth for next actions, status snapshot, artifacts, and suggested commands. Includes a compact status header: phase=<...> | cycle=<...> | status=<...> | timeout_reason=<...>.

Keep this intentionally simple: one context file, not multiple overlapping notes.

Stage Router

Start from the user-selected stage: create_pr, monitor_review, or address_comments.

Routing rules:

  1. Stage 1 (create_pr) always transitions to Stage 2.
  2. Stage 2 (monitor_review) runs the persistent supervisor path (run-stage2-loop).
  3. Stage 2 guard order is strict:
    • check pending address/triage first
    • check terminal no-comments + drain guard
    • check per-cycle-timeout retry state
    • check Stage 2 max-wait timeout
    • otherwise run another poll cycle
  4. If Stage 2 returns awaiting_address or awaiting_triage, transition to Stage 3.
  5. Stage 3 (address_comments) finalizes the full batch with finalize-cycle, then transitions back to Stage 2.

Event-driven state transitions:

initialized|rerequested --begin_cycle_wait--> waiting_for_review
waiting_for_review --cycle_timeout--> timeout (cycle_max_wait_reached)
timeout (cycle_max_wait_reached) --stage2_retry_after_cycle_timeout--> initialized
waiting_for_review --cycle_no_comments--> completed_no_comments
waiting_for_review --cycle_needs_address--> awaiting_address
waiting_for_review --cycle_needs_triage--> awaiting_triage
awaiting_address --finalize_with_reviewer_request--> rerequested
awaiting_triage --finalize_with_reviewer_request--> rerequested
awaiting_address|awaiting_triage --finalize_without_reviewer_request--> initialized
initialized|waiting_for_review|rerequested --stage2_max_wait_reached--> timeout

Stage Details

Stage 1 (create_pr)

User intent: PR has not been created yet.

Actions:

  1. Use gh-cli as reference for all gh command usage in this stage.
  2. Run gh auth status.
  3. Resolve current-branch PR with gh pr view (omit --pr).
  4. If an open PR already exists for the branch, skip PR creation and move to Stage 2.
  5. If no PR exists, run gh-pr-creation to open one.
  6. Initialize state with init (avoid --force unless state is intentionally reset).
  7. Move to Stage 2.

Stage 2 (monitor_review)

User intent: PR exists and we are waiting for Copilot output.

Actions:

  1. Use gh-cli as reference for all gh command usage in this stage.
  2. Run gh auth status.
  3. Resolve PR (current branch or explicit --pr).
  4. Ensure state exists for the PR:
    • If missing: run init.
    • If state already awaiting_address or awaiting_triage: move directly to Stage 3.
  5. Run run-stage2-loop with normal timing (300/45/2400) plus Stage 2 max wait (43200 by default).
    • On Stage 2 entry, run-stage2-loop performs an immediate fetch pass (initial_sleep=0) to capture already-finished Copilot reviews/comments.
    • run-stage2-loop retries run-cycle automatically after each cycle timeout.
    • run-cycle exports comments by matching each thread comment to the active Copilot review_id (not by timestamp cutoff).
  6. Interpret result:
    • completed_no_comments -> terminal success; stop loop.
      • includes cycles where no Copilot thread comments were captured for that review round
    • timeout with reason stage2_max_wait_reached -> terminal timeout; stop loop.
    • awaiting_address or awaiting_triage -> move to Stage 3.
  7. Before any terminal stop/report in Stage 2, run assert-drained.
    • If it exits non-zero, do not stop; continue to Stage 3.

If Copilot is already reviewing when Stage 2 starts, do not re-request reviewer; continue waiting with run-stage2-loop.

Never stop Stage 2 manually while the command is still within the configured Stage 2 max-wait limit.

Stage 3 (address_comments)

User intent: comments already exist and must be processed now.

Actions:

  1. Ensure fresh cycle artifacts are available:
    • If state is already awaiting_address or awaiting_triage, use existing cycle.json.
    • Otherwise run run-cycle with --initial-sleep-seconds 0 to capture existing comments immediately.
    • If parsed_summary.generated_comments > 0 but counts.copilot_comments_total == 0, artifacts are inconsistent: re-run immediate run-cycle and do not finalize until comments are captured.
  2. Build normalized worker artifacts in shared context:
    • run build_review_batch.py to create review-batch.json
  3. Use gh-cli as reference for any gh commands used to resolve/reply on PR threads.
  4. Run Stage 3 worker actions inside this skill:
    • process all threads from review-batch.json
    • account for every Copilot comment in those threads
    • resolve each actionable thread in GitHub
    • reply on each non-actionable thread with rationale
    • do not leave any thread/comment unreviewed or unaddressed
    • push exactly once for the batch
    • do not request Copilot review while processing individual threads
    • update cycle.json.addressing with complete per-thread and per-comment coverage
  5. Validate cycle.json.addressing before finalizing:
    • status=ready_for_finalize
    • pushed_once=true
    • review_id and cycle match active state
    • threads.addressed + threads.rejected_with_rationale equals total thread count
    • threads.needs_clarification=0
    • thread_responses has exactly one entry per thread
    • for each thread_responses entry:
      • classification=actionable requires resolved=true
      • classification=non-actionable requires rationale_replied=true
    • comments.addressed_or_rationalized equals total comment count
    • comments.needs_clarification=0
    • comment_statuses has exactly one entry per comment with:
      • status in {action, no_action}
      • cycle equal to the active cycle
      • chronological sort by created_at
  6. Run finalize-cycle only when validation passes (re-requests Copilot unless explicitly skipped for recovery).
    • run this once per cycle, after the full thread batch is complete
    • never run it immediately after addressing a single thread
    • never call reviewer add/remove directly during Stage 3; finalize-cycle is the only allowed reviewer request path
  7. Return to Stage 2.

Command Templates

Initialize State

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  init \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Use --force with init only when intentionally resetting prior state. If reusing an existing PR branch, do not run --force unless the current state is stale or corrupted.

Monitor Stage 2 Loop (recommended)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-stage2-loop \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400 \
  --stage2-max-wait-seconds 43200

Use this command for normal Stage 2 operation. It performs an immediate bootstrap fetch first, then automatically retries cycle waits when a cycle-level timeout occurs.

Monitor One Cycle (diagnostic/manual)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-cycle \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Capture Existing Comments Immediately (Stage 3 bootstrap)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-cycle \
  --initial-sleep-seconds 0 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Build Stage 3 Worker Batch Artifacts

python "<path-to-skill>/scripts/build_review_batch.py" \
  --cycle ".context/gh-autopilot/cycle.json" \
  --output-dir ".context/gh-autopilot"

Finalize Addressed Cycle

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  finalize-cycle

This command validates cycle.json.addressing coverage first, then:

  1. Moves pending_review_id into last_processed_review_id.
  2. Increments cycle.
  3. Re-requests Copilot via remove/add reviewer sequence.
  4. Records per-comment status (action/no_action) in finalize event payloads.

Use --skip-reviewer-request only for manual recovery paths.

Print Current State

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  status

Assert No Pending Address-Required Cycle

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  assert-drained

Use this as the final gate before reporting completion/timeout handling results. If state is awaiting_address or awaiting_triage, this command fails and the loop must continue through Stage 3.

Simulate FSM Transitions (deterministic)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  simulate-fsm \
  --start-status initialized \
  --event begin_cycle_wait \
  --event cycle_needs_address \
  --event finalize_with_reviewer_request

Artifacts and Exit Codes

Outputs (default .context/gh-autopilot/):

  • cycle.json
  • review-batch.json (generated by Stage 3 worker setup)
  • context.md (updated)

Status meanings:

  • completed_no_comments: terminal success
  • timeout: timeout status
    • from run-cycle: timeout for that single cycle wait
    • from run-stage2-loop: Stage 2 overall timeout (reason=stage2_max_wait_reached)
  • awaiting_address: actionable Copilot comments captured
  • awaiting_triage: review exists but needs manual interpretation

Exit codes:

  • 0: terminal success or already-terminal state
  • 3: run-cycle timeout
  • 10: comments/triage action required
  • 11: assert-drained detected unaddressed pending cycle
  • 12: run-stage2-loop exhausted Stage 2 max-wait budget

Loop Contract

After entering via the user-selected stage, keep routing until terminal. Do not stop after a single cycle unless blocked by auth/state errors.

current_stage = user_selected_stage
while true:
  if current_stage == 1:
    run Stage 1
    current_stage = 2
    continue

  if current_stage == 2:
    run persistent Stage 2 supervisor
    if status in {awaiting_address, awaiting_triage}:
      current_stage = 3
      continue
    if status == completed_no_comments:
      if assert-drained != 0:
        current_stage = 3
        continue
      stop
    if status == timeout and reason == stage2_max_wait_reached:
      if assert-drained != 0:
        current_stage = 3
        continue
      stop
    # cycle timeout is internal retry; never stop here
    continue

  if current_stage == 3:
    run Stage 3 worker handoff
    if cycle.addressing.status != ready_for_finalize:
      stop and request clarification
    if cycle.addressing does not cover all review comments:
      stop and request clarification
    current_stage = 2
    continue

Recovery Scenarios

Handle common failure modes explicitly:

  1. Auth failure:
    • Run gh auth status.
    • If unauthenticated, run gh auth login and retry.
  2. State/PR mismatch:
    • If state PR differs from intended PR, re-run init with correct --pr.
    • Use --force only when intentionally discarding prior loop state.
  3. Closed/merged PR mid-loop:
    • Stop loop.
    • Open or select a new active PR.
    • Re-initialize state for that PR.
  4. Existing open PR before start:
    • Skip gh-pr-creation.
    • Initialize directly against that PR.
  5. Copilot already reviewing when loop starts (Stage 2):
    • Skip re-request.
    • Run run-stage2-loop and allow repeated cycle wait windows to continue.
    • Continue normal addressing flow when cycle comments arrive.
  6. Copilot comments already present when loop starts (Stage 3):
    • Run immediate capture (--initial-sleep-seconds 0) only if cycle artifacts are missing/stale.
    • Build review-batch.json in .context/gh-autopilot/.
    • Address comments directly in Stage 3 of this skill.
    • Finalize only when cycle.json.addressing reports ready and full comment coverage.
    • Resume Stage 2.
  7. Agent interruption or handoff:
    • Resume from .context/gh-autopilot/context.md.
    • Continue using context.md as the source of next actions and state snapshot.

Safety Rules

  • Never process a cycle while state is already awaiting_address.
  • Never finalize a cycle without confirming comments were fully addressed.
  • Never finalize a cycle if any review thread lacks a resolve/rationale response.
  • Never finalize when parsed_summary.generated_comments > 0 but captured comment count is 0.
  • Never manually stop Stage 2 idle waiting before run-stage2-loop exits by configured limits or terminal status.
  • Never treat a single cycle timeout as terminal; it is always a retry path while Stage 2 max-wait budget remains.
  • Never claim terminal completion unless assert-drained exits 0.
  • Keep one push per cycle.
  • Do not delete .context/gh-autopilot/ artifacts mid-loop.
  • Keep context.md in sync by using engine commands (init, run-cycle, finalize-cycle) rather than manual edits.
  • Use gh-cli skill as the source of truth whenever selecting or changing gh commands in this skill.

Optional Utility Scripts

The following scripts remain available for ad-hoc diagnostics:

  • scripts/monitor_copilot_review.py
  • scripts/export_copilot_feedback.py

Prefer run_autopilot_loop.py for normal loop operation.

Weekly Installs
8
Repository
henryqw/skills
First Seen
10 days ago
Installed on
codex8
opencode6
gemini-cli6
antigravity6
claude-code6
github-copilot6