gh-autopilot

GH Autopilot

Use this skill to operate a deterministic Copilot review loop on one PR. The user must explicitly choose the starting stage. The skill must begin there and keep looping until a terminal condition is reached.

This skill is stateful and persists artifacts under .context/gh-autopilot/.

Required Start Stage

Require the user to provide one of the following stage values:

1 (create_pr): PR not created yet. Create/select PR first.
2 (monitor_review): PR exists and Copilot is reviewing (or expected soon).
3 (address_comments): Copilot comments already exist and must be addressed now.

If start stage is missing, ask once and wait. Do not guess.
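
A minimal sketch of how a wrapper might normalize the stage value, assuming the user may type either the number or the name. `parse_start_stage` is a hypothetical helper for illustration and is not part of the skill's scripts.

```python
from typing import Optional

# Stage numbers and names as listed above.
STAGES = {"1": "create_pr", "2": "monitor_review", "3": "address_comments"}


def parse_start_stage(raw: Optional[str]) -> Optional[str]:
    """Return the canonical stage name, or None when the caller must ask the user."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in STAGES:
        return STAGES[value]
    if value in STAGES.values():
        return value
    return None  # missing or unknown input: ask once and wait, do not guess
```

Returning `None` instead of a default keeps the "do not guess" rule visible at the call site.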

Terminal end conditions for the loop:

completed_no_comments (success)
timeout with reason stage2_max_wait_reached (Stage 2 overall wait budget exhausted)

Timing Contract

Initial wait: 300 seconds.
Poll interval after initial wait: 45 seconds.
Keep polling past the 10-minute mark; a slow review is not a reason to stop.
Stop each cycle wait at 40 minutes (2400 seconds) and mark cycle timeout.
Stage 2 overall max wait defaults to 12 hours (43200 seconds) unless overridden.
On cycle timeout, immediately retry Stage 2 wait. Do not stop manually while still inside the Stage 2 max-wait budget.
Stop the entire loop when the Copilot summary says it generated no comments.
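
The default numbers above can be checked with a little arithmetic. The sketch below assumes the first poll happens right after the initial wait, that polls are evenly spaced, and that request latency is ignored; the engine's real scheduling may differ. Both function names are hypothetical.

```python
def poll_offsets(initial_sleep=300, poll_interval=45, cycle_max_wait=2400):
    """Seconds from cycle start at which a poll would occur within one cycle."""
    offsets = []
    t = initial_sleep
    while t <= cycle_max_wait:
        offsets.append(t)
        t += poll_interval
    return offsets


def max_full_cycles(stage2_max_wait=43200, cycle_max_wait=2400):
    """How many full-length cycle waits fit inside the Stage 2 budget."""
    return stage2_max_wait // cycle_max_wait
```

Under these assumptions the defaults give 47 polls per cycle (the last at 2370 seconds) and at most 18 full-length cycles inside the 12-hour Stage 2 budget.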

Autopilot Persistence Contract

Autopilot is a persistent control loop. Once started, it must keep operating with timed polling and deterministic transitions until one of the following happens:

terminal success: completed_no_comments and drain guard passes
terminal timeout: timeout with reason stage2_max_wait_reached
explicit blocker: auth failure, state corruption, or PR mismatch that cannot be auto-recovered

Idle waiting is not a stop condition. A single cycle timeout is not a stop condition.

GH Command Reference Contract

When this skill uses gh commands, treat the gh-cli skill as the source of truth for command shape and flags.

Validate auth flows with gh-cli guidance (gh auth status, gh auth login).
Validate PR resolution/edit patterns with gh-cli guidance (gh pr view, gh pr edit --add-reviewer/--remove-reviewer, --json, --jq).
Validate GraphQL/API invocation patterns with gh-cli guidance (gh api graphql).
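
As an illustration of the PR-resolution pattern, here is a hedged sketch that builds a `gh pr view --json` invocation from Python. The helper names and the chosen JSON fields (`number`, `url`, `state`) are assumptions for this example; the gh-cli guidance remains the reference for the actual command shape.

```python
import json
import subprocess


def gh_pr_view_argv(pr=None, fields=("number", "url", "state")):
    """Build argv for `gh pr view`; omit the PR to resolve the current branch."""
    argv = ["gh", "pr", "view"]
    if pr is not None:
        argv.append(str(pr))
    argv += ["--json", ",".join(fields)]
    return argv


def resolve_pr(pr=None):
    """Run gh and parse its JSON output (requires an authenticated gh CLI)."""
    result = subprocess.run(
        gh_pr_view_argv(pr), check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)
```

Keeping argv construction separate from execution makes the command shape easy to inspect without touching the network.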

Primary Engine

Use scripts/run_autopilot_loop.py as the control-plane entrypoint.

Commands

init: initialize state for one PR.
run-cycle: wait for new Copilot review and export cycle artifacts.
run-stage2-loop: run Stage 2 with automatic cycle-timeout retries until action is required, a terminal status is reached, or the Stage 2 max-wait limit is hit.
finalize-cycle: mark current cycle addressed and re-request Copilot.
status: print current state.
assert-drained: fail if any address-required cycle is still pending.
simulate-fsm: deterministic dry-run of event-driven status transitions.

The engine is event-driven: state transitions are applied from explicit events (for example cycle_timeout, cycle_needs_address, finalize_with_reviewer_request) instead of ad-hoc status rewrites. Every JSON command result includes exactly one canonical resume_command to continue from the current state.

State File

Default: .context/gh-autopilot/state.json

Important fields:

status
cycle
last_processed_review_id
pending_review_id
pr
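
A small sketch of reading the state file and checking that the fields listed above are present. The field types and any other keys in the file are not specified here, so the check only looks at key presence; `missing_fields` and `load_state` are hypothetical helpers.

```python
import json
from pathlib import Path

# Field names taken from the list above.
DOCUMENTED_FIELDS = (
    "status",
    "cycle",
    "last_processed_review_id",
    "pending_review_id",
    "pr",
)


def missing_fields(state):
    """Return the documented field names that are absent from a state dict."""
    return [name for name in DOCUMENTED_FIELDS if name not in state]


def load_state(path=".context/gh-autopilot/state.json"):
    """Load the state file, failing loudly if a documented field is missing."""
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = missing_fields(state)
    if missing:
        raise ValueError(f"state file is missing fields: {missing}")
    return state
```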

Event Log

Default: .context/gh-autopilot/events.jsonl

Each line is a normalized JSON event:

schema_version: event schema version.
timestamp: event time in UTC ISO8601.
event_type: normalized snake_case event name.
payload: event payload object.
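
To make the line format concrete, here is a sketch that writes and reads events with the four fields above. The `schema_version` value of 1 and the helper names are assumptions; the engine writes this log itself, so treat the code as a reader/illustration rather than a replacement.

```python
import json
from datetime import datetime, timezone


def make_event(event_type, payload, schema_version=1):
    """Build one normalized event; schema_version=1 is an assumed value."""
    return {
        "schema_version": schema_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "payload": payload,
    }


def append_event(path, event):
    """Append a single JSON object as one line (JSON Lines)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")


def read_events(path):
    """Parse every non-empty line of the log into a dict."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```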

Context Workspace Files

Use .context/gh-autopilot/ as the durable workspace for autonomy and recovery.

context.md: single source of truth for next actions, status snapshot, artifacts, and suggested commands. Includes a compact status header: phase=<...> | cycle=<...> | status=<...> | timeout_reason=<...>.

Keep this intentionally simple: one context file, not multiple overlapping notes.
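
The compact status header in context.md is simple enough to parse mechanically. A minimal sketch, assuming the header sits on one line in exactly the `key=value | key=value` shape shown above; `parse_status_header` is a hypothetical helper.

```python
def parse_status_header(line):
    """Split 'key=value | key=value' pairs into a dict; values stay strings."""
    fields = {}
    for part in line.split("|"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip fragments that carry no '='
            fields[key.strip()] = value.strip()
    return fields
```

The caller would pass the header line read from context.md; the actual values that appear for `phase` or `timeout_reason` are whatever the engine writes and are not assumed here.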

Stage Router

Start from the user-selected stage: create_pr, monitor_review, or address_comments.

Routing rules:

Stage 1 (create_pr) always transitions to Stage 2.
Stage 2 (monitor_review) runs the persistent supervisor path (run-stage2-loop).
Stage 2 guard order is strict:
- check pending address/triage first
- check terminal no-comments + drain guard
- check per-cycle-timeout retry state
- check Stage 2 max-wait timeout
- otherwise run another poll cycle
If Stage 2 returns awaiting_address or awaiting_triage, transition to Stage 3.
Stage 3 (address_comments) finalizes the full batch with finalize-cycle, then transitions back to Stage 2.
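
The strict Stage 2 guard order above can be sketched as a single decision function. The returned action labels (`go_stage3`, `stop_success`, and so on) are invented for this example, and `drained` stands for whether assert-drained would exit 0.

```python
def stage2_next_action(status, drained, timeout_reason=None):
    """Apply the Stage 2 guards in their documented order."""
    # 1. Pending address/triage always wins.
    if status in {"awaiting_address", "awaiting_triage"}:
        return "go_stage3"
    # 2. Terminal no-comments, but only if the drain guard passes.
    if status == "completed_no_comments":
        return "stop_success" if drained else "go_stage3"
    # 3. A per-cycle timeout is a retry, never a stop.
    if status == "timeout" and timeout_reason == "cycle_max_wait_reached":
        return "retry_cycle"
    # 4. Stage 2 budget exhausted: terminal timeout, again gated on drain.
    if status == "timeout" and timeout_reason == "stage2_max_wait_reached":
        return "stop_timeout" if drained else "go_stage3"
    # 5. Otherwise keep polling.
    return "run_cycle"
```

Ordering the checks this way means a pending batch of comments can never be masked by a timeout or a no-comments result.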

Event-driven state transitions:

initialized|rerequested --begin_cycle_wait--> waiting_for_review
waiting_for_review --cycle_timeout--> timeout (cycle_max_wait_reached)
timeout (cycle_max_wait_reached) --stage2_retry_after_cycle_timeout--> initialized
waiting_for_review --cycle_no_comments--> completed_no_comments
waiting_for_review --cycle_needs_address--> awaiting_address
waiting_for_review --cycle_needs_triage--> awaiting_triage
awaiting_address --finalize_with_reviewer_request--> rerequested
awaiting_triage --finalize_with_reviewer_request--> rerequested
awaiting_address|awaiting_triage --finalize_without_reviewer_request--> initialized
initialized|waiting_for_review|rerequested --stage2_max_wait_reached--> timeout
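
The transition list above maps directly onto a lookup table. This is a sketch in the spirit of simulate-fsm, not the engine's code; folding the timeout reason into the status string is a convention of this example only.

```python
CYCLE_TIMEOUT = "timeout(cycle_max_wait_reached)"
STAGE2_TIMEOUT = "timeout(stage2_max_wait_reached)"

# (current status, event) -> next status, copied from the list above.
TRANSITIONS = {
    ("initialized", "begin_cycle_wait"): "waiting_for_review",
    ("rerequested", "begin_cycle_wait"): "waiting_for_review",
    ("waiting_for_review", "cycle_timeout"): CYCLE_TIMEOUT,
    (CYCLE_TIMEOUT, "stage2_retry_after_cycle_timeout"): "initialized",
    ("waiting_for_review", "cycle_no_comments"): "completed_no_comments",
    ("waiting_for_review", "cycle_needs_address"): "awaiting_address",
    ("waiting_for_review", "cycle_needs_triage"): "awaiting_triage",
    ("awaiting_address", "finalize_with_reviewer_request"): "rerequested",
    ("awaiting_triage", "finalize_with_reviewer_request"): "rerequested",
    ("awaiting_address", "finalize_without_reviewer_request"): "initialized",
    ("awaiting_triage", "finalize_without_reviewer_request"): "initialized",
    ("initialized", "stage2_max_wait_reached"): STAGE2_TIMEOUT,
    ("waiting_for_review", "stage2_max_wait_reached"): STAGE2_TIMEOUT,
    ("rerequested", "stage2_max_wait_reached"): STAGE2_TIMEOUT,
}


def simulate(start, events):
    """Return the list of statuses visited; reject undefined transitions."""
    path = [start]
    for event in events:
        key = (path[-1], event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {path[-1]!r} on {event!r}")
        path.append(TRANSITIONS[key])
    return path
```

Because every move goes through the table, an event that is not defined for the current status fails loudly instead of silently rewriting the status.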

Stage Details

Stage 1 (`create_pr`)

User intent: PR has not been created yet.

Actions:

Use gh-cli as reference for all gh command usage in this stage.
Run gh auth status.
Resolve current-branch PR with gh pr view (omit --pr).
If an open PR already exists for the branch, skip PR creation and move to Stage 2.
If no PR exists, use the gh-pr-creation skill to open one.
Initialize state with init (avoid --force unless state is intentionally reset).
Move to Stage 2.

Stage 2 (`monitor_review`)

User intent: PR exists and we are waiting for Copilot output.

Actions:

Use gh-cli as reference for all gh command usage in this stage.
Run gh auth status.
Resolve PR (current branch or explicit --pr).
Ensure state exists for the PR:
- If missing: run init.
- If state is already awaiting_address or awaiting_triage: move directly to Stage 3.
Run run-stage2-loop with normal timing (300/45/2400) plus Stage 2 max wait (43200 by default).
- On Stage 2 entry, run-stage2-loop performs an immediate fetch pass (initial_sleep=0) to capture already-finished Copilot reviews/comments.
- run-stage2-loop retries run-cycle automatically after each cycle timeout.
- run-cycle exports comments by matching each thread comment to the active Copilot review_id (not by timestamp cutoff).
Interpret result:
- completed_no_comments -> terminal success; stop loop.
  - includes cycles where no Copilot thread comments were captured for that review round
- timeout with reason stage2_max_wait_reached -> terminal timeout; stop loop.
- awaiting_address or awaiting_triage -> move to Stage 3.
Before any terminal stop/report in Stage 2, run assert-drained.
- If it exits non-zero, do not stop; continue to Stage 3.

If Copilot is already reviewing when Stage 2 starts, do not re-request the reviewer; continue waiting with run-stage2-loop.

Never stop Stage 2 manually while the command is still within the configured Stage 2 max-wait limit.

Stage 3 (`address_comments`)

User intent: comments already exist and must be processed now.

Actions:

Ensure fresh cycle artifacts are available:
- If state is already awaiting_address or awaiting_triage, use existing cycle.json.
- Otherwise run run-cycle with --initial-sleep-seconds 0 to capture existing comments immediately.
- If parsed_summary.generated_comments > 0 but counts.copilot_comments_total == 0, artifacts are inconsistent: re-run immediate run-cycle and do not finalize until comments are captured.
Build normalized worker artifacts in shared context:
- run build_review_batch.py to create review-batch.json
Use gh-cli as reference for any gh commands used to resolve/reply on PR threads.
Run Stage 3 worker actions inside this skill:
- process all threads from review-batch.json
- account for every Copilot comment in those threads
- resolve each actionable thread in GitHub
- reply on each non-actionable thread with rationale
- do not leave any thread/comment unreviewed or unaddressed
- push exactly once for the batch
- do not request Copilot review while processing individual threads
- update cycle.json.addressing with complete per-thread and per-comment coverage
Validate cycle.json.addressing before finalizing:
- status=ready_for_finalize
- pushed_once=true
- review_id and cycle match active state
- threads.addressed + threads.rejected_with_rationale equals total thread count
- threads.needs_clarification=0
- thread_responses has exactly one entry per thread
- for each thread_responses entry:
  - classification=actionable requires resolved=true
  - classification=non-actionable requires rationale_replied=true
- comments.addressed_or_rationalized equals total comment count
- comments.needs_clarification=0
- comment_statuses has exactly one entry per comment with:
  - status in {action, no_action}
  - cycle equal to the active cycle
  - chronological sort by created_at
Run finalize-cycle only when validation passes (re-requests Copilot unless explicitly skipped for recovery).
- run this once per cycle, after the full thread batch is complete
- never run it immediately after addressing a single thread
- never call reviewer add/remove directly during Stage 3; finalize-cycle is the only allowed reviewer request path
Return to Stage 2.
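
The pre-finalize validation checklist above can be sketched as a function that returns a list of problems. The key names `thread_id` and `comment_id`, and passing the expected thread and comment IDs as arguments, are assumptions of this example; finalize-cycle performs its own validation. The sort check also assumes `created_at` values are ISO 8601 strings in a uniform format, so string order matches time order.

```python
from collections import Counter


def validate_addressing(addressing, review_id, cycle, thread_ids, comment_ids):
    """Return a list of problems; an empty list means the checklist passes."""
    errors = []
    if addressing.get("status") != "ready_for_finalize":
        errors.append("status != ready_for_finalize")
    if addressing.get("pushed_once") is not True:
        errors.append("pushed_once != true")
    if addressing.get("review_id") != review_id or addressing.get("cycle") != cycle:
        errors.append("review_id/cycle mismatch with active state")

    threads = addressing.get("threads", {})
    handled = threads.get("addressed", 0) + threads.get("rejected_with_rationale", 0)
    if handled != len(thread_ids):
        errors.append("thread totals do not cover every thread")
    if threads.get("needs_clarification", 0) != 0:
        errors.append("threads.needs_clarification != 0")

    responses = addressing.get("thread_responses", [])
    # "thread_id" is an assumed key name.
    if Counter(r.get("thread_id") for r in responses) != Counter(thread_ids):
        errors.append("thread_responses is not one entry per thread")
    for r in responses:
        kind = r.get("classification")
        if kind == "actionable" and r.get("resolved") is not True:
            errors.append(f"thread {r.get('thread_id')}: actionable but not resolved")
        if kind == "non-actionable" and r.get("rationale_replied") is not True:
            errors.append(f"thread {r.get('thread_id')}: no rationale reply")

    comments = addressing.get("comments", {})
    if comments.get("addressed_or_rationalized", 0) != len(comment_ids):
        errors.append("comment totals do not cover every comment")
    if comments.get("needs_clarification", 0) != 0:
        errors.append("comments.needs_clarification != 0")

    statuses = addressing.get("comment_statuses", [])
    # "comment_id" is an assumed key name.
    if Counter(c.get("comment_id") for c in statuses) != Counter(comment_ids):
        errors.append("comment_statuses is not one entry per comment")
    for c in statuses:
        if c.get("status") not in {"action", "no_action"}:
            errors.append(f"comment {c.get('comment_id')}: invalid status")
        if c.get("cycle") != cycle:
            errors.append(f"comment {c.get('comment_id')}: wrong cycle")
    created = [c.get("created_at", "") for c in statuses]
    if created != sorted(created):
        errors.append("comment_statuses not sorted by created_at")
    return errors
```

Collecting all problems rather than stopping at the first one gives the worker a complete list to fix before attempting finalize-cycle.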

Command Templates

Initialize State

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  init \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Use --force with init only when intentionally resetting prior state. If reusing an existing PR branch, do not run --force unless the current state is stale or corrupted.

Monitor Stage 2 Loop (recommended)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-stage2-loop \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400 \
  --stage2-max-wait-seconds 43200

Use this command for normal Stage 2 operation. It performs an immediate bootstrap fetch first, then automatically retries cycle waits when a cycle-level timeout occurs.
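
If the Stage 2 loop is launched from another Python process, the same template can be assembled as an argument list. This is a sketch only: `stage2_loop_argv` is a hypothetical helper, the script path and PR are supplied by the caller, and the flags are exactly those shown in the template above.

```python
import sys


def stage2_loop_argv(script, pr, repo=".", initial_sleep=300, poll_interval=45,
                     cycle_max_wait=2400, stage2_max_wait=43200):
    """Build argv for run-stage2-loop using the documented flags and defaults."""
    return [
        sys.executable, script,
        "--repo", repo,
        "--pr", str(pr),
        "run-stage2-loop",
        "--initial-sleep-seconds", str(initial_sleep),
        "--poll-interval-seconds", str(poll_interval),
        "--cycle-max-wait-seconds", str(cycle_max_wait),
        "--stage2-max-wait-seconds", str(stage2_max_wait),
    ]
```

The list can be passed to `subprocess.run(argv)`; its return code is then interpreted according to the exit codes documented for the engine.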

Monitor One Cycle (diagnostic/manual)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-cycle \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Capture Existing Comments Immediately (Stage 3 bootstrap)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-cycle \
  --initial-sleep-seconds 0 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Build Stage 3 Worker Batch Artifacts

python "<path-to-skill>/scripts/build_review_batch.py" \
  --cycle ".context/gh-autopilot/cycle.json" \
  --output-dir ".context/gh-autopilot"

Finalize Addressed Cycle

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  finalize-cycle

This command validates cycle.json.addressing coverage first, then:

Moves pending_review_id into last_processed_review_id.
Increments cycle.
Re-requests Copilot via remove/add reviewer sequence.
Records per-comment status (action/no_action) in finalize event payloads.

Use --skip-reviewer-request only for manual recovery paths.

Print Current State

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  status

Assert No Pending Address-Required Cycle

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  assert-drained

Use this as the final gate before reporting completion or timeout results. If state is awaiting_address or awaiting_triage, this command fails and the loop must continue through Stage 3.

Simulate FSM Transitions (deterministic)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  simulate-fsm \
  --start-status initialized \
  --event begin_cycle_wait \
  --event cycle_needs_address \
  --event finalize_with_reviewer_request

Artifacts and Exit Codes

Outputs (default .context/gh-autopilot/):

cycle.json
review-batch.json (generated by Stage 3 worker setup)
context.md (updated)

Status meanings:

completed_no_comments: terminal success
timeout: a wait budget was exhausted; the scope depends on the command that reported it
- from run-cycle: timeout for that single cycle wait
- from run-stage2-loop: Stage 2 overall timeout (reason=stage2_max_wait_reached)
awaiting_address: actionable Copilot comments captured
awaiting_triage: review exists but needs manual interpretation

Exit codes:

0: terminal success or already-terminal state
3: run-cycle timeout
10: comments/triage action required
11: assert-drained detected unaddressed pending cycle
12: run-stage2-loop exhausted Stage 2 max-wait budget
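
One way a caller could turn these exit codes into routing decisions is a small lookup. The action labels are invented for this sketch, and treating any unlisted code as a blocker is an assumption based on the blocker rule in the persistence contract.

```python
# Exit code -> next step, following the list above.
NEXT_STEP = {
    0: "run_assert_drained_then_stop",    # terminal success or already terminal
    3: "retry_stage2_wait",               # single cycle timeout: never terminal
    10: "go_stage3",                      # comments/triage action required
    11: "go_stage3",                      # assert-drained found a pending cycle
    12: "run_assert_drained_then_report_timeout",  # Stage 2 budget exhausted
}


def next_step(returncode):
    """Map an engine exit code to a routing decision; unknown codes block."""
    return NEXT_STEP.get(returncode, "blocker_inspect_status")
```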

Loop Contract

After entering via the user-selected stage, keep routing until terminal. Do not stop after a single cycle unless blocked by auth/state errors.

current_stage = user_selected_stage
while true:
  if current_stage == 1:
    run Stage 1
    current_stage = 2
    continue

  if current_stage == 2:
    run persistent Stage 2 supervisor
    if status in {awaiting_address, awaiting_triage}:
      current_stage = 3
      continue
    if status == completed_no_comments:
      if assert-drained != 0:
        current_stage = 3
        continue
      stop
    if status == timeout and reason == stage2_max_wait_reached:
      if assert-drained != 0:
        current_stage = 3
        continue
      stop
    # cycle timeout is internal retry; never stop here
    continue

  if current_stage == 3:
    run Stage 3 worker handoff
    if cycle.addressing.status != ready_for_finalize:
      stop and request clarification
    if cycle.addressing does not cover all review comments:
      stop and request clarification
    current_stage = 2
    continue

Recovery Scenarios

Handle common failure modes explicitly:

Auth failure:
- Run gh auth status.
- If unauthenticated, run gh auth login and retry.
State/PR mismatch:
- If state PR differs from intended PR, re-run init with correct --pr.
- Use --force only when intentionally discarding prior loop state.
Closed/merged PR mid-loop:
- Stop loop.
- Open or select a new active PR.
- Re-initialize state for that PR.
Existing open PR before start:
- Skip gh-pr-creation.
- Initialize directly against that PR.
Copilot already reviewing when loop starts (Stage 2):
- Skip re-request.
- Run run-stage2-loop and allow repeated cycle wait windows to continue.
- Continue normal addressing flow when cycle comments arrive.
Copilot comments already present when loop starts (Stage 3):
- Run immediate capture (--initial-sleep-seconds 0) only if cycle artifacts are missing/stale.
- Build review-batch.json in .context/gh-autopilot/.
- Address comments directly in Stage 3 of this skill.
- Finalize only when cycle.json.addressing reports ready and full comment coverage.
- Resume Stage 2.
Agent interruption or handoff:
- Resume from .context/gh-autopilot/context.md.
- Continue using context.md as the source of next actions and state snapshot.

Safety Rules

Never process a cycle while state is already awaiting_address.
Never finalize a cycle without confirming comments were fully addressed.
Never finalize a cycle if any review thread lacks a resolve/rationale response.
Never finalize when parsed_summary.generated_comments > 0 but captured comment count is 0.
Never manually stop Stage 2 idle waiting before run-stage2-loop exits by configured limits or terminal status.
Never treat a single cycle timeout as terminal; it is always a retry path while Stage 2 max-wait budget remains.
Never claim terminal completion unless assert-drained exits 0.
Keep one push per cycle.
Do not delete .context/gh-autopilot/ artifacts mid-loop.
Keep context.md in sync by using engine commands (init, run-cycle, finalize-cycle) rather than manual edits.
Use gh-cli skill as the source of truth whenever selecting or changing gh commands in this skill.

Optional Utility Scripts

The following scripts remain available for ad-hoc diagnostics:

scripts/monitor_copilot_review.py
scripts/export_copilot_feedback.py

Prefer run_autopilot_loop.py for normal loop operation.

Repository: henryqw/skills
First Seen: Mar 4, 2026
