gh-autopilot
GH Autopilot
Use this skill to operate a deterministic Copilot review loop on one PR. The user must explicitly choose the starting stage. The skill must begin there and keep looping until a terminal condition is reached.
This skill is stateful and persists artifacts under .context/gh-autopilot/.
Required Start Stage
Require the user to provide one of the following stage values:
1(create_pr): PR not created yet. Create/select PR first.2(monitor_review): PR exists and Copilot is reviewing (or expected soon).3(address_comments): Copilot comments already exist and must be addressed now.
If start stage is missing, ask once and wait. Do not guess.
Terminal end conditions for the loop:
completed_no_comments(success)timeoutwith reasonstage2_max_wait_reached(Stage 2 overall wait budget exhausted)
Timing Contract
- Initial wait: 300 seconds.
- Poll interval after initial wait: 45 seconds.
- Keep polling after 10 minutes.
- Stop each cycle wait at 40 minutes (2400 seconds) and mark cycle timeout.
- Stage 2 overall max wait defaults to 12 hours (
43200seconds) unless overridden. - On cycle timeout, immediately retry Stage 2 wait. Do not stop manually while still inside the Stage 2 max-wait budget.
- Stop entire loop when Copilot summary says
generated no comments.
Autopilot Persistence Contract
Autopilot is a persistent control loop. Once started, it must keep operating with timed polling and deterministic transitions until one of the following happens:
- terminal success:
completed_no_commentsand drain guard passes - terminal timeout:
timeoutwith reasonstage2_max_wait_reached - explicit blocker: auth failure, state corruption, or PR mismatch that cannot be auto-recovered
Idle waiting is not a stop condition. A single cycle timeout is not a stop condition.
GH Command Reference Contract
When this skill uses gh commands, treat gh-cli as the command source of truth for command shape and flags.
- Validate auth flows with
gh-cliguidance (gh auth status,gh auth login). - Validate PR resolution/edit patterns with
gh-cliguidance (gh pr view,gh pr edit --add-reviewer/--remove-reviewer,--json,--jq). - Validate GraphQL/API invocation patterns with
gh-cliguidance (gh api graphql).
Primary Engine
Use scripts/run_autopilot_loop.py as the control-plane entrypoint.
Commands
init: initialize state for one PR.run-cycle: wait for new Copilot review and export cycle artifacts.run-stage2-loop: run Stage 2 with automatic cycle-timeout retries until action/terminal/Stage 2 max-wait limit.finalize-cycle: mark current cycle addressed and re-request Copilot.status: print current state.assert-drained: fail if any address-required cycle is still pending.simulate-fsm: deterministic dry-run of event-driven status transitions.
The engine is event-driven: state transitions are applied from explicit events (for example cycle_timeout, cycle_needs_address, finalize_with_reviewer_request) instead of ad-hoc status rewrites.
Every JSON command result includes exactly one canonical resume_command to continue from the current state.
State File
Default: .context/gh-autopilot/state.json
Important fields:
statuscyclelast_processed_review_idpending_review_idpr
Event Log
Default: .context/gh-autopilot/events.jsonl
Each line is a normalized JSON event:
schema_version: event schema version.timestamp: event time in UTC ISO8601.event_type: normalized snake_case event name.payload: event payload object.
Context Workspace Files
Use .context/gh-autopilot/ as the durable workspace for autonomy and recovery.
context.md: single source of truth for next actions, status snapshot, artifacts, and suggested commands. Includes a compact status header:phase=<...> | cycle=<...> | status=<...> | timeout_reason=<...>.
Keep this intentionally simple: one context file, not multiple overlapping notes.
Stage Router
Start from the user-selected stage: create_pr, monitor_review, or address_comments.
Routing rules:
- Stage 1 (
create_pr) always transitions to Stage 2. - Stage 2 (
monitor_review) runs the persistent supervisor path (run-stage2-loop). - Stage 2 guard order is strict:
- check pending address/triage first
- check terminal no-comments + drain guard
- check per-cycle-timeout retry state
- check Stage 2 max-wait timeout
- otherwise run another poll cycle
- If Stage 2 returns
awaiting_addressorawaiting_triage, transition to Stage 3. - Stage 3 (
address_comments) finalizes the full batch withfinalize-cycle, then transitions back to Stage 2.
Event-driven state transitions:
initialized|rerequested --begin_cycle_wait--> waiting_for_review
waiting_for_review --cycle_timeout--> timeout (cycle_max_wait_reached)
timeout (cycle_max_wait_reached) --stage2_retry_after_cycle_timeout--> initialized
waiting_for_review --cycle_no_comments--> completed_no_comments
waiting_for_review --cycle_needs_address--> awaiting_address
waiting_for_review --cycle_needs_triage--> awaiting_triage
awaiting_address --finalize_with_reviewer_request--> rerequested
awaiting_triage --finalize_with_reviewer_request--> rerequested
awaiting_address|awaiting_triage --finalize_without_reviewer_request--> initialized
initialized|waiting_for_review|rerequested --stage2_max_wait_reached--> timeout
Stage Details
Stage 1 (create_pr)
User intent: PR has not been created yet.
Actions:
- Use
gh-clias reference for allghcommand usage in this stage. - Run
gh auth status. - Resolve current-branch PR with
gh pr view(omit--pr). - If an open PR already exists for the branch, skip PR creation and move to Stage 2.
- If no PR exists, run
gh-pr-creationto open one. - Initialize state with
init(avoid--forceunless state is intentionally reset). - Move to Stage 2.
Stage 2 (monitor_review)
User intent: PR exists and we are waiting for Copilot output.
Actions:
- Use
gh-clias reference for allghcommand usage in this stage. - Run
gh auth status. - Resolve PR (current branch or explicit
--pr). - Ensure state exists for the PR:
- If missing: run
init. - If state already
awaiting_addressorawaiting_triage: move directly to Stage 3.
- If missing: run
- Run
run-stage2-loopwith normal timing (300/45/2400) plus Stage 2 max wait (43200by default).- On Stage 2 entry,
run-stage2-loopperforms an immediate fetch pass (initial_sleep=0) to capture already-finished Copilot reviews/comments. run-stage2-loopretriesrun-cycleautomatically after each cycle timeout.run-cycleexports comments by matching each thread comment to the active Copilotreview_id(not by timestamp cutoff).
- On Stage 2 entry,
- Interpret result:
completed_no_comments-> terminal success; stop loop.- includes cycles where no Copilot thread comments were captured for that review round
timeoutwith reasonstage2_max_wait_reached-> terminal timeout; stop loop.awaiting_addressorawaiting_triage-> move to Stage 3.
- Before any terminal stop/report in Stage 2, run
assert-drained.- If it exits non-zero, do not stop; continue to Stage 3.
If Copilot is already reviewing when Stage 2 starts, do not re-request reviewer;
continue waiting with run-stage2-loop.
Never stop Stage 2 manually while the command is still within the configured Stage 2 max-wait limit.
Stage 3 (address_comments)
User intent: comments already exist and must be processed now.
Actions:
- Ensure fresh cycle artifacts are available:
- If state is already
awaiting_addressorawaiting_triage, use existingcycle.json. - Otherwise run
run-cyclewith--initial-sleep-seconds 0to capture existing comments immediately. - If
parsed_summary.generated_comments > 0butcounts.copilot_comments_total == 0, artifacts are inconsistent: re-run immediaterun-cycleand do not finalize until comments are captured.
- If state is already
- Build normalized worker artifacts in shared context:
- run
build_review_batch.pyto createreview-batch.json
- run
- Use
gh-clias reference for anyghcommands used to resolve/reply on PR threads. - Run Stage 3 worker actions inside this skill:
- process all threads from
review-batch.json - account for every Copilot comment in those threads
- resolve each actionable thread in GitHub
- reply on each non-actionable thread with rationale
- do not leave any thread/comment unreviewed or unaddressed
- push exactly once for the batch
- do not request Copilot review while processing individual threads
- update
cycle.json.addressingwith complete per-thread and per-comment coverage
- process all threads from
- Validate
cycle.json.addressingbefore finalizing:status=ready_for_finalizepushed_once=truereview_idandcyclematch active statethreads.addressed + threads.rejected_with_rationaleequals total thread countthreads.needs_clarification=0thread_responseshas exactly one entry per thread- for each
thread_responsesentry:classification=actionablerequiresresolved=trueclassification=non-actionablerequiresrationale_replied=true
comments.addressed_or_rationalizedequals total comment countcomments.needs_clarification=0comment_statuseshas exactly one entry per comment with:statusin{action, no_action}cycleequal to the active cycle- chronological sort by
created_at
- Run
finalize-cycleonly when validation passes (re-requests Copilot unless explicitly skipped for recovery).- run this once per cycle, after the full thread batch is complete
- never run it immediately after addressing a single thread
- never call reviewer add/remove directly during Stage 3;
finalize-cycleis the only allowed reviewer request path
- Return to Stage 2.
Command Templates
Initialize State
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
init \
--initial-sleep-seconds 300 \
--poll-interval-seconds 45 \
--cycle-max-wait-seconds 2400
Use --force with init only when intentionally resetting prior state.
If reusing an existing PR branch, do not run --force unless the current
state is stale or corrupted.
Monitor Stage 2 Loop (recommended)
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
run-stage2-loop \
--initial-sleep-seconds 300 \
--poll-interval-seconds 45 \
--cycle-max-wait-seconds 2400 \
--stage2-max-wait-seconds 43200
Use this command for normal Stage 2 operation. It performs an immediate bootstrap fetch first, then automatically retries cycle waits when a cycle-level timeout occurs.
Monitor One Cycle (diagnostic/manual)
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
run-cycle \
--initial-sleep-seconds 300 \
--poll-interval-seconds 45 \
--cycle-max-wait-seconds 2400
Capture Existing Comments Immediately (Stage 3 bootstrap)
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
run-cycle \
--initial-sleep-seconds 0 \
--poll-interval-seconds 45 \
--cycle-max-wait-seconds 2400
Build Stage 3 Worker Batch Artifacts
python "<path-to-skill>/scripts/build_review_batch.py" \
--cycle ".context/gh-autopilot/cycle.json" \
--output-dir ".context/gh-autopilot"
Finalize Addressed Cycle
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
finalize-cycle
This command validates cycle.json.addressing coverage first, then:
- Moves
pending_review_idintolast_processed_review_id. - Increments
cycle. - Re-requests Copilot via remove/add reviewer sequence.
- Records per-comment status (
action/no_action) in finalize event payloads.
Use --skip-reviewer-request only for manual recovery paths.
Print Current State
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
status
Assert No Pending Address-Required Cycle
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
assert-drained
Use this as the final gate before reporting completion/timeout handling results.
If state is awaiting_address or awaiting_triage, this command fails and
the loop must continue through Stage 3.
Simulate FSM Transitions (deterministic)
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
simulate-fsm \
--start-status initialized \
--event begin_cycle_wait \
--event cycle_needs_address \
--event finalize_with_reviewer_request
Artifacts and Exit Codes
Outputs (default .context/gh-autopilot/):
cycle.jsonreview-batch.json(generated by Stage 3 worker setup)context.md(updated)
Status meanings:
completed_no_comments: terminal successtimeout: timeout status- from
run-cycle: timeout for that single cycle wait - from
run-stage2-loop: Stage 2 overall timeout (reason=stage2_max_wait_reached)
- from
awaiting_address: actionable Copilot comments capturedawaiting_triage: review exists but needs manual interpretation
Exit codes:
0: terminal success or already-terminal state3:run-cycletimeout10: comments/triage action required11:assert-draineddetected unaddressed pending cycle12:run-stage2-loopexhausted Stage 2 max-wait budget
Loop Contract
After entering via the user-selected stage, keep routing until terminal. Do not stop after a single cycle unless blocked by auth/state errors.
current_stage = user_selected_stage
while true:
if current_stage == 1:
run Stage 1
current_stage = 2
continue
if current_stage == 2:
run persistent Stage 2 supervisor
if status in {awaiting_address, awaiting_triage}:
current_stage = 3
continue
if status == completed_no_comments:
if assert-drained != 0:
current_stage = 3
continue
stop
if status == timeout and reason == stage2_max_wait_reached:
if assert-drained != 0:
current_stage = 3
continue
stop
# cycle timeout is internal retry; never stop here
continue
if current_stage == 3:
run Stage 3 worker handoff
if cycle.addressing.status != ready_for_finalize:
stop and request clarification
if cycle.addressing does not cover all review comments:
stop and request clarification
current_stage = 2
continue
Recovery Scenarios
Handle common failure modes explicitly:
- Auth failure:
- Run
gh auth status. - If unauthenticated, run
gh auth loginand retry.
- Run
- State/PR mismatch:
- If state PR differs from intended PR, re-run
initwith correct--pr. - Use
--forceonly when intentionally discarding prior loop state.
- If state PR differs from intended PR, re-run
- Closed/merged PR mid-loop:
- Stop loop.
- Open or select a new active PR.
- Re-initialize state for that PR.
- Existing open PR before start:
- Skip
gh-pr-creation. - Initialize directly against that PR.
- Skip
- Copilot already reviewing when loop starts (Stage 2):
- Skip re-request.
- Run
run-stage2-loopand allow repeated cycle wait windows to continue. - Continue normal addressing flow when cycle comments arrive.
- Copilot comments already present when loop starts (Stage 3):
- Run immediate capture (
--initial-sleep-seconds 0) only if cycle artifacts are missing/stale. - Build
review-batch.jsonin.context/gh-autopilot/. - Address comments directly in Stage 3 of this skill.
- Finalize only when
cycle.json.addressingreports ready and full comment coverage. - Resume Stage 2.
- Run immediate capture (
- Agent interruption or handoff:
- Resume from
.context/gh-autopilot/context.md. - Continue using
context.mdas the source of next actions and state snapshot.
- Resume from
Safety Rules
- Never process a cycle while state is already
awaiting_address. - Never finalize a cycle without confirming comments were fully addressed.
- Never finalize a cycle if any review thread lacks a resolve/rationale response.
- Never finalize when
parsed_summary.generated_comments > 0but captured comment count is0. - Never manually stop Stage 2 idle waiting before
run-stage2-loopexits by configured limits or terminal status. - Never treat a single cycle timeout as terminal; it is always a retry path while Stage 2 max-wait budget remains.
- Never claim terminal completion unless
assert-drainedexits0. - Keep one push per cycle.
- Do not delete
.context/gh-autopilot/artifacts mid-loop. - Keep
context.mdin sync by using engine commands (init,run-cycle,finalize-cycle) rather than manual edits. - Use
gh-cliskill as the source of truth whenever selecting or changingghcommands in this skill.
Optional Utility Scripts
The following scripts remain available for ad-hoc diagnostics:
scripts/monitor_copilot_review.pyscripts/export_copilot_feedback.py
Prefer run_autopilot_loop.py for normal loop operation.