gh-autopilot
GH Autopilot
Use this skill to operate a deterministic Copilot review loop on one PR. The user must explicitly choose the starting stage. The skill must begin there and keep looping until a terminal condition is reached.
This skill is stateful and persists artifacts under .context/gh-autopilot/.
Required Start Stage
Require the user to provide one of the following stage values:
- 1 (create_pr): PR not created yet. Create/select PR first.
- 2 (monitor_review): PR exists and Copilot is reviewing (or expected soon).
- 3 (address_comments): Copilot comments already exist and must be addressed now.
If start stage is missing, ask once and wait. Do not guess.
Terminal end conditions for the loop:
- completed_no_comments (success)
- timeout with reason stage2_max_wait_reached (Stage 2 overall wait budget exhausted)
Timing Contract
- Initial wait: 300 seconds.
- Poll interval after initial wait: 45 seconds.
- Keep polling after 10 minutes.
- Stop each cycle wait at 40 minutes (2400 seconds) and mark cycle timeout.
- Stage 2 overall max wait defaults to 12 hours (43200 seconds) unless overridden.
- On cycle timeout, immediately retry Stage 2 wait. Do not stop manually while still inside the Stage 2 max-wait budget.
- Stop entire loop when Copilot summary says generated no comments.
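The timing values above can be turned into a concrete poll schedule. The following is a minimal sketch, not the engine's actual scheduler: the function name poll_offsets is hypothetical, and it assumes a poll that lands exactly on the cycle limit would still run.

```python
from typing import Iterator

INITIAL_WAIT_S = 300       # initial wait before the first poll
POLL_INTERVAL_S = 45       # interval between later polls
CYCLE_MAX_WAIT_S = 2400    # per-cycle wait limit (40 minutes)
STAGE2_MAX_WAIT_S = 43200  # Stage 2 overall budget (12 hours)


def poll_offsets(initial: int = INITIAL_WAIT_S,
                 interval: int = POLL_INTERVAL_S,
                 cycle_max: int = CYCLE_MAX_WAIT_S) -> Iterator[int]:
    """Yield seconds-since-cycle-start at which a poll would happen.

    Assumption: a poll at exactly cycle_max is still allowed.
    """
    t = initial
    while t <= cycle_max:
        yield t
        t += interval


offsets = list(poll_offsets())
# Number of full cycle windows that fit in the default Stage 2 budget.
full_cycles_in_budget = STAGE2_MAX_WAIT_S // CYCLE_MAX_WAIT_S

print(offsets[:3], offsets[-1], len(offsets), full_cycles_in_budget)
```

With the default values, one cycle holds 47 polls (the last at 2370 seconds), and 18 full cycle windows fit inside the 12-hour Stage 2 budget.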
Autopilot Persistence Contract
Autopilot is a persistent control loop. Once started, it must keep operating with timed polling and deterministic transitions until one of the following happens:
- terminal success: completed_no_comments and drain guard passes
- terminal timeout: timeout with reason stage2_max_wait_reached
- explicit blocker: auth failure, state corruption, or PR mismatch that cannot be auto-recovered
Idle waiting is not a stop condition. A single cycle timeout is not a stop condition.
GH Command Reference Contract
When this skill uses gh commands, treat gh-cli as the command source of truth for command shape and flags.
- Validate auth flows with gh-cli guidance (gh auth status, gh auth login).
- Validate PR resolution/edit patterns with gh-cli guidance (gh pr view, gh pr edit --add-reviewer/--remove-reviewer, --json, --jq).
- Validate GraphQL/API invocation patterns with gh-cli guidance (gh api graphql).
Primary Engine
Use scripts/run_autopilot_loop.py as the control-plane entrypoint.
Commands
- init: initialize state for one PR.
- run-cycle: wait for new Copilot review and export cycle artifacts.
- run-stage2-loop: run Stage 2 with automatic cycle-timeout retries until action/terminal/Stage 2 max-wait limit.
- finalize-cycle: mark current cycle addressed and re-request Copilot.
- status: print current state.
- assert-drained: fail if any address-required cycle is still pending.
- simulate-fsm: deterministic dry-run of event-driven status transitions.
The engine is event-driven: state transitions are applied from explicit events (for example cycle_timeout, cycle_needs_address, finalize_with_reviewer_request) instead of ad-hoc status rewrites.
Every JSON command result includes exactly one canonical resume_command to continue from the current state.
State File
Default: .context/gh-autopilot/state.json
Important fields:
- status
- cycle
- last_processed_review_id
- pending_review_id
- pr
Event Log
Default: .context/gh-autopilot/events.jsonl
Each line is a normalized JSON event:
- schema_version: event schema version.
- timestamp: event time in UTC ISO8601.
- event_type: normalized snake_case event name.
- payload: event payload object.
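To illustrate the event shape, here is a minimal sketch of writing and reading JSONL events with the four fields listed above. The helper names make_event and parse_events are hypothetical, the integer schema_version is an assumption (the source does not state its type), and the payload contents are made up for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def make_event(event_type: str, payload: Dict[str, Any],
               schema_version: int = 1) -> str:
    """Serialize one event as a single JSONL line (no trailing newline)."""
    event = {
        "schema_version": schema_version,  # assumed to be an integer
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC ISO 8601
        "event_type": event_type,          # snake_case event name
        "payload": payload,                # free-form payload object
    }
    return json.dumps(event, sort_keys=True)


def parse_events(text: str) -> List[Dict[str, Any]]:
    """Parse JSONL text into event dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Illustrative payloads only; real payload keys are not documented here.
log_text = "\n".join([
    make_event("cycle_timeout", {"cycle": 1}),
    make_event("cycle_needs_address", {"cycle": 2}),
]) + "\n"
events = parse_events(log_text)
print([e["event_type"] for e in events])
```

Keeping one JSON object per line means the log can be appended to without rewriting earlier entries.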
Context Workspace Files
Use .context/gh-autopilot/ as the durable workspace for autonomy and recovery.
- context.md: single source of truth for next actions, status snapshot, artifacts, and suggested commands. Includes a compact status header: phase=<...> | cycle=<...> | status=<...> | timeout_reason=<...>.
Keep this intentionally simple: one context file, not multiple overlapping notes.
Stage Router
Start from the user-selected stage: create_pr, monitor_review, or address_comments.
Routing rules:
- Stage 1 (create_pr) always transitions to Stage 2.
- Stage 2 (monitor_review) runs the persistent supervisor path (run-stage2-loop).
- Stage 2 guard order is strict:
  - check pending address/triage first
  - check terminal no-comments + drain guard
  - check per-cycle-timeout retry state
  - check Stage 2 max-wait timeout
  - otherwise run another poll cycle
- If Stage 2 returns awaiting_address or awaiting_triage, transition to Stage 3.
- Stage 3 (address_comments) finalizes the full batch with finalize-cycle, then transitions back to Stage 2.
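The strict Stage 2 guard order can be read as a single decision function. The sketch below is one interpretation, not the engine's code: the function name, its parameters, and the returned action labels are all hypothetical, and it assumes a cycle timeout is only retried while Stage 2 budget remains.

```python
from typing import Optional

PENDING_STATUSES = {"awaiting_address", "awaiting_triage"}


def next_stage2_action(status: str, timeout_reason: Optional[str],
                       has_pending_cycle: bool, elapsed_s: int,
                       stage2_max_wait_s: int = 43200) -> str:
    """Apply the guards in the documented order and return the next action."""
    # 1. Pending address/triage always wins.
    if has_pending_cycle or status in PENDING_STATUSES:
        return "go_to_stage3"
    # 2. Terminal no-comments; the drain guard holds because guard 1 did not fire.
    if status == "completed_no_comments":
        return "stop_success"
    budget_left = elapsed_s < stage2_max_wait_s
    # 3. A per-cycle timeout is a retry while budget remains (assumed reading).
    if status == "timeout" and timeout_reason == "cycle_max_wait_reached" and budget_left:
        return "retry_cycle"
    # 4. Stage 2 overall budget exhausted.
    if not budget_left:
        return "stop_timeout"
    # 5. Otherwise keep polling.
    return "poll"


print(next_stage2_action("timeout", "cycle_max_wait_reached", False, 2400))
```

Checking pending work first means a stale terminal status can never hide comments that still need to be addressed.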
Event-driven state transitions:
initialized|rerequested --begin_cycle_wait--> waiting_for_review
waiting_for_review --cycle_timeout--> timeout (cycle_max_wait_reached)
timeout (cycle_max_wait_reached) --stage2_retry_after_cycle_timeout--> initialized
waiting_for_review --cycle_no_comments--> completed_no_comments
waiting_for_review --cycle_needs_address--> awaiting_address
waiting_for_review --cycle_needs_triage--> awaiting_triage
awaiting_address --finalize_with_reviewer_request--> rerequested
awaiting_triage --finalize_with_reviewer_request--> rerequested
awaiting_address|awaiting_triage --finalize_without_reviewer_request--> initialized
initialized|waiting_for_review|rerequested --stage2_max_wait_reached--> timeout
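The transitions listed above can be modeled as a lookup table keyed by (status, event). This is a minimal sketch under stated assumptions, not the implementation behind simulate-fsm: the names TRANSITIONS, apply_event, and simulate are hypothetical, and carrying the timeout reason alongside the status is a modeling choice made here so that only a cycle timeout can be retried.

```python
from typing import Dict, Iterable, Optional, Tuple

State = Tuple[str, Optional[str]]  # (status, timeout_reason)

CYCLE_TIMEOUT: State = ("timeout", "cycle_max_wait_reached")
STAGE2_TIMEOUT: State = ("timeout", "stage2_max_wait_reached")

TRANSITIONS: Dict[Tuple[str, str], State] = {
    ("initialized", "begin_cycle_wait"): ("waiting_for_review", None),
    ("rerequested", "begin_cycle_wait"): ("waiting_for_review", None),
    ("waiting_for_review", "cycle_timeout"): CYCLE_TIMEOUT,
    ("timeout", "stage2_retry_after_cycle_timeout"): ("initialized", None),
    ("waiting_for_review", "cycle_no_comments"): ("completed_no_comments", None),
    ("waiting_for_review", "cycle_needs_address"): ("awaiting_address", None),
    ("waiting_for_review", "cycle_needs_triage"): ("awaiting_triage", None),
    ("awaiting_address", "finalize_with_reviewer_request"): ("rerequested", None),
    ("awaiting_triage", "finalize_with_reviewer_request"): ("rerequested", None),
    ("awaiting_address", "finalize_without_reviewer_request"): ("initialized", None),
    ("awaiting_triage", "finalize_without_reviewer_request"): ("initialized", None),
    ("initialized", "stage2_max_wait_reached"): STAGE2_TIMEOUT,
    ("waiting_for_review", "stage2_max_wait_reached"): STAGE2_TIMEOUT,
    ("rerequested", "stage2_max_wait_reached"): STAGE2_TIMEOUT,
}


def apply_event(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an illegal transition."""
    status, reason = state
    # Only a per-cycle timeout may be retried; a Stage 2 timeout is terminal.
    if event == "stage2_retry_after_cycle_timeout" and reason != "cycle_max_wait_reached":
        raise ValueError("retry is only valid after a cycle timeout")
    if (status, event) not in TRANSITIONS:
        raise ValueError(f"no transition from {status!r} on {event!r}")
    return TRANSITIONS[(status, event)]


def simulate(start_status: str, events: Iterable[str]) -> State:
    """Fold a sequence of events over a starting status."""
    state: State = (start_status, None)
    for event in events:
        state = apply_event(state, event)
    return state


print(simulate("initialized", ["begin_cycle_wait", "cycle_needs_address",
                               "finalize_with_reviewer_request"]))
```

Because every change goes through one table, an unlisted (status, event) pair fails loudly instead of silently rewriting the status.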
Stage Details
Stage 1 (create_pr)
User intent: PR has not been created yet.
Actions:
- Use gh-cli as reference for all gh command usage in this stage.
- Run gh auth status.
- Resolve current-branch PR with gh pr view (omit --pr).
- If an open PR already exists for the branch, skip PR creation and move to Stage 2.
- If no PR exists, run gh-pr-creation to open one.
- Initialize state with init (avoid --force unless state is intentionally reset).
- Move to Stage 2.
Stage 2 (monitor_review)
User intent: PR exists and we are waiting for Copilot output.
Actions:
- Use gh-cli as reference for all gh command usage in this stage.
- Run gh auth status.
- Resolve PR (current branch or explicit --pr).
- Ensure state exists for the PR:
  - If missing: run init.
  - If state already awaiting_address or awaiting_triage: move directly to Stage 3.
- Run run-stage2-loop with normal timing (300/45/2400) plus Stage 2 max wait (43200 by default).
  - On Stage 2 entry, run-stage2-loop performs an immediate fetch pass (initial_sleep=0) to capture already-finished Copilot reviews/comments.
  - run-stage2-loop retries run-cycle automatically after each cycle timeout.
  - run-cycle exports comments by matching each thread comment to the active Copilot review_id (not by timestamp cutoff).
- Interpret result:
  - completed_no_comments -> terminal success; stop loop.
    - includes cycles where no Copilot thread comments were captured for that review round
  - timeout with reason stage2_max_wait_reached -> terminal timeout; stop loop.
  - awaiting_address or awaiting_triage -> move to Stage 3.
- Before any terminal stop/report in Stage 2, run assert-drained.
  - If it exits non-zero, do not stop; continue to Stage 3.
If Copilot is already reviewing when Stage 2 starts, do not re-request reviewer;
continue waiting with run-stage2-loop.
Never stop Stage 2 manually while the command is still within the configured Stage 2 max-wait limit.
Stage 3 (address_comments)
User intent: comments already exist and must be processed now.
Actions:
- Ensure fresh cycle artifacts are available:
  - If state is already awaiting_address or awaiting_triage, use existing cycle.json.
  - Otherwise run run-cycle with --initial-sleep-seconds 0 to capture existing comments immediately.
  - If parsed_summary.generated_comments > 0 but counts.copilot_comments_total == 0, artifacts are inconsistent: re-run immediate run-cycle and do not finalize until comments are captured.
- Build normalized worker artifacts in shared context:
  - run build_review_batch.py to create review-batch.json
- Use gh-cli as reference for any gh commands used to resolve/reply on PR threads.
- Run Stage 3 worker actions inside this skill:
  - process all threads from review-batch.json
  - account for every Copilot comment in those threads
  - resolve each actionable thread in GitHub
  - reply on each non-actionable thread with rationale
  - do not leave any thread/comment unreviewed or unaddressed
  - push exactly once for the batch
  - do not request Copilot review while processing individual threads
  - update cycle.json.addressing with complete per-thread and per-comment coverage
- Validate cycle.json.addressing before finalizing:
  - status=ready_for_finalize
  - pushed_once=true
  - review_id and cycle match active state
  - threads.addressed + threads.rejected_with_rationale equals total thread count
  - threads.needs_clarification=0
  - thread_responses has exactly one entry per thread
  - for each thread_responses entry:
    - classification=actionable requires resolved=true
    - classification=non-actionable requires rationale_replied=true
  - comments.addressed_or_rationalized equals total comment count
  - comments.needs_clarification=0
  - comment_statuses has exactly one entry per comment with:
    - status in {action, no_action}
    - cycle equal to the active cycle
    - chronological sort by created_at
- Run finalize-cycle only when validation passes (re-requests Copilot unless explicitly skipped for recovery).
  - run this once per cycle, after the full thread batch is complete
  - never run it immediately after addressing a single thread
  - never call reviewer add/remove directly during Stage 3; finalize-cycle is the only allowed reviewer request path
- Return to Stage 2.
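The pre-finalize validation rules for cycle.json.addressing can be sketched as one checking function. This is an illustrative sketch only: the exact JSON layout is assumed from the field names above, the keys thread_id and comment_id are hypothetical, and created_at values are assumed to be ISO 8601 strings in one consistent format so that string order matches time order. The sample data is made up.

```python
from collections import Counter
from typing import Any, Dict, List


def validate_addressing(addressing: Dict[str, Any], active_review_id: str,
                        active_cycle: int, thread_ids: List[str],
                        comment_ids: List[str]) -> List[str]:
    """Return a list of problems; an empty list means validation passed."""
    errors: List[str] = []

    def check(ok: bool, message: str) -> None:
        if not ok:
            errors.append(message)

    threads = addressing.get("threads", {})
    comments = addressing.get("comments", {})
    responses = addressing.get("thread_responses", [])
    statuses = addressing.get("comment_statuses", [])

    check(addressing.get("status") == "ready_for_finalize", "status must be ready_for_finalize")
    check(addressing.get("pushed_once") is True, "pushed_once must be true")
    check(addressing.get("review_id") == active_review_id, "review_id does not match active state")
    check(addressing.get("cycle") == active_cycle, "cycle does not match active state")

    handled = threads.get("addressed", 0) + threads.get("rejected_with_rationale", 0)
    check(handled == len(thread_ids), "thread totals do not cover every thread")
    check(threads.get("needs_clarification") == 0, "threads.needs_clarification must be 0")
    # "thread_id" is an assumed key name.
    check(Counter(r.get("thread_id") for r in responses) == Counter(thread_ids),
          "thread_responses must have exactly one entry per thread")
    for r in responses:
        kind = r.get("classification")
        if kind == "actionable":
            check(r.get("resolved") is True, "actionable thread not resolved")
        elif kind == "non-actionable":
            check(r.get("rationale_replied") is True, "non-actionable thread lacks rationale reply")
        else:
            check(False, f"unknown classification: {kind!r}")

    check(comments.get("addressed_or_rationalized") == len(comment_ids),
          "comment totals do not cover every comment")
    check(comments.get("needs_clarification") == 0, "comments.needs_clarification must be 0")
    # "comment_id" is an assumed key name.
    check(Counter(c.get("comment_id") for c in statuses) == Counter(comment_ids),
          "comment_statuses must have exactly one entry per comment")
    for c in statuses:
        check(c.get("status") in {"action", "no_action"}, "comment status must be action or no_action")
        check(c.get("cycle") == active_cycle, "comment cycle does not match active cycle")
    created = [str(c.get("created_at", "")) for c in statuses]
    check(created == sorted(created), "comment_statuses must be sorted by created_at")
    return errors


# Hypothetical sample data for illustration.
example = {
    "status": "ready_for_finalize", "pushed_once": True, "review_id": "R1", "cycle": 2,
    "threads": {"addressed": 1, "rejected_with_rationale": 1, "needs_clarification": 0},
    "thread_responses": [
        {"thread_id": "T1", "classification": "actionable", "resolved": True},
        {"thread_id": "T2", "classification": "non-actionable", "rationale_replied": True},
    ],
    "comments": {"addressed_or_rationalized": 2, "needs_clarification": 0},
    "comment_statuses": [
        {"comment_id": "C1", "status": "action", "cycle": 2, "created_at": "2024-01-01T00:00:00Z"},
        {"comment_id": "C2", "status": "no_action", "cycle": 2, "created_at": "2024-01-01T00:05:00Z"},
    ],
}
errors = validate_addressing(example, "R1", 2, ["T1", "T2"], ["C1", "C2"])
print(errors)
```

Collecting every failed rule, rather than stopping at the first, gives the worker a complete list of what still blocks finalize-cycle.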
Command Templates
Initialize State
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
init \
--initial-sleep-seconds 300 \
--poll-interval-seconds 45 \
--cycle-max-wait-seconds 2400
Use --force with init only when intentionally resetting prior state.
If reusing an existing PR branch, do not run --force unless the current
state is stale or corrupted.
Monitor Stage 2 Loop (recommended)
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
run-stage2-loop \
--initial-sleep-seconds 300 \
--poll-interval-seconds 45 \
--cycle-max-wait-seconds 2400 \
--stage2-max-wait-seconds 43200
Use this command for normal Stage 2 operation. It performs an immediate bootstrap fetch first, then automatically retries cycle waits when a cycle-level timeout occurs.
Monitor One Cycle (diagnostic/manual)
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
run-cycle \
--initial-sleep-seconds 300 \
--poll-interval-seconds 45 \
--cycle-max-wait-seconds 2400
Capture Existing Comments Immediately (Stage 3 bootstrap)
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
run-cycle \
--initial-sleep-seconds 0 \
--poll-interval-seconds 45 \
--cycle-max-wait-seconds 2400
Build Stage 3 Worker Batch Artifacts
python "<path-to-skill>/scripts/build_review_batch.py" \
--cycle ".context/gh-autopilot/cycle.json" \
--output-dir ".context/gh-autopilot"
Finalize Addressed Cycle
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
finalize-cycle
This command validates cycle.json.addressing coverage first, then:
- Moves pending_review_id into last_processed_review_id.
- Increments cycle.
- Re-requests Copilot via remove/add reviewer sequence.
- Records per-comment status (action/no_action) in finalize event payloads.
Use --skip-reviewer-request only for manual recovery paths.
Print Current State
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
status
Assert No Pending Address-Required Cycle
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
--repo "." \
--pr "<PR_NUMBER_OR_URL>" \
assert-drained
Use this as the final gate before reporting completion/timeout handling results.
If state is awaiting_address or awaiting_triage, this command fails and
the loop must continue through Stage 3.
Simulate FSM Transitions (deterministic)
python "<path-to-skill>/scripts/run_autopilot_loop.py" \
simulate-fsm \
--start-status initialized \
--event begin_cycle_wait \
--event cycle_needs_address \
--event finalize_with_reviewer_request
Artifacts and Exit Codes
Outputs (default .context/gh-autopilot/):
- cycle.json
- review-batch.json (generated by Stage 3 worker setup)
- context.md (updated)
Status meanings:
- completed_no_comments: terminal success
- timeout: timeout status
  - from run-cycle: timeout for that single cycle wait
  - from run-stage2-loop: Stage 2 overall timeout (reason=stage2_max_wait_reached)
- awaiting_address: actionable Copilot comments captured
- awaiting_triage: review exists but needs manual interpretation
Exit codes:
- 0: terminal success or already-terminal state
- 3: run-cycle timeout
- 10: comments/triage action required
- 11: assert-drained detected unaddressed pending cycle
- 12: run-stage2-loop exhausted Stage 2 max-wait budget
Loop Contract
After entering via the user-selected stage, keep routing until terminal. Do not stop after a single cycle unless blocked by auth/state errors.
current_stage = user_selected_stage
while true:
  if current_stage == 1:
    run Stage 1
    current_stage = 2
    continue
  if current_stage == 2:
    run persistent Stage 2 supervisor
    if status in {awaiting_address, awaiting_triage}:
      current_stage = 3
      continue
    if status == completed_no_comments:
      if assert-drained != 0:
        current_stage = 3
        continue
      stop
    if status == timeout and reason == stage2_max_wait_reached:
      if assert-drained != 0:
        current_stage = 3
        continue
      stop
    # cycle timeout is internal retry; never stop here
    continue
  if current_stage == 3:
    run Stage 3 worker handoff
    if cycle.addressing.status != ready_for_finalize:
      stop and request clarification
    if cycle.addressing does not cover all review comments:
      stop and request clarification
    run finalize-cycle
    current_stage = 2
    continue
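The loop contract can also be written as a small runnable router with the stage work injected as callables. This is a sketch under stated assumptions, not the skill's implementation: run_loop and its parameters are hypothetical names, run_stage2 is assumed to return a (status, reason) pair, run_stage3 is assumed to return True once the cycle is validated and finalized, and max_steps is a safety bound added only for the sketch.

```python
from typing import Callable, List, Optional, Tuple

Stage2Result = Tuple[str, Optional[str]]  # (status, timeout_reason)


def run_loop(start_stage: int,
             run_stage1: Callable[[], None],
             run_stage2: Callable[[], Stage2Result],
             run_stage3: Callable[[], bool],
             assert_drained: Callable[[], int],
             max_steps: int = 100) -> str:
    """Route between stages until a terminal condition is reached."""
    stage = start_stage
    for _ in range(max_steps):
        if stage == 1:
            run_stage1()
            stage = 2
        elif stage == 2:
            status, reason = run_stage2()
            if status in {"awaiting_address", "awaiting_triage"}:
                stage = 3
            elif status == "completed_no_comments" or (
                    status == "timeout" and reason == "stage2_max_wait_reached"):
                # Drain guard: a non-zero exit means work is still pending.
                if assert_drained() != 0:
                    stage = 3
                else:
                    return status
            # Anything else (for example a cycle timeout) re-enters Stage 2.
        elif stage == 3:
            if not run_stage3():
                return "needs_clarification"
            stage = 2
        else:
            raise ValueError(f"unknown stage: {stage}")
    raise RuntimeError("max_steps exceeded")


# Scripted demo: a cycle timeout, then comments to address, then a clean review.
calls: List[str] = []
scripted = iter([("timeout", "cycle_max_wait_reached"),
                 ("awaiting_address", None),
                 ("completed_no_comments", None)])


def stage1() -> None:
    calls.append("stage1")


def stage2() -> Stage2Result:
    calls.append("stage2")
    return next(scripted)


def stage3() -> bool:
    calls.append("stage3")
    return True


final = run_loop(1, stage1, stage2, stage3, lambda: 0)
print(final, calls)
```

In the scripted run, the cycle timeout simply re-enters Stage 2, which matches the rule that a single cycle timeout is never terminal.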
Recovery Scenarios
Handle common failure modes explicitly:
- Auth failure:
  - Run gh auth status.
  - If unauthenticated, run gh auth login and retry.
- State/PR mismatch:
  - If state PR differs from intended PR, re-run init with correct --pr.
  - Use --force only when intentionally discarding prior loop state.
- Closed/merged PR mid-loop:
  - Stop loop.
  - Open or select a new active PR.
  - Re-initialize state for that PR.
- Existing open PR before start:
  - Skip gh-pr-creation.
  - Initialize directly against that PR.
- Copilot already reviewing when loop starts (Stage 2):
  - Skip re-request.
  - Run run-stage2-loop and allow repeated cycle wait windows to continue.
  - Continue normal addressing flow when cycle comments arrive.
- Copilot comments already present when loop starts (Stage 3):
  - Run immediate capture (--initial-sleep-seconds 0) only if cycle artifacts are missing/stale.
  - Build review-batch.json in .context/gh-autopilot/.
  - Address comments directly in Stage 3 of this skill.
  - Finalize only when cycle.json.addressing reports ready and full comment coverage.
  - Resume Stage 2.
- Agent interruption or handoff:
  - Resume from .context/gh-autopilot/context.md.
  - Continue using context.md as the source of next actions and state snapshot.
Safety Rules
- Never process a cycle while state is already awaiting_address.
- Never finalize a cycle without confirming comments were fully addressed.
- Never finalize a cycle if any review thread lacks a resolve/rationale response.
- Never finalize when parsed_summary.generated_comments > 0 but captured comment count is 0.
- Never manually stop Stage 2 idle waiting before run-stage2-loop exits by configured limits or terminal status.
- Never treat a single cycle timeout as terminal; it is always a retry path while Stage 2 max-wait budget remains.
- Never claim terminal completion unless assert-drained exits 0.
- Keep one push per cycle.
- Do not delete .context/gh-autopilot/ artifacts mid-loop.
- Keep context.md in sync by using engine commands (init, run-cycle, finalize-cycle) rather than manual edits.
- Use gh-cli skill as the source of truth whenever selecting or changing gh commands in this skill.
Optional Utility Scripts
The following scripts remain available for ad-hoc diagnostics:
- scripts/monitor_copilot_review.py
- scripts/export_copilot_feedback.py
Prefer run_autopilot_loop.py for normal loop operation.