PR Babysitter

Two modes: one-shot comment triage or monitor mode with periodic polling via CronCreate.

One-shot: triage and fix PR review comments in a single pass
Monitor: set up a cron schedule to periodically check for merge conflicts, CI/CD failures, new review comments, and merge readiness

Scope

One-shot: unresolved review threads on open PRs, both human and bot comments
Monitor: ongoing PR health — conflicts, CI checks, comments, and readiness
Skip: closed or merged PRs, draft PRs unless explicitly requested

Reference Files

File	Read When
`references/github-api.md`	Default: GraphQL queries for fetching, replying, and resolving threads
`references/bot-patterns.md`	Comment triage: bot detection, severity parsing, deduplication, false positive rules
`references/fix-plan-template.md`	Comment triage Phase 3: generating the fix plan document
`references/monitoring-setup.md`	Monitor mode Phase 1: CronCreate configuration, state file format, schedule selection
`references/ci-platforms.md`	Monitor mode Phase 3: `gh` for GitHub, `bk`/`vercel`/`flyctl` for platform-specific logs and retries
`references/merge-conflicts.md`	Monitor mode Phase 2: detecting and resolving merge conflicts

One-Shot Mode: Comment Triage

Triage and resolve PR review comments from humans and bots in a structured four-phase workflow.

Copy this checklist to track progress:

Comment triage progress:
- [ ] Phase 1: Fetch unresolved review threads, reviews, and comments
- [ ] Phase 2: Triage and classify all items
- [ ] Phase 3: Write fix plan and get approval
- [ ] Phase 4: Execute fixes and resolve threads

Phase 1: Fetch

Load references/github-api.md for query templates.

Identify the PR — auto-detect from current branch (gh pr view --json number) or accept an explicit PR number argument
Extract owner/repo — gh repo view --json owner,name
Fetch all review threads — use the GraphQL reviewThreads query with pagination. Collect every thread, then filter to isResolved == false
Fetch PR reviews — use the REST reviews endpoint. Collect all reviews with their state, body, and author. Pay attention to CHANGES_REQUESTED reviews with body text — these contain the reviewer's summary of what needs to change
Fetch issue-level comments — use the REST endpoint for all PR conversation comments. Do not filter by author — bot comments from github-actions[bot] may contain actionable findings (DangerJS warnings, schema compatibility checks)
Early exit — if zero unresolved threads, zero actionable reviews, and zero actionable issue comments, report "No unresolved review items found" and stop

Output: three lists — unresolved threads, reviews with content, and issue-level comments.

Phase 2: Triage

Load references/bot-patterns.md for detection and parsing rules.

For each item across all three sources:

Identify author type — human or bot. For bots, classify by content first, then username. See references/bot-patterns.md for detection rules, severity parsing, and false positive patterns
Skip noise — auto-classify noise items per bot-patterns reference (linear linkbacks, vercel deployments, Devin "no issues" reviews, changeset releases, event-lib triggers)
Parse severity — extract from bot-specific format (emoji, SVG, HTML tables). Human comments: default to Major for CHANGES_REQUESTED, Minor for APPROVED + question
Deduplicate — group inline comments on the same file within a 3-line range. Keep the highest-severity, most-actionable comment. Mark others as ignore-duplicate
Classify each remaining item:
- Category: bug, security, performance, style, correctness, docs, test-coverage
- Severity: critical, major, minor, nitpick
- Confidence: high (human or multi-bot), medium (single bot, clear issue), low (single bot, likely false positive)
- Disposition: fix or ignore (with reason)
- Source type: thread (resolvable), conversation (reply-only)
Flag ignore candidates with specific reasoning:
- Pre-existing code not changed in this PR
- Bot flagged .md, config, lock, or migration files (no security implications)
- Contradicts project CLAUDE.md/AGENTS.md conventions
- Outdated thread where code has changed substantially
- Noise bot boilerplate or badge injection

Human comments are never auto-ignored. Always classify as fix unless clearly already resolved or the reviewer explicitly marked it as optional ("up to you", "but not blocking").

Phase 3: Plan

Load references/fix-plan-template.md for the output template.

Create plan file — write to .claude/scratchpad/pr-{N}-review-plan.md (create directory if needed)
Issues to Fix — ordered by severity (critical first), each with thread ID, file:line, author, category, finding, fix approach, and commit group label
Conversation Items — issue-level comments and review bodies that need fixes but have no thread ID. These get a reply but no resolve mutation
Ignored — each with thread ID or comment ID, author, reason, and the brief reply to post when resolving
Summary table — disposition-by-severity matrix for quick scanning
Present to user — print a summary showing counts (N to fix, K conversation items, M to ignore) and ask for approval

Stop here. Do not proceed to Phase 4 until the user reviews and approves the plan.

The user may edit the plan — move items between fix/conversation/ignore, change priorities, add notes. Re-read the plan file before executing.

Phase 4: Execute

After user approval, re-read .claude/scratchpad/pr-{N}-review-plan.md in case it was edited.

4a. Resolve ignored threads:

For each ignored item that has a thread ID:

Post a brief reply explaining the decision
Resolve the thread via GraphQL mutation

Use concise, specific reasons:

"Duplicate of thread addressing the same finding — resolving."
"Contradicts project convention (see CLAUDE.md) — resolving."
"Outdated thread — code has been refactored. Resolving."
"CI notification — not actionable. Resolving."

For ignored noise without a thread ID (issue-level bot comments), do nothing — no reply needed.

4b. Fix real issues:

Group fixes by commit group label from the plan. For each group:

If fixes touch independent files, launch parallel subagents (one per file group)
If fixes touch the same file or overlapping lines, execute sequentially
Each subagent reads the file, applies the fix, and verifies correctness

4c. Commit and push:

Run project lint/test commands if available
One commit per logical fix group, message format: fix({scope}): {description}
Push the branch

4d. Resolve and reply:

Resolved threads: post reply ("Fixed in latest push.") then resolve via GraphQL
Conversation items: post reply acknowledging the fix
Issue-level comments have no resolve mechanism — the reply is the acknowledgment

4e. Verify:

Re-fetch threads from GitHub (fresh API call) to confirm zero unresolved remain
Check CI status with gh pr checks
Report results to the user

Monitor Mode: Babysit PR

Set up periodic monitoring via CronCreate. Polls for merge conflicts, CI/CD failures, new review comments, and deployment status. Fixes what it can, runs comment triage when needed, and notifies on state changes.

Copy this checklist to track progress:

PR babysit progress:
- [ ] Phase 1: Initialize — identify PR, snapshot state, set up cron
- [ ] Phase 2: Conflict check — detect and resolve merge conflicts
- [ ] Phase 3: CI/CD check — poll checks, diagnose failures, fix and push
- [ ] Phase 4: Comment check — detect new comments, run comment triage
- [ ] Phase 5: Readiness check — evaluate merge readiness, notify user

Phase 1: Initialize

Load references/monitoring-setup.md for schedule configuration.

Identify the PR — auto-detect from current branch (gh pr view --json number,url,title,headRefName,baseRefName,mergeable,mergeStateStatus,reviewDecision) or accept an explicit PR number
Extract owner/repo — gh repo view --json owner,name
Snapshot current state — write to .claude/scratchpad/babysit-pr-{N}.md:
- Current HEAD SHA, mergeable status, check statuses, unresolved thread count, review decision
Detect CI platforms — scan check names from gh pr checks to identify active platforms (GitHub Actions, Buildkite, Vercel, Fly.io)
Choose schedule — default: every 5 minutes (*/5 * * * *). User may override
Ask user preferences — auto-resolve noise bot comments? auto-merge when ready?
Create cron job — CronCreate with a prompt that runs phases 2-5
Confirm with user — report state snapshot and schedule

Phase 2: Conflict Check

Load references/merge-conflicts.md for resolution strategy.

Check mergeable status — gh pr view --json mergeable,mergeStateStatus
- MERGEABLE → skip to Phase 3
- CONFLICTING → proceed to resolve
- UNKNOWN → wait, recheck next cycle
Attempt rebase — git fetch origin {base_branch} && git rebase origin/{base_branch}
- Clean rebase → git push --force-with-lease → notify user
- Conflicts in safe files (lockfiles, generated) → auto-resolve, push
- Complex conflicts → git rebase --abort → notify user with details

Never force-push without --force-with-lease. If the lease fails, someone else pushed — abort and notify.

Phase 3: CI/CD Check

Load references/ci-platforms.md for gh CLI commands per platform.

Poll check status — gh pr checks --json name,state,conclusion,detailsUrl
Classify each check — passing, pending (wait), or failing
If all passing → proceed to Phase 4
If any failing → diagnose:
- Identify platform from check name (see reference for patterns)
- Fetch logs via gh run view --log-failed (GitHub Actions), bk (Buildkite), vercel logs (Vercel), flyctl logs (Fly.io)
- Classify failure: flaky test (re-run), code error (fix + push), infrastructure (notify user), dependency issue (update lockfile)
- Fix and push if possible
Compare with previous state — flag regressions (previously passing, now failing)

Phase 4: Comment Check

Count unresolved threads — quick GraphQL count or gh pr view --json
Compare with state file — if new unresolved threads since last poll:
- Notify user: "N new review comments on PR #{N}"
- Run the one-shot comment triage workflow (phases 1-4 above)
Auto-resolve noise (if opted in during setup) — resolve only unambiguous noise bots (vercel, linear, changeset) with brief reason. Never auto-resolve human comments or critical/major findings

Phase 5: Readiness Check

Evaluate merge readiness — all of:
- mergeable == MERGEABLE (no conflicts)
- All required checks passing
- reviewDecision == APPROVED
- No unresolved blocking threads
If ready → notify user: "PR #{N} is ready to merge. All checks green, reviews approved, no conflicts."
- If auto-merge opted in: gh pr merge --auto --squash
If not ready → report blockers: "Waiting on: 2 checks pending" / "Blocked by: merge conflict"
Notify only on state changes — compare with previous poll:
- Check went green → "Build is green"
- Check broke → "Build broke: {check_name}"
- New review → "New review from @{reviewer}: {state}"
- Conflict detected → "Merge conflict with {base_branch}"
- All clear → "PR #{N} is green and ready"
Update state file for next poll cycle

Stopping the Monitor

"Stop babysitting" / "cancel the PR monitor" → CronDelete to remove the job
Session exit → jobs auto-clean (session-scoped)
PR merged or closed → auto-detect on next poll, self-cancel

On stop, report a final summary: total polls, fixes applied, conflicts resolved, current state.

Anti-patterns

Force-pushing without --force-with-lease — risks overwriting teammate commits
Auto-resolving human comments — never auto-resolve human feedback
Resolving threads without posting a reply — reviewers need to see the reasoning
Auto-ignoring human comments — always classify for user review
Fixing items the plan marked as ignore without approval
One commit per individual comment — group related fixes by commit group label
Pushing before verifying lint/test pass locally
Skipping Phase 3 (Plan) approval — the fix plan is the quality gate
Re-diagnosing failures while checks are still running — wait for completion
Polling more frequently than every 2 minutes — wastes API quota and context
Notifying on every poll with no state change — only notify on transitions
Auto-merging without explicit user opt-in — merge is a one-way door
Classifying github-actions[bot] as always noise — it is a shared identity used by DangerJS, schema checkers, and other active tools. Classify by content

Related skills

review-pr for local self-review before pushing fixes

babysit-pr