Codex Issue Waves
Run GitHub issues through codex exec in isolated worktrees, then shepherd the resulting PRs through review and merge. Four phases: dispatch → monitor → review wave → correction wave.
Orchestration model. Claude is the PO / engineering manager. Codex is the worker — implementer per worktree (parallel across worktrees), reviewer per worktree (parallel across worktrees, sequential after the implementer finishes its worktree), and corrector when fixes are needed. Claude never edits source files, never reads diffs for judgment, never runs in-place fixes. Every code touch — build, review, correction — goes through a codex dispatch. See invoking-codex-exec "Codex roles" and "Orchestrator boundaries" for the full rule.
When to use
Triggered by requests like:
- "Spawn codex on issues #A, #B, #C"
- "Have codex handle these issues in parallel"
- "Manage the PRs" / "process feedback" after such a batch
- "Get them merged"
Not for: single-issue work (implement directly or use /pr), or batches where codex CLI isn't available on the host.
Pre-dispatch: conflict triage
Before creating any worktree, surface conflicts between issues in the batch. Don't just fan out blindly.
Run through this checklist for every pair of issues:
- Same file, same region? Two issues mutating the same function / component / migration will guarantee a merge conflict. Combine into one branch or sequence them.
- Semantic overlap? One issue removes the surface another targets. Example: one PR replaces edit_suggestions[] with claim_verdicts[]; another adds a feature on top of edit_suggestions[]. Flag before dispatch.
- Shared numbering? Any two ADRs, migrations, or ordered resource types that would collide on merge (e.g., both land 005-*.md). Decide the ordering upfront; the later one bumps on merge.
- Umbrella / tracking issues? Don't dispatch codex on a tracking issue that itemizes sub-issues. Pick the first unblocked sub-issue instead, or skip entirely.
- Trivial siblings? Two near-trivial fixes in the same file are cheaper as one combined branch than two parallel codex runs that will fight at merge.
Always pause here and surface findings to the user with a concrete proposal ("combine these two, skip the umbrella, run the rest in parallel"). Do not proceed silently. Only after user confirms, continue.
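The mechanical parts of the checklist (same file, shared numbering, tracking issues) can be sketched as a pairwise scan. This is a minimal illustration, assuming each issue has already been annotated by hand with the files it is expected to touch and the numbered resources it will add; the `Issue` shape and the `triage_pairs` name are hypothetical and not part of this skill. Semantic overlap still needs a human or model read of the issue bodies.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Issue:
    """Hypothetical per-issue annotation gathered before dispatch."""
    number: int
    files: set[str] = field(default_factory=set)     # files expected to change
    numbered: set[str] = field(default_factory=set)  # e.g. {"adr/005"}
    is_tracking: bool = False                        # umbrella / tracking issue


def triage_pairs(issues: list[Issue]) -> list[str]:
    """Return findings to surface to the user before any worktree is created."""
    findings = []
    for issue in issues:
        if issue.is_tracking:
            findings.append(f"#{issue.number}: tracking issue, skip or pick a sub-issue")
    # Check every pair once for file overlap and numbering collisions.
    for a, b in combinations(issues, 2):
        shared_files = sorted(a.files & b.files)
        if shared_files:
            findings.append(
                f"#{a.number} + #{b.number}: same files {shared_files}, combine or sequence"
            )
        shared_numbers = sorted(a.numbered & b.numbered)
        if shared_numbers:
            findings.append(
                f"#{a.number} + #{b.number}: numbering collision {shared_numbers}, decide order"
            )
    return findings
```

An empty result does not mean the batch is safe to fan out; it only means none of the mechanical checks fired.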
Dispatch phase
REQUIRED SUB-SKILL: Use invoking-codex-exec for the codex launch mechanics — flags, sandbox traps, monitoring, kill-and-recover. This skill does not duplicate that content; it covers only the multi-issue orchestration on top.
For each (possibly combined) issue to dispatch:
- Fetch origin/main and create a worktree off it. Naming: .worktrees/issue-<number>-<slug> and branch feat/<slug> / refactor/<slug> / fix/<slug> depending on the issue type.
- Fetch the issue body from GitHub.
- Write a focused prompt per worktree. See references/prompt-template.md for the shape. The prompt is the single most important artifact in this workflow.
- Launch codex in background per invoking-codex-exec, redirecting output to /tmp/codex-runs/<n>.out.
See references/common-commands.md for the exact shell recipes (worktree creation, issue fetch, codex launch).
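The naming convention can be expressed as a small helper. This is a sketch only: the slug rule and the mapping from issue type to branch prefix are assumptions for illustration (the skill names the three prefixes but does not define how a slug is derived), and `worktree_names` is a hypothetical name.

```python
import re

# Assumed mapping from issue type to branch prefix; only these three are named in the text.
BRANCH_PREFIXES = {"feature": "feat", "refactor": "refactor", "fix": "fix"}


def slugify(title: str) -> str:
    """Lowercase, hyphen-separated slug (hypothetical rule, not defined by the skill)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def worktree_names(number: int, title: str, issue_type: str) -> tuple[str, str]:
    """Return (worktree path, branch name) following the naming convention."""
    slug = slugify(title)
    prefix = BRANCH_PREFIXES[issue_type]
    return f".worktrees/issue-{number}-{slug}", f"{prefix}/{slug}"
```

Deriving both names from one slug keeps the worktree path and the branch easy to match up during cleanup.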
Schedule a wakeup to check back (~10–15 min for moderate refactors, ~5 min for trivial changes, ~20 min for big architectural work). Do not sleep-poll in bash. If multiple runs are outstanding, schedule a single aggregated wakeup, not one per run.
Monitor phase
When a codex run finishes, immediately check:
- Does the output end with a pull/<N> URL? If not, codex aborted partway; tail the output to find out why.
- CI status on the PR via gh pr view <N> --json statusCheckRollup,mergeable,mergeStateStatus.
- Any obvious red flags in the tail of the output (conflict markers, test failures, "I couldn't do X" messages).

Codex's live logging sometimes shows <<<<<<< / ======= markers while it resolves conflicts. These are transient. Always verify against gh pr diff <N> before reporting "conflict markers in committed code."
If multiple codex runs are still outstanding, reschedule a single aggregated wakeup; do not schedule one per run.
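The first check, whether the run ended with a pull/<N> URL, can be automated against the captured output file. The sketch below is an assumption about how to interpret "ends with": it looks at the last few non-empty lines rather than only the final one, and `pr_number_from_output` is a hypothetical helper name.

```python
import re
from typing import Optional

PULL_URL = re.compile(r"/pull/(\d+)\b")


def pr_number_from_output(output: str, tail_lines: int = 5) -> Optional[int]:
    """Return the PR number if one of the last non-empty lines has a pull/<N> URL."""
    lines = [line for line in output.splitlines() if line.strip()]
    # Scan from the end so the most recent URL wins.
    for line in reversed(lines[-tail_lines:]):
        match = PULL_URL.search(line)
        if match:
            return int(match.group(1))
    return None  # no PR URL near the end: treat the run as aborted and tail the log
```

A `None` result is the signal to read the tail of the output for the abort reason before doing anything else with that worktree.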
Review wave
Treat every codex-produced PR as untrusted: verify it through codex, not by hand.
For each PR:
1. PR existence check (status, not review). Claude runs:
   - gh pr view <N> --json statusCheckRollup,mergeable,mergeStateStatus,headRefOid: does the PR exist, is CI green, is it mergeable?
   - git log --oneline origin/main..origin/<branch>: does the branch have a sane shape (no ghost/lost commits)?
   - git diff origin/main...origin/<branch> --stat: what's the blast radius (file count, line count), so claude knows whether to expect a "5-line tweak" review verdict or a "120-file refactor" review?

   These are status operations: they answer existence/state questions, not "is the code good." Reading the diff itself for judgment goes to the reviewer codex.

2. Reviewer dispatch (parallel across worktrees). For each PR/worktree, dispatch invoking-codex-exec in reviewer role. Fire all reviewer codexes together (one process per worktree, parallel). Each reviewer codex receives:
   - The dispatch prompt that was used to produce its PR (path or inlined), so it sees the same ## Scope block as the implementer.
   - The issue body (inlined, not linked).
   - The diff to review: git diff origin/main...origin/<branch>, fenced as ## Artifact under review with the prompt-injection-defuse line (see invoking-codex-exec "Prompt shape — reviewer role").
   - References to project rules: CLAUDE.md, AGENTS.md.
   - This skill's reviewer checklist, included verbatim:
     - Scope adherence: diff stays within the dispatch prompt's ## Scope block. Files or behaviors outside In scope (or explicitly listed Out of scope) are scope-creep, flag as Blocking. Decisions that contradict resolved Open questions are Blocking.
     - Rebase integrity: nothing lost from concurrent merges to main.
     - Correctness of renames / retargets: any straggler reference to old names, routes, file paths.
     - Race conditions in new DB writes: the select-then-insert antipattern is common; prefer INSERT ... ON CONFLICT DO UPDATE for upserts.
     - Redundant migration blocks: codex tends to duplicate bootstrap + migration for the same table.
     - UI parity across sibling pages: if the feature lives in both a list and a detail view, check both.
     - Test gaps: UNIQUE constraints, auth paths, toggle-off-then-on flows.
     - Project-rule compliance: no AI-tool references (Claude / Codex / Copilot) in code, comments, or commit messages; conventional commit prefix; no --no-verify; no TODO / FIXME / console.log slop.
   - The output contract: write JSON to <worktree>/.codex-review-output.json. The orchestrator enforces read-only via the snapshot recipe in invoking-codex-exec.

3. Concurrency notes for parallel review:
   - Each reviewer codex runs in its own worktree. No file-system contention: review-output paths are per-worktree, source paths are per-branch.
   - Each reviewer is dispatched as its own background codex exec process with its own log file (/tmp/codex-review-<PR-N>.log). Wait by PID, same as implementer dispatches. Schedule a single aggregated wakeup if multiple reviewers are still in flight, not one wakeup per reviewer.
   - The implementer codex must finish (PID exit) before its reviewer codex starts in the same worktree. Sequential per worktree, parallel across worktrees.

4. Read all review outputs and decide per PR, once every reviewer has exited:
   - Validate each .codex-review-output.json is parseable. Malformed → handle per invoking-codex-exec "When the JSON is malformed".
   - Verify read-only enforcement passed for each. Boundary violation → handle per invoking-codex-exec "Read-only enforcement".
   - For each PR, decide:
     - verdict: approved and empty blocking / scope_violations → eligible for merge (still subject to the merge-order decision below).
     - Any blocking or scope_violations → schedule a corrector dispatch for that PR.
     - Only should_fix or nits → claude decides per item: address now via corrector, or defer to PR comment with reasoning. nits default to deferral.
   - Cross-check scope_violations against the issue body (NOT the diff). Reviewer codex sometimes flags a feature as scope-creep when it was actually requested in the issue. Read the issue body; that is a status read, not a code review.
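The per-PR decision in the last step of the review wave can be sketched as a function over the review file's contents. The field names (verdict, blocking, scope_violations, should_fix, nits) come from the text above; everything else is an assumption: the full schema lives in invoking-codex-exec, each field other than verdict is assumed to be a list, and the action labels returned here are hypothetical.

```python
import json


def decide(review_json: str) -> tuple[str, list]:
    """Map one .codex-review-output.json payload to (action, optional items)."""
    try:
        review = json.loads(review_json)
    except json.JSONDecodeError:
        return "malformed", []  # invalid or empty file: re-dispatch per the malformed-JSON rule
    if not isinstance(review, dict):
        return "malformed", []

    # should_fix and nits never block on their own; the orchestrator decides per item.
    optional = list(review.get("should_fix") or []) + list(review.get("nits") or [])

    if review.get("blocking") or review.get("scope_violations"):
        return "corrector", optional
    if review.get("verdict") == "approved":
        return "eligible-for-merge", optional
    if optional:
        return "decide-per-item", optional
    return "malformed", []  # no verdict and no findings: not a usable review
```

The scope_violations cross-check against the issue body happens before acting on a "corrector" result, since a flagged item may have been requested in the issue.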
Correction wave
Every blocker → corrector dispatch via invoking-codex-exec (corrector role). No in-place fixes by claude. No "small fix" path. Even one-line typos go through codex.
For each PR with blockers:
1. Compose the corrector prompt per the correction-prompt template in references/prompt-template.md. Include:
   - Path to the original dispatch prompt.
   - Path to .codex-review-output.json.
   - Explicit list of items the corrector must address (subset of blocking ∪ chosen should_fix ∪ scope_violations).
   - A ## Scope block whose In scope is exactly those items and whose Out of scope is "everything else, including all nits and any item the orchestrator deferred".
2. Launch the corrector codex in background (same worktree). Wait by PID. If multiple correctors run in parallel (different worktrees), schedule one aggregated wakeup.
3. After corrector exits:
   - Status check: git log --oneline origin/main..origin/<branch> to confirm a fixup commit landed.
   - Push: git push --force-with-lease if the corrector rebased; otherwise plain push. Claude runs the push command directly; pushing is status, not code work.
   - Post a PR comment summarizing what changed, so the next reviewer doesn't have to re-read the whole diff. Comment text is composed by claude from the review JSON + the corrector prompt. It's a status report, not code judgment.
   - Wait for CI via scripts/wait_for_ci.sh <PR>, run through the harness's background-command mechanism so the exit event wakes the agent; do not inline-poll in bash.
4. Re-dispatch reviewer (back to step 2 of the review wave for this PR).
5. After 3 corrector cycles for the same PR without verdict: approved, escalate to user. The plan or the issue itself is likely wrong; don't loop indefinitely.
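Two pieces of this loop are mechanical: building the corrector's scope from the review output, and stopping after three cycles. A minimal sketch, assuming review items are plain strings and the review JSON has already been parsed into a dict; `corrector_scope` and `next_step` are hypothetical names, and the returned labels are illustrative.

```python
MAX_CORRECTOR_CYCLES = 3  # from the text: escalate after 3 cycles without approval

OUT_OF_SCOPE = "everything else, including all nits and any item the orchestrator deferred"


def corrector_scope(review: dict, chosen_should_fix: list[str]) -> dict:
    """Build In scope as blocking ∪ chosen should_fix ∪ scope_violations."""
    in_scope: list[str] = []
    candidates = [
        *review.get("blocking", []),
        *chosen_should_fix,
        *review.get("scope_violations", []),
    ]
    for item in candidates:
        if item not in in_scope:  # union, keeping first-seen order
            in_scope.append(item)
    return {"in_scope": in_scope, "out_of_scope": OUT_OF_SCOPE}


def next_step(cycles_done: int, verdict: str) -> str:
    """Decide what follows a reviewer pass, given completed corrector cycles."""
    if verdict == "approved":
        return "merge-queue"
    if cycles_done >= MAX_CORRECTOR_CYCLES:
        return "escalate-to-user"  # the plan or the issue is likely wrong
    return "dispatch-corrector"
```

Passing only the chosen should_fix items, rather than the whole list, is what keeps deferred items out of the corrector's scope.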
Merge + cleanup
When CI is green, the reviewer is satisfied, and the blocking items are done:
- Decide merge order if multiple PRs are ready. Order matters when:
- They both touch conflicting files (merge simpler first so the other rebases cleanly).
- One supersedes another semantically (merge the reframe first; rework its dependent branch after).
- ADR numbers collide (whichever lands first claims the number; the other renumbers in the correction wave — do not ask codex to "merge in a specific order", do it manually).
- Run scripts/merge_and_cleanup.sh <PR> <worktree-path> <branch>, which squash-merges, then removes the worktree, then deletes the local branch, in the only order that works. Doing this by hand is error-prone because git branch -D fails while a worktree pins the branch, so always go through the script (or follow the exact sequence in references/common-commands.md).
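The ordering constraint can be made explicit as a command list. This is not the content of scripts/merge_and_cleanup.sh, which is not shown here; it is a sketch of the sequence the text describes using standard gh and git subcommands, with hypothetical function names.

```python
import subprocess


def merge_and_cleanup_commands(pr: int, worktree_path: str, branch: str) -> list[list[str]]:
    """Return the cleanup commands in the order the text requires."""
    return [
        ["gh", "pr", "merge", str(pr), "--squash"],      # 1. squash-merge the PR
        ["git", "worktree", "remove", worktree_path],    # 2. unpin the branch
        ["git", "branch", "-D", branch],                 # 3. only now delete the local branch
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Stopping at the first failure matters here: if the merge fails, the worktree and branch are left in place for the next correction cycle.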
Deploy (only if asked)
Merging to main does not auto-deploy in most repos. If the user asks to "deploy" after a merge, check the repo's workflows to see what triggers deploy — commonly a workflow_dispatch on a Deploy workflow. Production is shared infrastructure; confirm environment (development / staging / production) before dispatching.
Key pitfalls
See references/pitfalls.md for the full list. The top items:
- Claude is the PO, codex is the worker, for build AND review: every code touch is a codex dispatch. Building is implementer codex, reviewing is reviewer codex (read-only, JSON output), correcting is corrector codex. Claude never edits source files, never reads diffs for judgment, never makes "quick in-place fixes."
- Reviewer codex needs read-only enforcement: codex has no built-in read-only mode. The orchestrator must snapshot HEAD + working tree before dispatch and verify they're unchanged after. See invoking-codex-exec "Read-only enforcement". Two consecutive boundary violations → escalate.
- Reviewer JSON can be malformed: codex occasionally produces invalid JSON or empty review files. One re-dispatch with a stronger contract; second failure → escalate.
- Self-review is blocked by GitHub: gh pr review --approve fails with "cannot approve your own PR" for the PR's author account. Post via gh pr comment instead.
- Worktree pins branch: local branch delete fails until the worktree is removed. Always clean in that order.
- Codex log ≠ committed code: transient conflict markers in the codex log are not the same as committed markers. Verify with gh pr diff <N> before raising the alarm.
- Status checks vs. content checks: gh pr view, git log --oneline, git diff --stat, and CI status are status checks, and claude does these. Reading the diff to judge whether it's good is a content check, and that's a reviewer codex dispatch.
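The read-only pitfall comes down to a before/after comparison. The authoritative recipe is in invoking-codex-exec and is not reproduced here; the sketch below is one assumed way to do it with plain git commands, treating the review output file as the only permitted change. The `snapshot` and `reviewer_stayed_read_only` names are hypothetical, and the check is coarse: it does not detect edits to the contents of files that were already untracked.

```python
import subprocess
from typing import Callable

REVIEW_OUTPUT = ".codex-review-output.json"  # the one file a reviewer may write


def git_output(worktree: str, *args: str) -> str:
    """Run a git command inside the worktree and return its stdout."""
    result = subprocess.run(
        ["git", "-C", worktree, *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def snapshot(worktree: str, git: Callable[..., str] = git_output) -> tuple:
    """Capture HEAD, the changed-file list, and the tracked-file diff."""
    head = git(worktree, "rev-parse", "HEAD").strip()
    status = git(worktree, "status", "--porcelain")
    # Ignore the review output file so writing it does not count as a violation.
    changed = tuple(
        line for line in status.splitlines() if not line.endswith(REVIEW_OUTPUT)
    )
    diff = git(worktree, "diff", "HEAD")
    return head, changed, diff


def reviewer_stayed_read_only(before: tuple, after: tuple) -> bool:
    """True when nothing but the review output file changed between snapshots."""
    return before == after
```

Take one snapshot just before the reviewer dispatch and one after its PID exits; a mismatch is the boundary violation the pitfall refers to.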
Success criteria
The batch is done when:
- Every non-skipped issue has an open or merged PR.
- Every open PR has been through a reviewer-codex pass with verdict: approved and read-only enforcement passed.
- Every PR has at least one human-visible summary comment composed by claude explaining what changed vs the review feedback (this is a status report, not a review).
- Every worktree created by this skill has been removed after its branch merged.
- No code touch was made by claude during the run. Audit by checking that every commit on every branch was authored within a codex run.