code-review
Parse Arguments
Parse $ARGUMENTS to determine the review mode and flags.
Mode Detection
| Priority | Mode | Condition | Description |
|---|---|---|---|
| 1 | PR Review (explicit) | `$ARGUMENTS` contains a standalone numeric token (e.g., `42`) or `--pr <number>` | Review a specific PR's diff + commit context |
| 2 | Working Dir (forced) | `--wd` flag is set | Force working directory review, even on a PR branch |
| 3 | Path Review | `$ARGUMENTS` contains a file/directory path | Review code at the specified path |
| 4 | PR Review (auto) | `$ARGUMENTS` is empty → auto-detect (see below) | Auto-detected PR on current branch |
| 5 | Working Dir | `$ARGUMENTS` is empty and no PR detected | Review current working directory changes (staged + unstaged). If no changes, falls through to Path Review (cwd) |
PR Auto-Detection
When $ARGUMENTS is empty and --wd is not set, attempt to detect an open PR on the current branch:
gh pr list --head "$(git branch --show-current)" --author "@me" --state open --json number,title --jq '.[0]'
Key behaviors:
- `--author "@me"` ensures only the current user's PRs are matched, skipping bot PRs (dependabot, renovate, etc.)
- If a PR is found → enter PR Review (auto) mode with the detected PR number
- If no PR is found (empty result, detached HEAD, or `main`/`master` branch) → fall through to Working Dir mode
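The priority order in the Mode Detection table can be sketched as a small resolver. This is a minimal illustration, not the skill's actual parser: `resolve_mode`, `VALUE_FLAGS`, and the returned mode labels are hypothetical names, the PR auto-detection result is passed in as `detected_pr` rather than obtained by calling `gh`, and free-text publish intent in the arguments is not handled.

```python
from typing import Optional

# Assumption: these are the only flags that consume the following token.
VALUE_FLAGS = {"--pr", "-d", "--domain"}


def resolve_mode(tokens: list[str], detected_pr: Optional[int]) -> tuple[str, Optional[str]]:
    """Return (mode, target) following the priority table (1 = highest)."""
    positionals: list[str] = []
    explicit_pr: Optional[str] = None
    force_wd = False

    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in VALUE_FLAGS:
            value = tokens[i + 1] if i + 1 < len(tokens) else None
            if tok == "--pr":
                explicit_pr = value
            i += 2  # skip the flag and its value
            continue
        if tok == "--wd":
            force_wd = True
        elif not tok.startswith("-"):
            positionals.append(tok)
        i += 1

    # Priority 1: --pr <number> or a standalone numeric token.
    numeric = next((p for p in positionals if p.isdigit()), None)
    if explicit_pr or numeric:
        return ("pr-explicit", explicit_pr or numeric)
    # Priority 2: --wd forces Working Dir even on a PR branch.
    if force_wd:
        return ("working-dir-forced", None)
    # Priority 3: any remaining positional is treated as a path.
    if positionals:
        return ("path", positionals[0])
    # Priority 4: no positionals and an open PR was auto-detected.
    if detected_pr is not None:
        return ("pr-auto", str(detected_pr))
    # Priority 5: plain Working Dir review.
    return ("working-dir", None)
```

For instance, `resolve_mode(["--wd"], detected_pr=12)` yields the forced Working Dir mode even though a PR exists, which mirrors Priority 2 taking precedence over Priority 4.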
Context-Aware Scope Adjustment
After resolving the initial mode, review the conversation history to assess the user's actual intent. This applies ONLY when the resolved mode is Working Dir (Priority 5) — i.e., no positional arguments and no PR detected. Explicit flags like --wd (Priority 2) and PR modes (Priority 1, 4) are not subject to this adjustment.
This adjustment operates before Step 1 (Context Builder) and complements the no-changes fallback in Working Dir Mode: Context-Aware decides based on conversation signals pre-diff; the no-changes fallback decides based on actual diff results in Step 1.
Evaluate whether the user wants a diff-focused review (changes only) or a broader directory review. Consider:
- Broaden to Path Review (cwd): User's conversation implies analysis, diagnosis, exploration, or state assessment of existing code, not just reviewing recent changes. Examples: "분석해줘" (analyze this), "현재 상태 봐줘" (look at the current state), "진단해줘" (diagnose this), "코드 살펴봐" (take a look at the code), "조사해줘" (investigate this), codebase onboarding context
- Keep Working Dir: User is actively developing and wants feedback on their changes. Examples: "수정한 거 봐줘" (look at what I changed), "변경사항 리뷰" (review the changes), iterating on feature branch with substantial diff
If broadening, display:
💡 Scope adjusted — switching to a code review of the current directory based on the conversation context
When both changes and broader analysis are relevant, use Path Review mode with the diff included as supplementary context for the domain agents.
Flag Detection
| Flag | Default | Description |
|---|---|---|
| `-y` / `--yes` / `--force` / `-f` | off | Publish without approval |
| `-g` / `--graph` | off | Generate a visual change-flow graph for PR review (Mermaid on GitHub, summary in terminal preview). Ignored in Working Dir / Path modes. |
| `-d` / `--domain` | auto | Override domain selection (e.g., `-d security,perf`) |
| `-s` / `--sub` | off | Use sub-agents instead of team agents for domain analysis |
| `--pr` | — | Explicit PR mode (e.g., `--pr 42`). Use when path arguments contain digits |
| `--wd` | off | Force Working Dir mode, skipping PR auto-detection |
| `-q` / `--quick` | off | Quick mode: single-pass analysis without agent spawn. Auto-detected domains capped at 2, Codex disabled, Info findings omitted unless there are no Critical/Warning findings, graph skipped. Designed for fast iteration during development/testing. |
| `--no-codex` | off | Disable Codex integration entirely (skip Codex detection and execution) |
| `--codex` | off | Force enable Codex with default behavior (adversarial only). Use when auto-detection is unreliable |
| `--codex-general` | off | Use Codex general review (`codex:review`) only, without adversarial review |
| `--codex-both` | off | Run both Codex general review and adversarial review in parallel. Shorthand for `--codex --codex-general` |
| `--full-scan` | off | Include pre-existing issues (unrelated to PR changes) in General Findings. By default, out-of-diff findings without causal relationship to the PR are dismissed. PR mode only; ignored in Working Dir / Path mode. |
Codex flag precedence: --no-codex > --codex-both > --codex + --codex-general (combo = both) > --codex-general alone > --codex alone > default.
When --codex and --codex-general are both present, the combined effect is both mode (equivalent to --codex-both).
If $ARGUMENTS contains explicit publish intent ("comment 달아" (post the comment), "바로 올려" (post it right away), "게시해" (publish it), "post it"), treat as -y.
Quick Mode Implicit Effects
When --quick is set, the following flags are implicitly forced regardless of other arguments:
| Implicit Override | Effect |
|---|---|
| `--no-codex` | Codex always disabled |
| `-g` / `--graph` ignored | Graph generation skipped |
Display hint immediately after flag parsing:
⚡ Quick mode — single-pass analysis, Critical/Warning first (Info fallback if there are none)
Step 1: Context Builder
Gather all available context based on the review mode.
PR Review Mode
gh pr view {number} --json title,body,baseRefName,headRefName,commits,files,labels --jq '.'
gh pr diff {number}
git log --oneline {baseRefName}..{headRefName} 2>/dev/null | head -30
Read the PR description to understand the author's intent. This is critical for cross-validation.
Extract a 1-2 sentence PR Purpose summary from the title, description, and commit messages. If the PR description is empty or minimal, derive the purpose from the title and commit messages alone. This summary is injected into the domain agent prompts via ## PR Context in Step 3's Common Prompt Suffix.
Working Dir Mode
git diff HEAD --stat
git diff HEAD
git log --oneline -5
If there are no changes (no staged or unstaged diffs), transition to Path Review mode targeting the current working directory. Display hint:
💡 No changes — switching to a code review of the current directory
Then proceed with Path Review Mode logic below.
Note: In diff review mode, git diff HEAD only includes tracked files. Untracked (new) files are not included — stage them first (git add) to include. In the Path Review fallback (no changes detected → cwd review), files are read directly from the filesystem, so untracked files are typically included.
Path Review Mode
Use Glob to list files under the specified path. Read each file's content.
git log --oneline -10 -- {path}
Common (all modes)
For each changed file, read sufficient surrounding context (at least 30 lines around each change hunk) to understand the code's purpose. Use Read to examine files directly.
If -g (graph) flag is set and the mode is PR Review, build a change-flow graph for the GitHub review output. In Working Dir / Path mode, -g is ignored — the terminal cannot render Mermaid and there is no GitHub publishing target.
1. Trace relationships: For each changed file, identify imports, exports, function calls, and data flow to/from other changed files using Grep.
2. Map edges: Record directional relationships, both code-level (`imports`, `calls`, `extends`, `emits/consumes`) and conceptual (`references`, `shared logic`, `configures`).
3. Group by module: If changed files exceed 7, group by parent directory into logical modules (subgraph nodes). Individual files become child nodes.
4. Construct Mermaid source: Build a `flowchart LR` diagram. Nodes = changed files (or module groups). Edges = relationships found in step 2, labeled with relationship type.
Skip condition: If no edges (relationships) are found between changed files after step 2, skip graph generation entirely. In the terminal preview, show: 📊 Change Graph: 파일 간 관계가 감지되지 않아 그래프를 생략합니다. In the GitHub format, omit the graph section silently.
The graph is rendered differently per output format (see Step 5):
- GitHub: Full Mermaid code block (native rendering).
- Terminal (PR preview only): One-line summary with module/file counts and primary flow path.
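As a rough illustration of the graph construction described above, the following sketch turns a list of changed files and labeled edges into Mermaid source. The function names (`build_mermaid`, `node_id`) and the edge tuple shape `(source, target, label)` are assumptions made for this example; the relationship tracing itself (done with Grep in the steps above) is not modeled.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Optional


def node_id(path: str) -> str:
    """Derive a Mermaid-safe identifier from a file or directory path."""
    return "n_" + re.sub(r"\W", "_", path)


def build_mermaid(
    files: list[str],
    edges: list[tuple[str, str, str]],
    group_threshold: int = 7,
) -> Optional[str]:
    """Return `flowchart LR` source, or None when no edges were found."""
    if not edges:
        return None  # skip condition: no relationships between changed files

    lines = ["flowchart LR"]
    if len(files) > group_threshold:
        # Group files by parent directory into subgraph "modules".
        groups: dict[str, list[str]] = defaultdict(list)
        for f in files:
            groups[str(PurePosixPath(f).parent)].append(f)
        for directory, members in sorted(groups.items()):
            lines.append(f'  subgraph {node_id(directory)}["{directory}"]')
            for f in sorted(members):
                lines.append(f'    {node_id(f)}["{PurePosixPath(f).name}"]')
            lines.append("  end")
    else:
        for f in sorted(files):
            lines.append(f'  {node_id(f)}["{f}"]')

    # One labeled, directed edge per recorded relationship.
    for src, dst, label in edges:
        lines.append(f"  {node_id(src)} -->|{label}| {node_id(dst)}")
    return "\n".join(lines)
```

Returning `None` for an empty edge list keeps the skip condition explicit, so the caller can decide whether to print the terminal notice or omit the section.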
Step 2: Domain Router
Determine which domain agents to activate based on changed files.
Auto-Activation Rules
| File Pattern | Activated Domains |
|---|---|
| `.tsx`, `.jsx`, `.css`, `.html`, `.vue`, `.svelte` | Architecture, Domain Logic |
| `.sql`, `.prisma`, `*migration*` | Performance, Security |
| `auth/`, `middleware/`, `security/`, `*token*` | Security |
| `*.test.*`, `*.spec.*` | Domain Logic |
| `Dockerfile`, `k8s/`, `terraform/`, `*.yml` (CI) | Architecture, Security |
| `package.json`, `requirements.txt`, `go.mod` | Security |
| General source code (`.ts`, `.py`, `.go`, `.java`, `.rs`, etc.) | Architecture, Domain Logic |
Default (no pattern match): Architecture + Domain Logic.
Override
If --domain flag is provided, use ONLY the specified domains regardless of file patterns.
Collect the union of all activated domains from all changed files. Deduplicate.
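The routing rules can be approximated as follows. This is a simplified sketch under several assumptions: a file may match more than one row, every `.yml` file is treated as CI configuration, directory rules match any path component, and the names `domains_for_file` and `route_domains` are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, Optional

DEFAULT = {"Architecture", "Domain Logic"}
FRONTEND = {".tsx", ".jsx", ".css", ".html", ".vue", ".svelte"}
SOURCE = {".ts", ".py", ".go", ".java", ".rs"}  # "etc." in the table is not expanded here


def domains_for_file(path: str) -> set[str]:
    """Union of every table row the file matches; default if none match."""
    p = PurePosixPath(path)
    dirs = set(p.parts[:-1])
    domains: set[str] = set()
    if p.suffix in FRONTEND:
        domains |= {"Architecture", "Domain Logic"}
    if p.suffix in {".sql", ".prisma"} or fnmatchcase(path, "*migration*"):
        domains |= {"Performance", "Security"}
    if dirs & {"auth", "middleware", "security"} or fnmatchcase(path, "*token*"):
        domains |= {"Security"}
    if fnmatchcase(p.name, "*.test.*") or fnmatchcase(p.name, "*.spec.*"):
        domains |= {"Domain Logic"}
    if p.name == "Dockerfile" or dirs & {"k8s", "terraform"} or p.suffix == ".yml":
        domains |= {"Architecture", "Security"}
    if p.name in {"package.json", "requirements.txt", "go.mod"}:
        domains |= {"Security"}
    if p.suffix in SOURCE:
        domains |= {"Architecture", "Domain Logic"}
    return domains or set(DEFAULT)


def route_domains(changed_files: Iterable[str], override: Optional[list[str]] = None) -> set[str]:
    """--domain override wins outright; otherwise deduplicated union over all files."""
    if override:
        return set(override)
    result: set[str] = set()
    for f in changed_files:
        result |= domains_for_file(f)
    return result
```

Using sets makes the "collect the union, deduplicate" step implicit.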
Quick Mode Domain Cap
When --quick is set (and --domain is NOT provided), cap the auto-detected domains to at most 2 using this priority:
Security > Domain Logic > Architecture > Performance
Process: run auto-detection as normal, then sort by priority and take the top 2. If auto-detection returns ≤ 2, use all of them as-is.
When --domain is explicitly provided alongside --quick, respect the override — do not cap.
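The cap itself reduces to a sort followed by a slice. A minimal sketch, assuming domains are passed by their display names and using the hypothetical helper name `cap_for_quick`:

```python
# Priority order stated above: Security > Domain Logic > Architecture > Performance
PRIORITY = ["Security", "Domain Logic", "Architecture", "Performance"]


def cap_for_quick(domains: set[str], quick: bool, domain_override: bool) -> list[str]:
    """Return domains in priority order, capped at 2 only for quick mode without --domain."""
    ordered = sorted(domains, key=PRIORITY.index)
    if quick and not domain_override:
        return ordered[:2]
    return ordered
```

With `{"Architecture", "Performance", "Security"}` in quick mode, this keeps Security and Architecture and drops Performance.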
Step 2.5: Codex Detection
Determine whether the Codex plugin is available and resolve the Codex execution mode.
Quick mode: Skip this entire step. Codex mode is already forced to disabled (see Quick Mode Implicit Effects). Proceed directly to Step 3.
Companion Detection
REQUIRED: Run this command before proceeding to Mode Resolution.
COMPANION=$(find ~/.claude/plugins/cache/openai-codex -name "codex-companion.mjs" -print 2>/dev/null | sort | tail -1)
- `COMPANION` is non-empty → Codex available
- `COMPANION` is empty → Codex unavailable
Do not proceed to Mode Resolution until this command has been executed and the result evaluated.
Mode Resolution
| Priority | Condition | Codex Mode | Hint |
|---|---|---|---|
| 1 | `--no-codex` flag is set | disabled | None |
| 2 | Codex unavailable + any Codex flag set (`--codex`, `--codex-general`, `--codex-both`) | disabled | ⚠️ Codex unavailable — {flag} was requested, but the companion could not be found |
| 3 | Codex unavailable + no Codex flag | disabled | None |
| 4 | `--codex-both` flag is set, OR `--codex` + `--codex-general` both present | both | 💡 Codex detected — running review + adversarial in parallel |
| 5 | `--codex-general` flag is set (without `--codex`) | review | 💡 Codex detected — running general review |
| 6 | `--codex` flag is set (without `--codex-general`) | adversarial | 💡 Codex detected — adversarial review (forced on) |
| 7 | Default (no Codex flag) | adversarial | 💡 Codex detected — running adversarial review |
Companion subcommands per mode:
- adversarial: `adversarial-review --wait`
- review: `review --wait`
- both: `review --wait` + `adversarial-review --wait`
In PR mode, append --base {baseRefName} to each subcommand to scope the review to the PR diff.
Display the resolved Hint (if any) immediately after detection, before launching domain agents. Hints follow the project's Hint pattern (`> **{icon} {action}** — {reason}`).
Store the resolved mode for use in Steps 3, 4, and 5. Each spawned Codex agent re-resolves the companion path independently via find since agents run in separate contexts. If mode is disabled, skip all Codex-related logic in subsequent steps and proceed exactly as before (full backward compatibility).
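The Mode Resolution table, together with the quick-mode override, can be condensed into one function. This is a sketch only: `resolve_codex_mode` is a hypothetical name, flags are assumed to arrive as a set of long-form strings, and hint display is left out.

```python
def resolve_codex_mode(flags: set[str], codex_available: bool) -> str:
    """Return one of: 'disabled', 'both', 'review', 'adversarial'."""
    # Quick mode forces Codex off before any detection happens.
    if "--quick" in flags or "-q" in flags:
        return "disabled"
    # Priority 1: explicit opt-out beats every other Codex flag.
    if "--no-codex" in flags:
        return "disabled"
    # Priorities 2 and 3: nothing can run without the companion.
    if not codex_available:
        return "disabled"
    wants_general = "--codex-general" in flags
    wants_adversarial = "--codex" in flags
    # Priority 4: --codex-both, or the two single flags combined.
    if "--codex-both" in flags or (wants_general and wants_adversarial):
        return "both"
    # Priority 5: general review only.
    if wants_general:
        return "review"
    # Priorities 6 and 7: forced or default adversarial review.
    return "adversarial"
```

Because `--no-codex` is checked first, a contradictory invocation such as `--no-codex --codex-both` resolves to `disabled`, matching the stated precedence.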
Step 3: Domain Agents (Parallel Execution)
Launch activated domain agents in parallel. Each agent receives the full diff and context from Step 1.
Quick Mode: Single-Pass Analysis
When --quick is set, skip all agent spawning (team and sub-agent). Instead, perform the analysis directly in the main context:
- For each activated domain (max 2 from Step 2), analyze the diff using that domain's Investigation Protocol (see Domain Definitions below).
- Produce findings in the same format as agent results: title, file path, primary line number, occurrence count, description, and action line.
- Produce findings for all severity levels (Critical, Warning, Info). The output step decides which severities to display based on results (see Step 5 Quick Mode Output).
- Skip Codex execution entirely (mode is disabled).
This eliminates the overhead of agent creation, context window allocation, and inter-agent communication. Proceed directly to Step 4 after analysis.
Execution Mode (Normal)
| Mode | Condition | Description |
|---|---|---|
| Team agents (default) | `-s`/`--sub` flag NOT set | Each domain runs as an independent team agent with its own context window. Better result quality for large diffs. |
| Sub-agents | `-s`/`--sub` flag set | Each domain runs as a sub-agent via the Agent tool. Results return to the main context. Lighter for small diffs. |
Team Agent Mode (default)
Used when -s/--sub flag is NOT set. Each domain agent gets its own context window.
1. `TeamCreate`: name `code-review`.
2. For each activated domain:
   - `TaskCreate`: subject `"{Domain} domain analysis"`. Description: changed files relevant to this domain and the domain-specific prompt/protocol from Domain Definitions.
   - Spawn teammate via `Agent` with `team_name: "code-review"`, `name` set to domain name (lowercase, e.g., `"security"`, `"architecture"`), `model: "opus"`. Prompt: domain-specific prompt from Domain Definitions (including Common Prompt Suffix), full diff from Step 1, changed file context, and finding format (title, file path, primary line number, occurrence count, description, and action line per severity).
   - `TaskUpdate`: set `owner` to the agent name.
3. Monitor `TaskList` until all domain tasks complete. Collect findings from agent messages.
4. Shut down agents via `SendMessage` with `shutdown_request`.
5. Pass aggregated findings to Step 4.
Fallback: if TeamCreate is unavailable, switch to Sub-agent mode.
Sub-agent Mode
Used when -s/--sub flag IS set, or as fallback. For each activated domain, launch an agent via Agent in parallel with model: "opus". Prompt: domain-specific prompt from Domain Definitions (including Common Prompt Suffix), full diff from Step 1, changed file context, and finding format. Results return to the main context.
Domain Definitions
Each domain agent — whether team or sub — receives its domain-specific prompt via the Agent tool's prompt parameter. All agents are spawned with model: "opus" and without subagent_type (general-purpose). Domain specialization is handled entirely through the prompt — do not delegate to external agent types. The prompt for each domain consists of: the domain-specific prompt below, followed by the Common Prompt Suffix.
Common Prompt Suffix
Append the following sections to every domain agent prompt:
## PR Context (PR Review Mode only — omit this section in Working Dir / Path Review mode)
**PR Purpose**: {1-2 sentence summary of the PR's goal, extracted from the PR title, description, and commit messages gathered in Step 1}
When reviewing code, use this PR purpose as your primary lens:
- Check whether all changes are consistent with the stated purpose
- Look for incomplete implementations: if the PR aims to do X, are there
places where X is only partially done? (e.g., i18n PR that leaves hardcoded
strings in some files)
- Identify files in the diff that were changed but not fully aligned with the
PR's goal
- For data files (YAML, JSON, config), verify cross-language/cross-environment
consistency when the PR's purpose involves such concerns
## Constraints
- You are read-only. Do not attempt to modify any files.
- Write findings in the language configured in the project's CLAUDE.md. If no language is configured, follow the user's conversational language.
- If no issues are found, return an empty findings list (no items) and state "No issues found." Do not manufacture findings.
## Severity Criteria
- Critical: Security vulnerability, data loss risk, or crash-inducing bug
- Warning: Potential bug, performance issue, or maintainability concern
- Info: Suggestion, minor improvement, or style note
## Output Format
Return findings as a structured list. Each finding must contain:
- title: Short descriptive name (do not repeat the file name)
- file_path: Exact file path
- primary_line: Line number in the NEW version of the file where the issue is most visible; this is the canonical line-reference field for evidence requirements
- occurrence_count: Number of instances of this pattern in the diff
- description: Issue explanation (reference locations by function name or code pattern; do not repeat line numbers here, since `primary_line` provides the line-based evidence)
- action: For Critical/Warning — suggested fix. For Info — one of: Accept (intentional, no action needed) / Monitor (could become an issue at scale) / Won't Fix (known limitation, not worth addressing), with reason.
🛡️ Security
You are a security engineer conducting a focused security audit of code changes.
You specialize in identifying exploitable vulnerabilities before they reach production.
Prioritize findings by: severity × exploitability × blast radius.
## Investigation Protocol (follow this order)
1. Scan for hardcoded secrets (API keys, passwords, tokens, connection strings)
2. Check input validation: all user inputs sanitized? Parameterized queries?
3. Identify injection vectors: SQL, XSS, command injection, path traversal, SSRF
4. Review authentication/authorization: session management, JWT validation, CSRF protection on state-changing operations, access control on every route
5. Verify cryptographic choices: strong algorithms (AES-256, bcrypt/argon2), proper key management, PII encrypted at rest
6. Audit dependency changes: known CVEs in added/updated packages, lock file integrity
7. Check security configuration: CORS, CSP headers, debug flags, TLS, file upload validation (type/size/content)
8. Assess security logging: auth failures and access denials logged? No sensitive data in logs?
## Evidence Gate
Every finding MUST cite the exact file path and line number. If you cannot point to a specific line in the diff or surrounding context, do not report it.
⚡ Performance
You are a performance engineer analyzing code changes for runtime efficiency issues.
Focus on issues that degrade under real-world load, not micro-optimizations.
## Investigation Protocol (follow this order)
1. Identify algorithmic complexity: O(n²) or worse in hot paths, unnecessary nested loops
2. Detect database anti-patterns: N+1 queries, missing indexes, unoptimized joins
3. Check async contexts: blocking operations in event loops, missing parallelization of independent I/O
4. Assess memory patterns: unbounded growth, large object retention, missing cleanup
5. Review caching: missing cache for repeated expensive operations, cache invalidation correctness
6. Check resource management: connection pool sizing, file handle leaks, stream backpressure
## Evidence Gate
Every finding MUST cite the exact file path and line number. Only report issues with measurable impact under realistic load. Do not flag theoretical micro-optimizations.
🏗️ Architecture
You are a staff engineer reviewing code changes for long-term maintainability and design coherence.
You evaluate whether changes align with existing codebase patterns and introduce sustainable design decisions.
## Investigation Protocol (follow this order)
1. Check pattern consistency: does this change follow established patterns in the codebase? Use Grep to find similar implementations and compare approaches.
2. Evaluate SOLID principles: SRP (single reason to change?), DIP (depends on abstractions?), OCP (extensible without modification?)
3. Assess coupling: are new dependencies appropriate? Is the dependency direction correct?
4. Review API contracts: are interfaces/types changed backward-compatibly? Are contracts clear?
5. Check module boundaries: does the change respect existing boundaries? Is responsibility placed correctly?
6. Assess technical debt: does this change introduce shortcuts that will compound?
## Evidence Gate
Every finding MUST cite the exact file path and line number. Reference the existing pattern or file that the change should align with. When recommending a change, state the trade-off (what is gained vs. what is sacrificed). Do not report subjective style preferences.
🔍 Domain Logic
You are a senior engineer who owns this codebase's business domain, reviewing changes for correctness.
You focus on whether the code does what it's supposed to do, handles all cases, and doesn't break existing behavior.
## Investigation Protocol (follow this order)
0. Verify implementation matches stated intent: does the code solve the problem described in the PR/commit? Anything missing? Anything extra that wasn't requested?
1. Verify business rule correctness: are conditions, thresholds, and control flow correct?
2. Check error handling completeness: all error paths handled? Errors propagate correctly? Resource cleanup?
3. Test edge cases mentally: null/undefined, empty collections, boundary values (0, -1, MAX), concurrent access
4. Identify race conditions: shared state modifications, async ordering assumptions, missing locks/transactions
5. Verify type safety: implicit coercions, unchecked casts, generic type erasure
6. Check state management: initialization order, invalid state transitions, stale data
## Scope
Your scope is correctness and behavioral soundness. Do not flag style, pattern consistency, or maintainability concerns — the Architecture domain covers those.
## Evidence Gate
Every finding MUST cite the exact file path and line number. Describe the specific input or scenario that triggers the bug. Do not report hypothetical issues without a concrete trigger.
Each finding must include: title, file path, primary line number, occurrence count, description, and action line — suggested fix for Critical/Warning severity, recommendation label (Accept / Monitor / Won't Fix) for Info severity.
Primary line number: The line number in the new version of the file where the issue is most clearly visible. Required for inline comment targeting in PR mode. In the finding's description text, reference locations by section heading, function name, or code pattern — not by raw line number (line numbers are used only for comment placement, not in the output text).
Codex Parallel Execution
If Codex mode (from Step 2.5) is NOT disabled, launch Codex alongside the domain agents in parallel. Codex runs as an independent second reviewer — it is NOT a domain agent, but its execution is concurrent with domain agents.
Codex is invoked via the companion runtime directly through Bash, NOT via Skill("codex:..."). The codex:adversarial-review and codex:review skills have disable-model-invocation: true, which blocks Skill() invocation. The companion runtime (codex-companion.mjs) has no such restriction and is the official programmatic interface for invoking Codex from Claude Code.
Team Agent Mode (default, -s/--sub NOT set)
For each Codex subcommand to invoke (per the mode table in Step 2.5):
1. `TaskCreate`: subject `"Codex {mode} review"` (e.g., `"Codex adversarial review"`).
2. Spawn teammate via `Agent` with:
   - `team_name: "code-review"`
   - `name: "codex"` (or `"codex-general"` / `"codex-adversarial"` when mode is both, to distinguish the two)
   - Prompt: Instruct the agent to run the companion via Bash and return the findings. The Bash command:
     COMPANION=$(find ~/.claude/plugins/cache/openai-codex -name "codex-companion.mjs" -print 2>/dev/null | sort | tail -1) && node "$COMPANION" {subcommand} --wait
     Where `{subcommand}` is `adversarial-review` or `review` per the mode table. Use `--base {baseRefName}` for PR mode.
3. `TaskUpdate`: set `owner` to the agent name.
The Codex agent(s) run in parallel with domain agents (Security, Architecture, etc.) within the same code-review team. They appear in TaskList output alongside domain agents, satisfying traceability (Acceptance Criterion F).
Sub-agent Mode (-s/--sub set)
For each Codex subcommand to invoke:
- Launch via `Agent` with `name: "codex"` (or `"codex-general"` / `"codex-adversarial"` for both mode). No `team_name`. The agent runs the companion via Bash (same command as Team Agent Mode) and returns findings.
Launch Codex agents at the same time as domain agents — do NOT wait for domain agents to finish first.
Codex Failure Handling
If a Codex agent reports a non-zero exit code or returns an error (e.g., quota exhausted, authentication failure, network error):
- Do NOT retry — treat the Codex contribution as unavailable for this run.
- Findings = empty — proceed with domain agent findings only. Do not attempt to parse error output as findings.
- Terminal notice — classify the error and display the appropriate message:
| Error Signal | Terminal Notice |
|---|---|
| stderr contains `auth`, `login`, `API key`, `unauthorized`, `401` | ⚠️ Codex auth required — running !codex setup is recommended |
| Any other non-zero exit | ℹ️ Codex: unavailable (skipped) |
- GitHub format — do NOT include any Codex failure notice. Codex availability is an internal infrastructure detail, not relevant to PR reviewers.
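The error classification above might look like the sketch below. It returns a category label rather than the display string, and it assumes case-insensitive substring matching on stderr, which the table does not specify; `classify_codex_failure` and the labels are hypothetical.

```python
from typing import Optional

# Signals from the table; matching them case-insensitively is an assumption.
AUTH_SIGNALS = ("auth", "login", "api key", "unauthorized", "401")


def classify_codex_failure(exit_code: int, stderr: str) -> Optional[str]:
    """Return None on success, else 'auth_required' or 'unavailable'."""
    if exit_code == 0:
        return None
    lowered = stderr.lower()
    if any(signal in lowered for signal in AUTH_SIGNALS):
        return "auth_required"
    return "unavailable"
```

Note that plain substring matching is coarse: a message containing "author" would also match `auth`, so a stricter implementation might match on word boundaries.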
Fallback (non-Claude Code runners)
If neither team agents nor sub-agents are available (e.g., Codex CLI, Gemini CLI as the runner platform), perform all domain analyses sequentially in a single pass. Analyze each domain's Investigation Protocol one by one and collect findings.
Step 4: Cross-Validation
This is the quality gate. Review ALL findings from domain agents against the full context to filter false positives.
Quick Mode: Lightweight Validation
When --quick is set, perform a reduced validation pass:
- Expanded context: Read at least 15 lines around the flagged location using Read (half of normal).
- Sanity check: Verify the finding references real code (not a false match from diff noise).
Skip git history, comments/docs search, and PR description cross-reference. Apply Confirmed / Dismissed verdicts only (no Demoted). This trades thoroughness for speed — obvious false positives are still caught, but edge cases may pass through.
Out-of-Diff Finding Filter (see below) applies identically in Quick mode.
Out-of-Diff Finding Filter (PR Mode Only)
For findings that reference files NOT in the PR diff, apply a causality test before proceeding to the normal verdict process:
Causality Test: Does this finding have a direct causal relationship with the PR's changes?
| Causality Type | Description | Verdict |
|---|---|---|
| Consistency gap | The PR introduces or modifies a pattern, but the same pattern has not yet been updated in other files. The PR is incomplete without this change. | → Proceed to normal verdict (Confirmed/Demoted/Dismissed) |
| Side effect | A function/API/contract changed in the PR is called or depended on by code outside the diff, and that code will break or behave incorrectly. | → Proceed to normal verdict |
| Pre-existing issue | A code defect that existed before this PR and is unrelated to the PR's changes. Discovered incidentally during review. | → If --full-scan: proceed to normal verdict. Otherwise: Dismissed (not in scope for this review). |
To determine causality:
- Identify what the PR changed (from Step 1 context).
- For the out-of-diff finding, ask: "Would this finding exist even if this PR had never been created?" If yes → pre-existing issue.
- If no → check whether it's a consistency gap or side effect, and proceed to normal verdict.
Findings that pass the causality test (or are included via --full-scan)
retain their original severity and are included via contextual mapping
(Step 6 tier 2) or General Findings (tier 3) as appropriate.
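The causality test amounts to a small decision function once the causality type has been judged. In this sketch the judgment itself is taken as an input, and the names `triage_out_of_diff`, `consistency_gap`, `side_effect`, and `pre_existing` are illustrative labels rather than identifiers from the skill.

```python
def triage_out_of_diff(
    file_path: str,
    diff_files: set[str],
    causality: str,
    full_scan: bool = False,
) -> str:
    """Return 'normal_verdict' or 'dismissed' for a single finding."""
    # Findings inside the diff are not subject to this filter.
    if file_path in diff_files:
        return "normal_verdict"
    # Causally linked findings continue to the normal verdict process.
    if causality in {"consistency_gap", "side_effect"}:
        return "normal_verdict"
    # Pre-existing issues survive only when --full-scan is set.
    if causality == "pre_existing":
        return "normal_verdict" if full_scan else "dismissed"
    raise ValueError(f"unknown causality type: {causality!r}")
```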
Normal Mode
For each finding:
- Expanded context: Read at least 30 lines around the flagged location using Read.
- Git history: Check if the code was intentionally written this way: `git log --oneline -5 -- {file}`
- Comments/docs: Search for TODO, FIXME, or design notes near the flagged code. Check if the project's CLAUDE.md or other documentation addresses the pattern.
- PR description/commit messages: Cross-reference against the author's stated intent (from Step 1).
Verdict per finding
| Verdict | Action |
|---|---|
| Confirmed | Real issue — include in final output |
| Demoted | Real but already acknowledged — downgrade to Info with context note |
| Dismissed | False positive — remove from output |
Log dismissed findings internally (do not output them) to avoid noise.
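Applying a verdict to a finding can be sketched as below. Findings are modeled as plain dictionaries, and `apply_verdict` and the `context_note` key are assumptions made for the example.

```python
from typing import Optional


def apply_verdict(finding: dict, verdict: str, note: str = "") -> Optional[dict]:
    """Return the finding to publish, or None when it is dismissed."""
    if verdict == "Confirmed":
        return dict(finding)  # real issue, kept unchanged
    if verdict == "Demoted":
        # Real but already acknowledged: downgrade to Info and attach context.
        return {**finding, "severity": "Info", "context_note": note}
    if verdict == "Dismissed":
        return None  # false positive: logged internally, never output
    raise ValueError(f"unknown verdict: {verdict!r}")
```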
Codex Findings Integration
If Codex mode is NOT disabled, collect findings from the Codex agent(s) after they complete. Codex findings join the cross-validation process identically to domain agent findings:
- Merge into unified pool: Combine Codex findings with domain agent findings before cross-validation.
- Apply the same verdict process: Each Codex finding undergoes the same Confirmed / Demoted / Dismissed evaluation (expanded context, git history, comments/docs, PR description).
- Tag preservation: Preserve the Codex origin tag on each finding (see Step 5 for tag rules). The tag indicates which Codex subcommand produced the finding.
Codex findings are NOT given special treatment — they must pass the same quality gate as domain agent findings.
Deduplication
When multiple agents (domain or Codex) flag the same code location:
- Same root cause: Merge into a single finding. Keep the higher severity and credit all relevant sources (e.g., `Architecture • Domain Logic`, `Architecture • Codex`). Codex tags follow Step 5 Source Tags rules (`Codex` or `Codex Adv`).
- Different concerns: Keep as separate findings, each under its own domain.
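A minimal sketch of the merge rule follows. Deciding whether two findings share a root cause is a judgment call in practice; here it is reduced to a hypothetical `root_cause` label, and "same code location" is simplified to an identical file path and primary line.

```python
from dataclasses import dataclass, replace

SEVERITY_RANK = {"Critical": 0, "Warning": 1, "Info": 2}


@dataclass
class Finding:
    title: str
    file_path: str
    primary_line: int
    severity: str
    source: str      # domain name or Codex tag
    root_cause: str  # hypothetical label used only for this sketch


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Merge same-location, same-root-cause findings; keep the rest separate."""
    merged: dict[tuple[str, int, str], Finding] = {}
    for f in findings:
        key = (f.file_path, f.primary_line, f.root_cause)
        kept = merged.get(key)
        if kept is None:
            merged[key] = replace(f)  # copy so the input list is not mutated
            continue
        # Keep the higher severity of the two.
        if SEVERITY_RANK[f.severity] < SEVERITY_RANK[kept.severity]:
            kept.severity = f.severity
        # Credit every contributing source once.
        if f.source not in kept.source.split(" • "):
            kept.source = f"{kept.source} • {f.source}"
    return list(merged.values())
```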
Step 5: Output Generator
Produce severity-first structured output.
Quick Mode Output
When --quick is set, the output severity is determined by results:
- One or more Critical/Warning findings: output Critical/Warning only and omit Info. Summary: `Findings: 🔴 {n} critical · 🟡 {n} warnings`.
- No Critical/Warning findings but one or more Info findings: output the Info findings as a fallback (including Recommendation labels). Summary: `Findings: 🟢 {n} info`.
- No findings at all: apply the zero-findings rule shared by the Terminal and GitHub formats (`✅ No issues found.`).
Common rules:
- Source Tags — same as Normal mode, using domain names.
- Graph — always omitted (see Quick Mode Implicit Effects).
- Codex tags — not applicable (Codex disabled).
- GitHub format: If publishing in PR mode, use the same GitHub format with the same severity filtering rules above.
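The severity selection for quick mode can be expressed as a three-way branch over the finding counts. The function name `quick_mode_output` is hypothetical; the summary strings follow the rules listed above.

```python
def quick_mode_output(critical: int, warning: int, info: int) -> tuple[list[str], str]:
    """Return (severities to display, summary line) for --quick."""
    if critical + warning > 0:
        # Info is omitted whenever anything more severe exists.
        summary = f"Findings: 🔴 {critical} critical · 🟡 {warning} warnings"
        return ["Critical", "Warning"], summary
    if info > 0:
        # Fallback: show Info findings with their recommendation labels.
        return ["Info"], f"Findings: 🟢 {info} info"
    return [], "✅ No issues found."
```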
Severity Levels
| Severity | Icon | Criteria |
|---|---|---|
| Critical | 🔴 | Security vulnerability, data loss risk, crash-inducing bug |
| Warning | 🟡 | Potential bug, performance issue, maintainability concern |
| Info | 🟢 | Suggestion, minor improvement, style note |
Recommendation Labels
Info findings use a recommendation label instead of a fix suggestion. The label conveys the reviewer's assessment of whether action is needed.
| Label | Meaning | When to use |
|---|---|---|
| Accept | Acknowledged, no action needed | Intentional design choice, acceptable trade-off, or cosmetic preference |
| Monitor | No immediate action, track over time | Could become an issue at scale, under load, or after future changes |
| Won't Fix | Known limitation, not worth addressing | Cost outweighs benefit, outside scope, or constrained by external factors |
Source Tags
Each finding displays a source tag after the title (e.g., **Finding title** — Security). Domain agent findings use their domain name. Codex findings use tags based on the active Codex mode:
| Codex Mode | Source(s) | Tag(s) |
|---|---|---|
| disabled | Domain agents only | Domain name (e.g., Security, Architecture) |
| `adversarial` (default / `--codex`) | Domain agents + Codex adversarial | Domain name / `Codex` |
| `review` (`--codex-general`) | Domain agents + Codex review | Domain name / `Codex` |
| `both` (`--codex-both` / `--codex --codex-general`) | Domain agents + Codex review + Codex adversarial | Domain name / `Codex` (review findings) / `Codex Adv` (adversarial findings) |
When Codex mode is adversarial or review (single source), all Codex findings are tagged — Codex.
When Codex mode is both (dual source), review findings are tagged — Codex and adversarial findings are tagged — Codex Adv to distinguish the two sources.
All findings (domain + Codex) are sorted together by severity (Critical → Warning → Info), not grouped by source.
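The tagging and ordering rules can be sketched as two small helpers. The `codex:` source prefix and the function names are assumptions made for this example; the skill does not prescribe how the origin of a Codex finding is stored.

```python
SEVERITY_ORDER = {"Critical": 0, "Warning": 1, "Info": 2}


def source_tag(source: str, codex_mode: str) -> str:
    """Map a finding's origin to its display tag.

    `source` is a domain name, or the hypothetical labels
    'codex:review' / 'codex:adversarial-review' for Codex findings.
    """
    if not source.startswith("codex:"):
        return source  # domain findings use their domain name
    if codex_mode == "both" and source == "codex:adversarial-review":
        return "Codex Adv"  # only the dual-source mode needs to distinguish
    return "Codex"


def sort_findings(findings: list[dict]) -> list[dict]:
    """Sort by severity only; sorted() is stable, so sources stay interleaved."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
```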
Output Templates
Write the review in the language configured in the project's CLAUDE.md. If no language is configured, follow the user's conversational language. Examples below are in Korean.
There are two output formats depending on the rendering medium:
- Terminal format: Used for Working Dir / Path mode, and for PR mode preview (before approval).
- GitHub format: Used only for the final PR comment published to GitHub (after approval).
Terminal Format
Optimized for CLI readability. No HTML tags, no tables, flat structure.
Three-level visual hierarchy: ──────────────────── (Unicode thin × 20) between severity sections, --- between findings and after severity headers, blank lines within findings.
Finding title comes first (renders as bold/bright in terminal), file path second.
## Code Review: {target}
Domains: {activated domains joined by " • "}{if Codex enabled: " · Codex 🤖"}
Findings: 🔴 {critical_count} critical · 🟡 {warning_count} warnings · 🟢 {info_count} info
{if -g flag set AND PR mode AND relationships found:}
📊 Change Graph: GitHub 게시 시 Mermaid 플로우차트 포함
{module_count} modules · {file_count} files · 주요 흐름: {primary_flow_path}
{else if -g flag set AND PR mode AND no relationships found:}
📊 Change Graph: 파일 간 관계가 감지되지 않아 그래프를 생략합니다
{end if}
────────────────────
### < 🔴 Critical ({n}) >
---
**C1. {finding title}** — {domain}
`{file}` ({N}곳)
{description}
> **Fix**: {suggestion}
---
**C2. {finding title}** — {domain}
`{file}` ({N}곳)
{description}
> **Fix**: {suggestion}
────────────────────
### < 🟡 Warning ({n}) >
---
**W1. {finding title}** — {domain}
`{file}` ({N}곳)
{description}
> **Fix**: {suggestion}
---
**W2. {finding title}** — {domain}
`{file}` ({N}곳)
{description}
> **Fix**: {suggestion}
────────────────────
### < 🟢 Info ({n}) >
---
**I1. {finding title}** — {domain}
`{file}` ({N}곳)
{description}
> **Recommendation**: {Accept | Monitor | Won't Fix} — {reason}
---
**I2. {finding title}** — {domain}
`{file}` ({N}곳)
{description}
> **Recommendation**: {Accept | Monitor | Won't Fix} — {reason}
GitHub Format: Review Summary
Posted as the pull request review body. Contains the severity counts, key changes, and any findings that could not be mapped to diff lines (unmapped findings).
## Code Review: {target}
**Domains**: {activated domains joined by " • "}{if Codex enabled: " · **Codex 🤖**"}
**Findings**: {severity counts joined by " · ", e.g., "🔴 1 Critical · 🟡 2 Warning · 🟢 1 Info"}{if unmapped > 0: " ({mapped count} inline, {unmapped count} general)"}
### Summary
{1-2 sentence overview of the PR's purpose and the review's key judgment — e.g., "Codex 통합을 위한 Step 2.5 추가 및 companion runtime 연동. 전반적으로 하위호환성이 잘 유지되나, 스킬 가용성 검증에 보완이 필요하다."}
### Key Changes
- {개념적 변경 1: 무엇을 왜}
- {개념적 변경 2}
- {개념적 변경 3}
{if -g flag set AND relationships found:}
### Change Graph
```mermaid
flowchart LR
  subgraph {module_name}[{Module Display Name}]
    {file_node_id}[{filename}]
  end
  {source_node} -->|{relationship_label}| {target_node}
```
{end if}
{if unmapped findings exist:}
📋 General Findings
{for each unmapped finding, sorted by severity:}
{severity_icon} **{finding title}** — {domain}
`{file}` ({N}곳)
{description}
{if severity is Critical or Warning:}
> **Fix**: {suggestion}
{else (Info):}
> **Recommendation**: {Accept | Monitor | Won't Fix} — {reason}
{end if}
{end for}
{end if}
Generated with [Claude Code](https://claude.com/claude-code)
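- **Summary**: State the PR's purpose and the review's key judgment in 1-2 sentences. This is the reviewer's assessment, not a summary of the PR description.
- **Key Changes**: Bullets of conceptual changes. Each bullet describes one logical unit of change. There is no count limit; list as many as the PR actually made.
- **Findings count**: If unmapped is 0, show only the total count. If unmapped is 1 or more, add the `(N inline, M general)` breakdown.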
If there are zero findings overall, post: `## Code Review: {target}\n\n✅ No issues found.\n\nGenerated with [Claude Code](https://claude.com/claude-code)`
#### GitHub Format: Inline Comment
Posted as individual review comments, each attached to a specific file and line in the diff. One finding per comment.
````markdown
{severity_icon} **{prefix}{n}. {finding title}** — {domain}
{description}
{if severity is Critical or Warning:}
{if concrete code change:}
```suggestion
{suggested code — only the replacement lines, matching GitHub suggestion block format}
```
{else:}
> **Fix**: {suggestion}
{end if}
{else (Info):}
> **Recommendation**: {Accept | Monitor | Won't Fix} — {reason}
{end if}
````
Numbering uses the same scheme as Terminal format: `C{n}` / `W{n}` / `I{n}`, starting from 1 per severity. Terminal preview and GitHub inline comments share the same numbers for cross-reference.
For contextual match findings (mapped via contextual fallback):
````markdown
{severity_icon} **{prefix}{n}. {finding title}** — {domain}
📍 이 변경의 영향: `{affected_file}` ({N}곳)
{description}
{if severity is Critical or Warning:}
{if concrete code change:}
```suggestion
{suggested code — only the replacement lines, matching GitHub suggestion block format}
```
{else:}
> **Fix**: {suggestion}
{end if}
{else (Info):}
> **Recommendation**: {Accept | Monitor | Won't Fix} — {reason}
{end if}
````
Use GitHub suggestion blocks when the fix is a concrete, localized code change (variable rename, parameter addition, line replacement). The suggestion block enables one-click "Apply suggestion" in GitHub's UI. Use `> **Fix**: ...` for directional or structural suggestions that span multiple locations. Info findings always use `> **Recommendation**: ...`; suggestion blocks are not applicable.
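The choice between a suggestion block, a Fix line, and a Recommendation line can be sketched as a small formatter. This is an illustrative sketch only; the function name and its parameters are assumptions, not part of the skill.

```python
SEVERITY_ICON = {"Critical": "🔴", "Warning": "🟡", "Info": "🟢"}
FENCE = "`" * 3  # built programmatically so this example nests cleanly in markdown


def inline_comment_body(severity, number, title, domain, description,
                        suggestion=None, replacement_code=None,
                        recommendation=None, reason=None):
    """Build one inline comment body (hypothetical helper)."""
    prefix = severity[0]  # C / W / I
    lines = [
        f"{SEVERITY_ICON[severity]} **{prefix}{number}. {title}** — {domain}",
        "",
        description,
        "",
    ]
    if severity == "Info":
        # Info findings never carry a suggestion block.
        lines.append(f"> **Recommendation**: {recommendation} — {reason}")
    elif replacement_code is not None:
        # Concrete, localized change: enables one-click apply in GitHub's UI.
        lines += [FENCE + "suggestion", replacement_code, FENCE]
    else:
        # Directional or multi-location advice stays in natural language.
        lines.append(f"> **Fix**: {suggestion}")
    return "\n".join(lines)
```

Checking the Info branch first means a stray `replacement_code` on an Info finding is ignored rather than rendered as a suggestion block.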
Formatting Rules
Common rules (both formats):
- Location: file path on its own line, occurrence count only (e.g., "(3곳)", "(1곳)"). No line numbers (`L42`, `L58`, etc.) anywhere; descriptions and Fix suggestions reference locations by section heading, function name, or searchable code pattern instead.
- Finding title must NOT repeat the file name (location is already on its own line).
- Omit severity sections that have 0 findings.
- Bullet management: If a single finding has more than 5 sub-points, consolidate.
- Commit SHA references: Never use backticks around SHAs. Use plain text or markdown links.
Graph rules (when -g flag is set):
- Mermaid direction: `flowchart LR` (left-to-right) for readability.
- Node labels: filename only (no full path). Use `[filename.ext]` format (basename + extension).
- Edge labels: relationship type, either code-level (`imports`, `calls`, `extends`, `emits/consumes`, `reads/writes`) or conceptual (`references`, `shared logic`, `configures`).
- Module grouping: When changed files > 7, group by parent directory using `subgraph`. When ≤ 7, show individual file nodes without subgraph.
- Terminal: One-line summary only: module count, file count, primary flow path (longest chain of connected nodes, max 4 nodes joined by `→`).
- GitHub: Full Mermaid code block placed after Key Changes, before General Findings.
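As a rough illustration of the graph rules, the sketch below emits Mermaid text from a list of changed files and labeled edges. The function names and the id-mangling scheme (`n_` / `g_` prefixes) are assumptions made for the example; the rules above only fix the direction, the labels, and the grouping threshold.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def slug(text: str) -> str:
    # Replace path punctuation so the result is a safe Mermaid identifier (assumed scheme).
    return "".join(c if c.isalnum() else "_" for c in text)


def build_change_graph(files, edges):
    """files: changed paths; edges: (source_path, label, target_path) tuples."""
    lines = ["flowchart LR"]
    if len(files) > 7:
        # More than 7 changed files: group nodes by parent directory.
        groups = defaultdict(list)
        for path in files:
            groups[str(PurePosixPath(path).parent)].append(path)
        for directory, members in groups.items():
            lines.append(f"  subgraph g_{slug(directory)}[{directory}]")
            for path in members:
                lines.append(f"    n_{slug(path)}[{PurePosixPath(path).name}]")
            lines.append("  end")
    else:
        # 7 or fewer: individual file nodes, labeled with the filename only.
        for path in files:
            lines.append(f"  n_{slug(path)}[{PurePosixPath(path).name}]")
    for source, label, target in edges:
        lines.append(f"  n_{slug(source)} -->|{label}| n_{slug(target)}")
    return "\n".join(lines)
```

For example, `build_change_graph(["src/a.py", "src/b.py"], [("src/a.py", "imports", "src/b.py")])` yields two plain nodes and one labeled edge, with no `subgraph` wrapper.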
Terminal-specific rules:
- No `<details>` or HTML tags; they don't render in CLI.
- Summary line with severity icons: `Findings: 🔴 1 critical · 🟡 3 warnings · 🟢 1 info`.
- Severity headers use `### < {icon} {Severity} ({n}) >` format with `< >` brackets.
- Severity sections are separated by `────────────────────` (Unicode thin box drawing × 20).
- Severity header is followed by `---` before the first finding. Findings within the same severity are also separated by `---`.
- Each finding is numbered with a severity prefix: `C{n}` (Critical), `W{n}` (Warning), `I{n}` (Info), starting from 1 per severity.
- Finding title first (bold; renders bright in CLI), file path second.
- Action line by severity: Critical/Warning use `> **Fix**: ...` (natural language, referencing by section/pattern). Info uses `> **Recommendation**: {Accept | Monitor | Won't Fix} — {reason}` to convey whether action is needed.
- Domains with no findings: omit entirely (no "✅ ... No issues found" line).
- Zero findings overall: display `## Code Review: {target}\n\n✅ No issues found.` and omit both the severity sections and the summary line.
- Codex failure notice (if applicable): append after the summary line per the error classification in Step 3 Codex Failure Handling (`⚠️` for auth errors, `ℹ️` for other failures). Only shown when Codex mode was NOT disabled but companion returned non-zero exit. Do NOT include in GitHub format.
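The terminal rules can be read as a small renderer. The sketch below is one possible reading under stated assumptions: findings arrive as plain dictionaries with the hypothetical keys `severity`, `title`, `domain`, `file`, `count`, `description`, and `action`, and the Change Graph and Codex notice lines are left out.

```python
SECTION_RULE = "─" * 20
SEVERITIES = [("Critical", "🔴", "C"), ("Warning", "🟡", "W"), ("Info", "🟢", "I")]


def render_terminal(target, domains, findings):
    """Render the Terminal format for already-sorted findings (hypothetical dict shape)."""
    if not findings:
        # Zero findings overall: no summary line, no severity sections.
        return f"## Code Review: {target}\n\n✅ No issues found."
    by_sev = {name: [f for f in findings if f["severity"] == name]
              for name, _, _ in SEVERITIES}
    out = [
        f"## Code Review: {target}",
        "Domains: " + " • ".join(domains),
        f"Findings: 🔴 {len(by_sev['Critical'])} critical"
        f" · 🟡 {len(by_sev['Warning'])} warnings"
        f" · 🟢 {len(by_sev['Info'])} info",
    ]
    for name, icon, prefix in SEVERITIES:
        group = by_sev[name]
        if not group:
            continue  # omit severity sections that have 0 findings
        out += [SECTION_RULE, f"### < {icon} {name} ({len(group)}) >"]
        label = "Recommendation" if name == "Info" else "Fix"
        for n, f in enumerate(group, start=1):  # numbering restarts per severity
            out += [
                "---",
                f"**{prefix}{n}. {f['title']}** — {f['domain']}",
                f"`{f['file']}` ({f['count']}곳)",  # occurrence suffix follows the Korean example
                f["description"],
                f"> **{label}**: {f['action']}",
            ]
    return "\n".join(out)
```

Emitting `---` before every finding covers both requirements at once: the rule after the severity header and the rule between findings.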
GitHub-specific rules (inline review comments):
- Review Summary: severity counts and key changes at the top. Unmapped findings (after contextual mapping) included below under "General Findings" as fallback only.
- Inline Comment: one finding per comment. No `<details>` tags; each comment is self-contained.
- Action line format in inline comments:
  - Critical/Warning with concrete, localized code change → GitHub `suggestion` block (enables one-click apply).
  - Critical/Warning with directional or multi-location suggestion → `> **Fix**: ...` natural language.
  - Info → `> **Recommendation**: {Accept | Monitor | Won't Fix} — {reason}` (no suggestion block).
- Domains with no findings: omit entirely from both summary and inline comments.
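The Findings line at the top of the Review Summary, including the inline/general breakdown, might be assembled as follows. This is a sketch: the function name is hypothetical, and skipping severities with a zero count is an assumption rather than something the template states.

```python
ICONS = {"Critical": "🔴", "Warning": "🟡", "Info": "🟢"}


def findings_line(counts, unmapped):
    """counts: severity -> number of findings; unmapped: findings with no diff line."""
    # Assumption: severities with zero findings are left out of the line.
    parts = [f"{ICONS[s]} {counts[s]} {s}" for s in ICONS if counts.get(s, 0) > 0]
    line = "**Findings**: " + " · ".join(parts)
    if unmapped > 0:
        mapped = sum(counts.values()) - unmapped
        line += f" ({mapped} inline, {unmapped} general)"
    return line
```

With one Critical, two Warning, and one Info finding, of which one is unmapped, this produces `**Findings**: 🔴 1 Critical · 🟡 2 Warning · 🟢 1 Info (3 inline, 1 general)`.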
Step 6: Publisher
Based on the mode and flags, publish the review output.
Quick mode: No changes to publishing behavior. The same Working Dir / PR mode flow applies.
Working Dir / Path Mode
Display the review output using Terminal format directly to the user. No GitHub publishing.
PR Review Mode
PR mode uses a two-phase flow: preview in terminal, then publish to GitHub.
Phase 1: Preview (Terminal format)
Generate the review using Terminal format and display it to the user in the conversation.
| Condition | Next step |
|---|---|
| -y or -f flag present | Skip preview, go directly to Phase 2 |
| $ARGUMENTS contains publish intent ("comment 달아", "바로 올려", "게시해", "post it") | Skip preview, go directly to Phase 2 |
| Default | STOP here. Show the Terminal format output and ask: "PR에 게시할까요?" Do NOT proceed until the user approves. |
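The decision table can be approximated with a simple check. This is only a sketch: matching publish intent by substring is a deliberate simplification (the phrases in the table are examples, and real intent detection may rely on judgment), and the function name is hypothetical.

```python
PUBLISH_PHRASES = ("comment 달아", "바로 올려", "게시해", "post it")


def skip_preview(arguments: str) -> bool:
    """Return True when Phase 1 can be skipped and Phase 2 should run directly."""
    tokens = arguments.split()
    if "-y" in tokens or "-f" in tokens:
        return True  # auto-publish flag present as a standalone token
    lowered = arguments.lower()
    # Simplified intent check: substring match against the example phrases only.
    return any(phrase in lowered for phrase in PUBLISH_PHRASES)
```

A bare PR number such as `"42"` returns `False`, which matches the rule that a PR number alone is not publish intent.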
Phase 2: Publish (Inline Review)
After approval (or auto-publish flag), publish findings as inline review comments via the GitHub Pull Request Review API. Each finding becomes an individual comment attached to the relevant file and line in the diff, enabling per-finding resolution and reply through GitHub's native UI.
1. Resolve line targets
For each confirmed finding, verify its primary line number exists in the PR diff:
gh pr diff {number}
For each confirmed finding, resolve its target line in the PR diff using a three-tier strategy:
- Exact match: The finding's file + line appears in a diff hunk (added line `+`, or context line on the RIGHT side). Include as an inline comment at that location.
- Contextual match: Exact match failed. Search the diff for the root cause change that triggered this finding:
  - Rename not propagated → map to the diff line where the rename occurred
  - Missing entry that should accompany new entries → map to the nearest new entry line
  - Dead code from a removal → map to a nearby RIGHT-side context or addition line adjacent to the removal

  Include as an inline comment at the contextual location (must be a RIGHT-side line). Prepend to the comment body: `` 📍 이 변경의 영향: `{affected_file}` ({N}곳) `` followed by the original finding description.
- Unmapped: Neither exact nor contextual match is possible (e.g., finding references a file/pattern with no related changes in the diff). Include in the review body under "📋 General Findings" as a last resort.
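The exact-match tier amounts to checking whether a line number is visible on the RIGHT side of the diff. A minimal sketch of that check is shown below, assuming standard unified diff output such as `gh pr diff` prints; the function names are hypothetical, and the contextual tier is not covered because it requires judgment about root causes.

```python
import re

HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def right_side_lines(diff_text):
    """Map each file path to the set of new-file line numbers visible in the diff."""
    result = {}
    path, new_line = None, None
    for raw in diff_text.splitlines():
        if raw.startswith("+++ "):
            target = raw[4:]
            # "+++ /dev/null" marks a deleted file, which has no RIGHT side.
            path = target[2:] if target.startswith("b/") else None
            new_line = None
            if path is not None:
                result.setdefault(path, set())
            continue
        match = HUNK.match(raw)
        if match:
            new_line = int(match.group(1))  # first new-file line of this hunk
            continue
        if path is None or new_line is None:
            continue
        if raw.startswith("+") or raw.startswith(" "):
            result[path].add(new_line)  # added or context line: exists on the RIGHT side
            new_line += 1
        # Removed ("-") lines exist only on the LEFT side and do not advance the counter.
    return result


def is_exact_match(file, line, visible):
    return line in visible.get(file, set())
```

Findings that fail `is_exact_match` would then go through the contextual search, and only after that fall back to General Findings.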
2. Build review payload
Construct a JSON payload for the Review API:
{
"commit_id": "{head_sha}",
"event": "COMMENT",
"body": "{Review Summary template}",
"comments": [
{
"path": "{file}",
"line": {end_line_number},
"side": "RIGHT",
"start_line": {start_line_number},
"start_side": "RIGHT",
"body": "{Inline Comment template}"
}
]
}
- `commit_id`: Obtain from `gh pr view {number} --json headRefOid --jq '.headRefOid'`
- `body`: Review Summary template (severity counts + key changes + unmapped findings if any + footer)
- `comments`: Array of mapped findings, each using the Inline Comment template
- For single-line findings, omit `start_line` and `start_side`; only `line` and `side` are needed.
- Serialize the JSON payload using `jq -n` or write to a temp file; do NOT manually escape strings in a heredoc. Comment bodies contain multi-line markdown, code fences, and `suggestion` blocks that break raw string interpolation.
If there are no mapped findings (all unmapped), the comments array is empty. The review body contains all findings.
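The same payload can be built with any JSON serializer rather than hand-escaped strings. The sketch below uses Python's `json` module as a stand-in for `jq -n`; the helper names are hypothetical, and the field names are the ones listed above.

```python
import json


def build_comment(path, body, end_line, start_line=None):
    """One inline comment; multi-line ranges add start_line/start_side."""
    comment = {"path": path, "line": end_line, "side": "RIGHT", "body": body}
    if start_line is not None and start_line != end_line:
        comment["start_line"] = start_line
        comment["start_side"] = "RIGHT"
    return comment


def build_review_payload(head_sha, review_body, comments):
    payload = {
        "commit_id": head_sha,
        "event": "COMMENT",
        "body": review_body,
        "comments": comments,  # empty list when every finding is unmapped
    }
    # The serializer escapes newlines, quotes, and backticks in comment bodies.
    return json.dumps(payload, ensure_ascii=False)
```

The resulting string could then be written to a temp file or piped to the API call on standard input.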
3. Submit
jq -n \
--arg commit_id "$HEAD_SHA" \
--arg body "$REVIEW_BODY" \
--argjson comments "$COMMENTS_JSON" \
'{commit_id: $commit_id, event: "COMMENT", body: $body, comments: $comments}' \
| gh api repos/{owner}/{repo}/pulls/{number}/reviews --method POST --input -
Where `{owner}/{repo}` is obtained from `gh repo view --json nameWithOwner --jq '.nameWithOwner'`.
4. Fallback
If the Review API call fails (e.g., 422 due to invalid line mapping):
- Move all inline comments to the Review Summary body as General Findings, then resubmit with an empty `comments` array. GitHub's 422 does not specify which comment failed, so individual retry is not possible.
- If the retry also fails or the API is unavailable, fall back to posting the full review as a single PR comment:

gh pr comment {number} --body "$(cat <<'EOF'
{Review Summary with ALL findings included in body}
EOF
)"
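The fallback chain can be expressed as a short control flow. In this sketch the two callables are injected so the logic stays testable: `submit_review` and `post_comment` are hypothetical wrappers that would, in practice, shell out to the `gh api` and `gh pr comment` commands above and return whether the call succeeded. Folding comment bodies by simple concatenation is also a simplification of the General Findings template.

```python
def publish_review(payload, submit_review, post_comment):
    """payload: review dict; submit_review(payload) -> bool; post_comment(body) -> bool."""
    if submit_review(payload):
        return "inline"
    # A 422 does not say which comment failed, so fold every inline comment
    # into the review body and retry once with an empty comments array.
    folded_body = payload["body"] + "\n\n" + "\n\n".join(
        comment["body"] for comment in payload["comments"]
    )
    retry = {**payload, "body": folded_body, "comments": []}
    if submit_review(retry):
        return "summary-only"
    # Last resort: a single plain PR comment carrying the full review.
    return "pr-comment" if post_comment(folded_body) else "failed"
```

The returned string names the tier that succeeded, which a caller could surface to the user after publishing.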
Task
- Parse `$ARGUMENTS` to determine mode (PR / Working Dir / Path) and flags (including Codex flags and `--quick`).
- Context Builder: Gather diff, commit history, related files, and PR description (if applicable).
- Domain Router: Analyze changed file types and activate relevant domains. Respect `--domain` override. If `--quick`, cap to 2 domains by priority.
- Codex Detection: If `--quick`, skip (Codex disabled). Otherwise, resolve companion path via `find`, and determine Codex mode (adversarial / review / both / disabled).
- Domain Agents + Codex: If `--quick`, perform single-pass analysis in main context (no agent spawn, all severities). Otherwise, launch activated domain agents in parallel. If Codex is enabled, launch Codex agent(s) via companion runtime concurrently. Collect all findings. If Codex fails (non-zero exit), proceed with domain findings only.
- Cross-Validation: If `--quick`, lightweight validation (context check + sanity only). Otherwise, verify each finding (domain + Codex) against expanded context, git history, comments, and PR intent. Classify as Confirmed / Demoted / Dismissed.
- Output Generator: Produce severity-first structured output with source tags. If `--quick`, show Critical/Warning only; if none, fallback to Info; graph always omitted.
- Publisher:
  - Working Dir / Path mode: Display Terminal format output directly. Done.
  - PR mode without `-y`/`-f`: Show Terminal format preview → ask "PR에 게시할까요?" → resolve line targets → build inline review payload → publish via Review API ONLY if approved.
  - PR mode with `-y`/`-f`: Resolve line targets → build inline review payload → publish via Review API immediately.
Important:
- Do NOT publish review comments to GitHub without explicit user approval. The only exceptions are the `-y`/`-f` flag or explicit publish intent in `$ARGUMENTS`. A PR number alone is NOT publish intent.
- Do NOT replace linter checks; focus on semantic, architectural, and logic issues.
- Do NOT suggest auto-fix or auto-merge; this is review only.
- This skill is independent from `review-reply`. `code-review` generates reviews (proactive); `review-reply` responds to received reviews (reactive).
- Assignee: If creating GitHub PRs or issues, always include `--assignee @me`.
- Commit references: Never wrap commit SHAs in backticks. Use plain text or explicit markdown links.
- Adapt output language to the project's CLAUDE.md language setting.
- When Agent tool is unavailable, perform all analyses sequentially as a single-pass fallback.