codex-review
Codex Code Review Skill
Core Concept: Intention vs Implementation
Running `codex review --uncommitted` alone shows the AI only "what was done" (the implementation).
Recording the intention first tells the AI "what you wanted to do" (the intention).
Giving the reviewer both the code changes and the intention description lets it check one against the other, which is how this skill improves AI code review quality.
Skill Architecture
This skill operates in two phases:
- Preparation Phase (current context): Check working directory, update CHANGELOG
- Review Phase (isolated context): Invoke Task tool to execute Lint + codex review (using context: fork to reduce context waste)
Execution Steps
0. [First] Check Working Directory Status
git diff --name-only && git status --short
Decide review mode based on output:
- Has uncommitted changes → Continue with steps 1-4 (normal flow)
- Clean working directory → Directly invoke codex-runner:
codex review --commit HEAD
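A minimal sketch of the mode decision above. The helper name `review_mode` and its two output labels are assumptions made for illustration; the skill itself only describes the decision in prose.

```shell
# Hypothetical helper: decide the review mode from `git status --short` output.
# Prints "uncommitted" when anything is listed (modified, staged, or untracked),
# and "commit" when the working directory is clean.
review_mode() {
  status_output=$1
  if [ -n "$status_output" ]; then
    echo "uncommitted"
  else
    echo "commit"
  fi
}

# Usage inside a repository (not executed here):
#   mode=$(review_mode "$(git status --short)")
```

With "uncommitted" the flow continues with steps 1-4; with "commit" it goes straight to `codex review --commit HEAD`.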
1. [Mandatory] Check if CHANGELOG is Updated
Before any review, check whether CHANGELOG.md contains a description of the current changes.
# Check if CHANGELOG.md is in uncommitted changes
git diff --name-only | grep -E "(CHANGELOG|changelog)"
If CHANGELOG is not updated, you must automatically perform the following (don't ask user to do it manually):
- Analyze changes: Run `git diff --stat` and `git diff` to get complete changes
- Auto-generate CHANGELOG entry: Generate compliant entry based on code changes
- Write to CHANGELOG.md: Use Edit tool to insert entry at top of `[Unreleased]` section
- Continue review flow: Immediately proceed to next steps after CHANGELOG update
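The CHANGELOG gate can be sketched as a small check over a list of changed paths. The helper name `changelog_touched` is hypothetical, and its pattern is an assumption that is stricter than the grep shown earlier: it matches only files whose base name is `changelog` (any case, optional extension). Note also that plain `git diff --name-only` lists unstaged changes only, so the usage comment uses `git diff --name-only HEAD` to cover staged changes as well.

```shell
# Hypothetical helper: succeed when a list of changed paths (one per line)
# includes a changelog file such as CHANGELOG.md or docs/changelog.md.
changelog_touched() {
  printf '%s\n' "$1" | grep -Eiq '(^|/)changelog(\.[a-z]+)?$'
}

# Usage inside a repository (not executed here):
#   if changelog_touched "$(git diff --name-only HEAD)"; then
#     echo "CHANGELOG already updated"
#   else
#     echo "CHANGELOG missing: generate an entry before review"
#   fi
```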
Auto-generated CHANGELOG entry format:
## [Unreleased]
### Added / Changed / Fixed
- Feature description: what problem was solved or what functionality was implemented
- Affected files: main modified files/modules
Example - Auto-generation Flow:
1. Detected CHANGELOG not updated
2. Run git diff --stat, found handlers/responses.go modified (+88 lines)
3. Run git diff to analyze details: added CompactHandler function
4. Auto-generate entry:
### Added
- Added `/v1/responses/compact` endpoint for conversation context compression
- Supports multi-channel failover and request body size limits
5. Use Edit tool to write to CHANGELOG.md
6. Continue with lint and codex review
2. [Critical] Stage All New Files
Before invoking codex review, add all new (untracked) files to the git staging area; otherwise codex reports a P1 error.
# Check for new files
git status --short | grep "^??"
If there are new files, automatically execute:
# Safely stage all new files (handles empty list and special filenames)
git ls-files --others --exclude-standard -z | while IFS= read -r -d '' f; do git add -- "$f"; done
Explanation:
- `-z` uses null character to separate filenames, correctly handles filenames with spaces/newlines
- `while IFS= read -r -d ''` reads filenames one by one
- `git add -- "$f"` uses `--` separator, correctly handles filenames starting with `-`
- When no new files exist, loop body doesn't execute, safely skipped
- This won't stage modified files, only handles new files
- codex needs files to be tracked by git for proper review
3. Evaluate Task Difficulty and Invoke codex-runner
Count change scale:
# Count number of changed files and lines of code
git diff --stat | tail -1
Difficulty Assessment Criteria:
Model + Reasoning Effort Combinations:
| Combination | Quality | Time | Timeout | Recommended For |
|---|---|---|---|---|
| `model=gpt-5.2 model_reasoning_effort=xhigh` | Best | ~15-20 min | 40 min | Critical code, architecture changes |
| `model=gpt-5.3-codex model_reasoning_effort=xhigh` | High | ~8-9 min | 15 min | Difficult tasks (default) |
| `model=gpt-5.2 model_reasoning_effort=high` | High | ~8-9 min | 15 min | Alternative for difficult tasks |
| `model=gpt-5.3-codex model_reasoning_effort=high` | Good | ~5-6 min | 10 min | Normal tasks (default) |
Critical Tasks (meets any condition, use best quality model):
- Modified files ≥ 30
- Total code changes (insertions + deletions) ≥ 2000 lines
- Involves core architecture/algorithm changes (user explicitly mentioned)
- Config: `--config model=gpt-5.2 --config model_reasoning_effort=xhigh`, timeout 40 minutes
Difficult Tasks (meets any condition):
- Modified files ≥ 10
- Total code changes (insertions + deletions) ≥ 500 lines
- Single metric: insertions ≥ 300 lines OR deletions ≥ 300 lines
- Cross-module refactoring
- Default config: `--config model=gpt-5.3-codex --config model_reasoning_effort=xhigh`, timeout 15 minutes
Normal Tasks (other cases):
- Default config: `--config model=gpt-5.3-codex --config model_reasoning_effort=high`, timeout 10 minutes
Evaluation Method:
You MUST parse the git diff --stat output correctly to determine difficulty:
# Get the summary line (last line of git diff --stat)
git diff --stat | tail -1
# Example outputs:
# "20 files changed, 342 insertions(+), 985 deletions(-)"
# "1 file changed, 50 insertions(+)" # No deletions
# "3 files changed, 120 deletions(-)" # No insertions
Parsing Rules:
- Extract file count from "X file(s) changed" (handle both "1 file" and "N files")
- Extract insertions from "Y insertion(s)(+)" if present (handle both "1 insertion" and "N insertions"), otherwise 0
- Extract deletions from "Z deletion(s)(-)" if present (handle both "1 deletion" and "N deletions"), otherwise 0
- Calculate total changes = insertions + deletions
Important Edge Cases:
- Single file: `"1 file changed"` (singular form)
- No insertions: Git omits `"insertions(+)"` entirely → treat as 0
- No deletions: Git omits `"deletions(-)"` entirely → treat as 0
- Pure rename: May show `"0 insertions(+), 0 deletions(-)"` or omit both
Decision Logic (check in order, first match wins):
- IF file_count >= 30 OR total_changes >= 2000 → Critical (gpt-5.2 + xhigh)
- IF file_count >= 10 → Difficult (gpt-5.3-codex + xhigh)
- IF total_changes >= 500 → Difficult (gpt-5.3-codex + xhigh)
- IF insertions >= 300 OR deletions >= 300 → Difficult (gpt-5.3-codex + xhigh)
- ELSE → Normal (gpt-5.3-codex + high)
Example Cases:
- ⭐ "50 files changed, 2000 insertions(+), 1500 deletions(-)" → Critical task, use `model=gpt-5.2 model_reasoning_effort=xhigh`, timeout 40 minutes (core architecture change)
- ✅ "20 files changed, 342 insertions(+), 985 deletions(-)" → Difficult task, use `model=gpt-5.3-codex model_reasoning_effort=xhigh`, timeout 15 minutes
- ✅ "5 files changed, 600 insertions(+), 50 deletions(-)" → Difficult task, use `model=gpt-5.3-codex model_reasoning_effort=xhigh`, timeout 15 minutes
- ❌ "3 files changed, 150 insertions(+), 80 deletions(-)" → Normal task, use `model=gpt-5.3-codex model_reasoning_effort=high`, timeout 10 minutes
- ❌ "1 file changed, 50 insertions(+)" → Normal task, use `model=gpt-5.3-codex model_reasoning_effort=high`, timeout 10 minutes
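The decision logic and example cases above can be sketched as follows. The function name `classify_difficulty` and the three labels it prints are assumptions for illustration. The sketch covers only the numeric thresholds; the non-numeric criteria (core architecture changes the user explicitly mentions, cross-module refactoring) still need a judgment call.

```shell
# Hypothetical helper: classify a change from its file count, insertions,
# and deletions. Checks run in order and the first match wins.
classify_difficulty() {
  files=$1
  insertions=$2
  deletions=$3
  total=$((insertions + deletions))

  if [ "$files" -ge 30 ] || [ "$total" -ge 2000 ]; then
    echo "critical"    # gpt-5.2 + xhigh
  elif [ "$files" -ge 10 ] || [ "$total" -ge 500 ] ||
       [ "$insertions" -ge 300 ] || [ "$deletions" -ge 300 ]; then
    echo "difficult"   # gpt-5.3-codex + xhigh
  else
    echo "normal"      # gpt-5.3-codex + high
  fi
}

# Usage (not executed here):
#   classify_difficulty 20 342 985
```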
Invoke codex-runner Subtask:
Use Task tool to invoke codex-runner, passing complete command (including Lint + codex review):
Task parameters:
- subagent_type: Bash
- description: "Execute Lint and codex review"
- timeout: 2400000 (40 minutes for critical tasks), 900000 (15 minutes for difficult tasks), or 600000 (10 minutes for normal tasks)
- prompt: Choose corresponding command based on project type and difficulty
Go project - Difficult task:
go fmt ./... && go vet ./... && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=xhigh
(timeout: 900000)
Go project - Normal task:
go fmt ./... && go vet ./... && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=high
(timeout: 600000)
Node project - Difficult task:
npm run lint:fix && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=xhigh
(timeout: 900000)
Node project - Normal task:
npm run lint:fix && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=high
(timeout: 600000)
Python project - Difficult task:
black . && ruff check --fix . && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=xhigh
(timeout: 900000)
Python project - Normal task:
black . && ruff check --fix . && codex review --uncommitted --config model=gpt-5.3-codex --config model_reasoning_effort=high
(timeout: 600000)
Clean working directory:
codex review --commit HEAD --config model=gpt-5.3-codex --config model_reasoning_effort=high
(timeout: 600000)
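The per-project commands above follow one pattern: a lint step, then `codex review --uncommitted` with the model and reasoning effort for the difficulty. A minimal sketch of that composition, assuming hypothetical helpers named `build_review_command` and `review_timeout_ms` and the labels `go`, `node`, `python`, `critical`, `difficult`, `normal`. The critical timeout of 2400000 ms is derived here from the 40-minute figure in the table.

```shell
# Hypothetical helper: print the command string to pass as the Task prompt.
build_review_command() {
  project=$1
  difficulty=$2

  case $project in
    go)     lint='go fmt ./... && go vet ./...' ;;
    node)   lint='npm run lint:fix' ;;
    python) lint='black . && ruff check --fix .' ;;
    *)      echo "unknown project type: $project" >&2; return 1 ;;
  esac

  case $difficulty in
    critical)  model='gpt-5.2';       effort='xhigh' ;;
    difficult) model='gpt-5.3-codex'; effort='xhigh' ;;
    normal)    model='gpt-5.3-codex'; effort='high' ;;
    *)         echo "unknown difficulty: $difficulty" >&2; return 1 ;;
  esac

  echo "$lint && codex review --uncommitted --config model=$model --config model_reasoning_effort=$effort"
}

# Hypothetical helper: Task timeout in milliseconds for each difficulty.
review_timeout_ms() {
  case $1 in
    critical)  echo 2400000 ;;  # 40 minutes
    difficult) echo 900000 ;;   # 15 minutes
    normal)    echo 600000 ;;   # 10 minutes
    *)         return 1 ;;
  esac
}
```

The functions only build strings and numbers; nothing is executed until the result is handed to the Task tool.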
4. Self-Correction
If Codex finds Changelog description inconsistent with code logic:
- Code error → Fix code
- Description inaccurate → Update Changelog
Complete Review Protocol
- [GATE] Check CHANGELOG - Auto-generate and write if not updated (leverage current context to understand change intention)
- [PREPARE] Stage Untracked Files - Add all new files to git staging area (avoid codex P1 error)
- [EXEC] Task → Lint + codex review - Invoke Task tool to execute Lint and codex (isolated context, reduce waste)
- [FIX] Self-Correction - Fix code or update description when intention ≠ implementation
Codex Review Command Reference
Basic Syntax
codex review [OPTIONS] [PROMPT]
Note: [PROMPT] parameter cannot be used with --uncommitted, --base, or --commit.
Common Options
| Option | Description | Example |
|---|---|---|
| `--uncommitted` | Review all uncommitted changes in working directory (staged + unstaged + untracked) | `codex review --uncommitted` |
| `--base <BRANCH>` | Review changes relative to specified base branch | `codex review --base main` |
| `--commit <SHA>` | Review changes introduced by specified commit | `codex review --commit HEAD` |
| `--title <TITLE>` | Optional commit title, displayed in review summary | `codex review --uncommitted --title "feat: add JSON parser"` |
| `-c, --config <key=value>` | Override configuration values | `codex review --uncommitted -c model="o3"` |
Usage Examples
# 1. Review all uncommitted changes (most common)
codex review --uncommitted
# 2. Review latest commit
codex review --commit HEAD
# 3. Review specific commit
codex review --commit abc1234
# 4. Review all changes in current branch relative to main
codex review --base main
# 5. Review changes in current branch relative to develop
codex review --base develop
# 6. Review with title (title shown in review summary)
codex review --uncommitted --title "fix: resolve JSON parsing errors"
# 7. Review using specific model
codex review --uncommitted -c model="o3"
Important Limitations
- `--uncommitted`, `--base`, `--commit` are mutually exclusive, cannot be used together
- `[PROMPT]` parameter is mutually exclusive with the above three options
- Must be executed in a git repository directory
Important Notes
- Ensure execution in git repository directory
- Timeout automatically adjusted based on task difficulty:
  - Critical tasks: 40 minutes (`timeout: 2400000`)
  - Difficult tasks: 15 minutes (`timeout: 900000`)
  - Normal tasks: 10 minutes (`timeout: 600000`)
- codex command must be properly configured and logged in
- codex automatically processes in batches for large changes
- CHANGELOG.md must be in uncommitted changes, otherwise Codex cannot see intention description
Design Rationale
Why separate contexts?
- CHANGELOG update needs current context: Understanding user's previous conversation and task intention to generate accurate change description
- Codex review doesn't need conversation history: Only needs code changes and CHANGELOG, more efficient to run independently
- Reduce token consumption: codex review as independent subtask, doesn't carry irrelevant conversation context