Review
Unified review skill for agent-performed reviews. Auto-detects content types in a change set, applies the appropriate review dimensions, and delivers severity-graded findings with actionable fixes.
When to Activate
Invocation posture: hybrid — auto-trigger only on unmistakable, artifact-scoped review requests. For ambiguous requests, explicit invocation is preferred.
Activate when the user:
- Asks to review a PR, diff, branch, commit, staged changes, or specific files
- Uses: "code review", "CR", "LGTM?", "security review", "find bugs in", "review my changes", "check my PR"
- Names a concrete artifact and asks for bug finding, breaking-change checks, migration risk review, API/doc correctness, or config/infrastructure review
- Pastes code/text and asks for a review (not a discussion)
When Not To Activate
Do not activate for:
- Open-ended design or architecture discussions ("what do you think about this approach?")
- Brainstorming, option comparison, or solution ideation
- Governance or library-audit requests about skills, agents, or repository hygiene
- Requests to teach review culture, reviewer communication, or mentoring practices
- General quality feedback without a concrete artifact or change set to inspect
Use brainstorming, skill-lifecycle-manager, or a general design discussion instead when the user wants exploration rather than a structured review protocol.
Review Modes
| Mode | When | Behavior |
|---|---|---|
| Quick | ≤3 files or "quick look" | P0/P1 only. Inline findings, no template overhead. |
| Standard | 4–15 files (default) | Full analysis. Structured output with severity grading. |
| Deep | >15 files, "thorough", or high-risk | Full analysis + impact analysis + cross-file tracing. |
Risk-based override: File count is the default heuristic, but content risk takes precedence. Even with ≤3 files, upgrade to Standard if the change touches auth/authz, payment/billing, database schemas, or public API contracts. Conversely, user intent always wins — "quick look" means Quick, "thorough" means Deep, regardless of file count.
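The mode-selection rule above can be sketched as a small function. This is an illustrative helper, not part of the skill itself, and the risk-path keywords are assumptions standing in for "auth/authz, payment/billing, database schemas, or public API contracts":

```python
def pick_mode(file_count, paths, user_intent=None):
    """Pick a review mode: user intent wins, then content risk,
    then the file-count heuristic."""
    # User intent always wins, regardless of file count.
    if user_intent == "quick":
        return "Quick"
    if user_intent == "thorough":
        return "Deep"
    # Content risk upgrades small change sets past Quick.
    risky = ("auth", "payment", "billing", "migrations", "schema")
    if any(r in p.lower() for p in paths for r in risky):
        return "Deep" if file_count > 15 else "Standard"
    # File count is the default heuristic.
    if file_count <= 3:
        return "Quick"
    if file_count <= 15:
        return "Standard"
    return "Deep"
```

For example, a two-file change under `src/auth/` selects Standard rather than Quick.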
Phase 1 — Scope
1a. Gather context
Review target — determine from user's request:
- PR: `gh pr diff <number>` or `git diff <base>...<head>`, plus PR description and linked issues
- Branch: `git log --oneline main..<branch>` + `git diff main...<branch>`
- Staged: `git diff --cached` | Unstaged: `git diff`
- Specific files: read directly | Pasted content: use as-is
If diff >2000 lines, chunk by directory/feature and inform the user.
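The chunking step can be approximated by grouping changed paths by top-level directory. A minimal sketch; real chunking might also split by feature:

```python
from collections import defaultdict

def chunk_by_directory(changed_paths):
    """Group changed file paths by their top-level directory
    so an oversized diff can be reviewed in passes."""
    chunks = defaultdict(list)
    for path in changed_paths:
        # Files at the repository root get their own chunk.
        top = path.split("/", 1)[0] if "/" in path else "(root)"
        chunks[top].append(path)
    return dict(chunks)
```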
1b. Read project conventions
Before analyzing, check for project-specific context that shapes what "good" looks like. Many agents auto-load project instruction files into context — check what's already available before reading files redundantly.
- Project instruction files (`CLAUDE.md`, `.cursorrules`, `AGENTS.md`, `copilot-instructions.md`, or similar) — may already be in context
- Linter/formatter configs (`.eslintrc`, `.prettierrc`, `pyproject.toml`) — if these exist, don't flag style issues they cover
- PR template or contributing guide — check if the PR follows the project's expected format
- Recent commit messages — understand the project's conventions
This context prevents generic feedback that ignores team decisions (e.g., flagging ORM usage when the project explicitly avoids ORMs).
1c. Understand intent
Summarize the change in one sentence before diving in. If intent is unclear from the diff and PR description, ask the user before proceeding. Reviewing without understanding intent produces noise.
1d. Classify content types
Each changed file maps to one or more content types. The Dimensions column below shows key dimensions for each type; the reference file contains the full checklist — apply all dimensions listed there, not just the ones summarized here. When a file matches multiple types, use the most specific match.
| Priority | Content type | Detected by | Dimensions | Reference |
|---|---|---|---|---|
| 1 | API Spec | `openapi.*`, `swagger.*`, `*.graphql`, `*.proto` | Compat, Naming, Versioning | references/content-checklist.md |
| 2 | Database | `*.sql`, `migrations/`, `schema/` | Safety, Indexes, Rollback | references/code-checklist.md |
| 3 | Infrastructure | `Dockerfile`, `*.tf`, `k8s/`, CI YAML | Secrets, Limits, Idempotency | references/code-checklist.md |
| 4 | Tests | `*.test.*`, `*_test.*`, `__tests__/` | Coverage, Isolation, Quality | references/code-checklist.md |
| 5 | Design Doc | `*prd*`, `*design*`, `*rfc*`, `*adr*` | Feasibility, Completeness, Trade-offs | references/content-checklist.md |
| 6 | Documentation | `*.md`, `docs/`, `README*` | Accuracy, Completeness, Clarity | references/content-checklist.md |
| 7 | Frontend | `*.tsx`, `*.vue`, `*.css`, `components/` | Rendering, A11y, State, Bundle | references/code-checklist.md |
| 8 | Configuration | `*.yaml`, `*.json`, `*.toml`, `*.env*` | Secrets, Correctness | references/content-checklist.md |
| 9 | Code (default) | `*.py`, `*.ts`, `*.go`, `*.java`, `src/` | Security, Perf, Correctness, Design | references/code-checklist.md |
Priority resolves overlaps: a `*.tsx` file in `__tests__/` is Tests (priority 4), not Frontend (7). A `config.yaml` in `k8s/` is Infrastructure (3), not Configuration (8).
For the detailed checklist of each content type, read the relevant reference file.
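The priority-ordered detection above can be sketched with glob matching. The patterns paraphrase the table and first match wins, so more specific types shadow more general ones; the real skill additionally reads the reference checklists:

```python
from fnmatch import fnmatch

# (priority, content type, patterns), ordered so the first match wins.
RULES = [
    (1, "API Spec", ["openapi.*", "swagger.*", "*.graphql", "*.proto"]),
    (2, "Database", ["*.sql", "migrations/*", "schema/*"]),
    (3, "Infrastructure", ["Dockerfile", "*.tf", "k8s/*"]),
    (4, "Tests", ["*.test.*", "*_test.*", "__tests__/*"]),
    (5, "Design Doc", ["*prd*", "*design*", "*rfc*", "*adr*"]),
    (6, "Documentation", ["*.md", "docs/*", "README*"]),
    (7, "Frontend", ["*.tsx", "*.vue", "*.css", "components/*"]),
    (8, "Configuration", ["*.yaml", "*.json", "*.toml", "*.env*"]),
    (9, "Code", ["*.py", "*.ts", "*.go", "*.java", "src/*"]),
]

def classify(path):
    """Return the highest-priority content type matching a path."""
    base = path.rsplit("/", 1)[-1]
    for _, content_type, patterns in RULES:
        for pat in patterns:
            # Match the full path, a path suffix, or the basename, so
            # directory patterns ("k8s/*") and file patterns ("*.tsx")
            # both apply anywhere in the tree.
            if fnmatch(path, pat) or fnmatch(path, "*/" + pat) or fnmatch(base, pat):
                return content_type
    return "Code"
```

With this sketch, `classify("k8s/config.yaml")` resolves to Infrastructure, not Configuration, matching the overlap rule above.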
Phase 2 — Analyze
For each detected content type, read the corresponding reference checklist and apply its dimensions. Two principles guide the analysis:
Scope to changes only. Read surrounding code for context, but only comment on what was added or modified. A pre-existing bug in untouched code is not a finding (unless the change makes it worse).
Look beyond code patterns. Checklists catch code-level issues (injection, duplication, naming) but can miss behavioral changes — places where the system now does something different. For refactoring PRs especially, ask: "Does the new code behave identically to the old code in all cases?" See the Behavioral Change Analysis section below.
Prioritize by impact, not by checklist order. Security vulnerabilities and correctness bugs matter more than style preferences. Don't spend equal time on every dimension — go deep on what matters for this specific change.
Calibrate severity carefully. A common mistake is marking maintainability issues as P3/Nit when they are actually P2. Ask: "Will this cause a real problem if left unfixed?" Duplicated logic that will silently diverge on the next edit is P2, not P3. A pure style preference with no consequence is P3.
Cross-cutting concerns
Beyond the per-type checklists, always check these regardless of content type:
- Code-doc consistency: If code and docs changed in the same PR, do the docs reflect the new behavior?
- Missing companions: New API endpoint with no tests? New feature with no docs? Schema change with no migration?
- Changelog: User-facing changes should have a changelog entry (if the project uses one).
Non-findings
Do not spend review budget on noise:
- Formatting, import ordering, or linter-only issues already covered by project automation
- Pure style preferences with no behavioral, maintainability, or policy consequence
- Untouched pre-existing issues unless the current change makes them worse
- Alternate designs that do not identify a concrete risk in the submitted change
Behavioral change analysis
Checklist-driven code review tends to focus on code-level patterns (injection, N+1, naming) and can miss behavioral changes — places where the system does something different than before, even if the code looks clean. This matters most in refactoring PRs where the intent is "same behavior, better structure." For any PR that modifies or replaces existing logic, explicitly check:
- State model changes: Did enums gain/lose variants? Did a 3-state machine become 2-state? This changes what the system can express.
- Error handling changes: Did error paths change from silent to throwing, or vice versa? Did catch-all branches (`_ => ...`, `default:`) change what they swallow?
- Default value changes: Did a field go from required to optional (or `String` to `Option<String>`)? Downstream consumers may break.
- Timing/ordering changes: Did synchronous calls become async? Did fire-and-forget become blocking (or vice versa)? These change backpressure and failure modes.
- API contract changes: Were CLI flags, environment variables, response shapes, or event types removed or renamed? These are breaking changes even if the code compiles.
- Scope narrowing: Did a function that previously handled N cases now only handle a subset? The dropped cases may fail silently.
When you find a behavioral change, assess whether it is intentional (documented in PR description) or accidental (a side effect of refactoring). Accidental behavioral changes are typically P1.
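As a hypothetical illustration of scope narrowing, consider a refactor that replaces a tolerant catch-all with a lookup table:

```python
# Before: unknown event kinds fall through to a tolerated no-op.
def handle_before(event):
    if event["kind"] == "create":
        return "created"
    elif event["kind"] == "delete":
        return "deleted"
    else:
        return "ignored"  # catch-all: unknown kinds are tolerated

# After the "cleanup": unknown kinds now raise KeyError. The code
# looks cleaner and passes any test that only covers known kinds,
# but the system's behavior on unexpected input has changed. If the
# PR description doesn't mention this, it is accidental: typically P1.
def handle_after(event):
    handlers = {"create": "created", "delete": "deleted"}
    return handlers[event["kind"]]  # KeyError on unknown kinds
```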
Removal inventory (Deep mode)
For large refactoring PRs with significant deletions, briefly inventory what was removed and confirm clean removal:
- Are all references to deleted types/functions/modules also removed?
- Are there orphaned imports, dead config entries, or stale test helpers?
- Is the removal documented in the PR description or changelog?
This catches partial removals where a type is deleted but a consumer still references it at runtime via a string key or dynamic dispatch.
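A first pass over the removal inventory can be approximated by a plain-text scan for the deleted names. A sketch only: it will miss references assembled at runtime from string fragments, which is exactly why the manual check still matters:

```python
def find_orphaned_references(deleted_names, files):
    """Given names removed in this PR and a mapping of surviving
    file paths to their contents, report lines that still mention
    a deleted name."""
    orphans = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in deleted_names:
                if name in line:
                    orphans.append((path, lineno, name))
    return orphans
```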
Open questions
A good review doesn't just find problems — it also surfaces things the reviewer can't determine from the diff alone. After analyzing, note questions where the answer would change your assessment:
- Implementation details outside the diff that affect correctness (e.g., "The `with_command_meta` method isn't in this diff — does it already exist on main?")
- Design intent that isn't documented (e.g., "Is the `--apply` flag in `command_path` intentional per the envelope spec, or should it be normalized to `apply`?")
- Missing context about contracts, consumers, or deployment (e.g., "Are there schema-level contract tests validating these envelope fields?")
Use a question only when the missing answer would materially change severity, correctness, or whether something is a finding at all. Do not soften a clear bug, contract break, or migration risk into a question. These go in a dedicated "Questions" section in the output.
Impact analysis (Deep mode only)
For changes to exported functions, public APIs, schemas, or shared interfaces:
- Search the codebase for all callers/consumers of the changed interface
- Identify if any consumer would break or behave differently
- Check if migration scripts are needed and included
Phase 3 — Synthesize
Severity levels
| Level | Label | Merge gate | When to use |
|---|---|---|---|
| P0 | 🔴 Critical | Block | Security vuln, data loss, crash, factual error in docs causing harm |
| P1 | 🟠 High | Should fix | Correctness bug, breaking API change, missing rollback, misleading docs |
| P2 | 🟡 Medium | Recommended | Design issue, missing tests, incomplete docs, unclear PRD; DRY violations with >2 copies, inconsistent conventions that will confuse consumers |
| P3 | 🟢 Nit | Optional | Style preference, minor wording that doesn't cause confusion |
| — | 💡 Suggestion | — | Alternative approach, learning opportunity |
| — | 🎉 Praise | — | Good pattern, clean code, thorough docs |
Finding protocol
Every P0, P1, and P2 finding must include:
- Issue: what is wrong
- Consequence: why it matters
- Evidence: file path, line, schema field, config key, or diff hunk
- Fix direction: the expected repair direction, not just "please fix"
Questions supplement findings; they do not replace clear findings. Do not report formatter noise, import ordering, or lint-only issues unless they reveal a real behavioral or policy risk.
Output by mode
Quick mode — concise, no template:
[verdict emoji] [verdict]. [1 sentence summary].
- 🔴 **[title]** (`file:line`): [issue + fix]
- 🟠 **[title]** (`file:line`): [issue + fix]
- ❓ [question, if any — omit if none]
Standard mode — structured:
## Review: [brief title]
### Summary
[1-2 sentences. What it does, overall assessment.]
**Verdict**: ✅ Approve / ⚠️ Approve with suggestions / 🔴 Request changes
**Risk**: Low / Medium / High | **Files**: N (+X/-Y lines)
---
### Findings
[For mixed PRs, group by content type. Within each group, order by severity.]
#### 🔴 P0 — Critical
> **[Title]** (`path/to/file:line`)
> [Why this matters — not just what's wrong, but the consequence]
> ```diff
> - problematic
> + fixed
> ```
[P1, P2, P3, Suggestions follow same format]
#### 🎉 What Looks Good
- [Acknowledge good patterns, thorough tests, clean design]
### Questions
[Things the reviewer can't determine from the diff alone — where the answer would change the assessment. Skip this section if there are no genuine open questions.]
### Quick Wins
[Top 3-5 high-impact, low-effort fixes]
If findings >15, show top 10 and offer to expand.
Deep mode — Standard format plus:
- Behavioral Changes: list each behavioral difference between old and new code, noting whether it is intentional or accidental
- Removal Inventory: confirm clean removal of deleted types/functions/modules (no orphaned references)
- Impact analysis section with consumer list
- Summary table: `| Content Type | P0 | P1 | P2 | P3 |`
- Escalation flags (see below)
Verdict logic
- ✅ Approve: No P0, no P1
- ⚠️ Approve with suggestions: No P0; exactly 1 P1 where the author is aware and has a clear fix path
- 🔴 Request changes: Any P0, or 2+ P1s, or any P1 without a clear fix
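The verdict rule can be stated as a small function. The `p1_has_clear_fix` flag stands in for the "author is aware and has a clear fix path" judgment, which isn't mechanical:

```python
def verdict(p0_count, p1_count, p1_has_clear_fix=False):
    """Map finding counts to a merge verdict per the rules above."""
    if p0_count > 0 or p1_count >= 2:
        return "Request changes"
    if p1_count == 1:
        # A lone P1 only passes if the fix path is clear.
        if p1_has_clear_fix:
            return "Approve with suggestions"
        return "Request changes"
    return "Approve"
```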
Phase 4 — Act
Review before repair. Present findings first. Only apply changes when the user explicitly asks for fixes or chooses one of the next-step options below.
After presenting the review, offer next steps based on content type:
Code / Tests / Frontend / Infrastructure / Configuration:
1. 🔧 Fix all — 2. 🔴 Fix critical only — 3. 🎯 Fix specific — 4. ⏭️ Skip
Apply code edits directly. Explain behavioral impact before applying changes.
Documentation:
1. ✏️ Rewrite flagged sections — 2. 🎯 Rewrite specific — 3. ⏭️ Skip
Rewrite the relevant sections inline.
PRD / Design Doc / API Spec:
1. 💡 Provide suggested rewrites — 2. ⏭️ Skip
These content types reflect design decisions that belong to the author. Provide concrete rewrite suggestions and explain the reasoning, but don't apply changes directly — the author decides what to adopt.
Tone
Review is an execution protocol, not a conversation style guide:
- Be direct, evidence-based, and severity-calibrated.
- Explain consequences, not just rules. "This is vulnerable to injection because..." not just "use parameterized queries."
- Use questions only when missing context blocks confidence. Do not turn a clear correctness or security issue into a suggestion.
- Keep praise brief and selective. Use it only when it helps preserve a strong pattern worth keeping.
- Separate opinion from requirement. Style preferences are P3 at most, and often omitted if automation already covers them.
- Show, don't just tell. Every non-trivial finding gets a concrete before/after — diff for code, rewritten text for docs.
Escalation Triggers
Flag for senior review instead of resolving yourself:
- Database schema changes, public API contract changes
- Auth/authz logic, payment/billing/PII processing
- New external dependencies (security-sensitive)
- Infrastructure changes affecting production
- Architecture decisions with long-term consequences
Validation And Evaluation
Authoring note for maintainers of the canonical skill package:
- Validate structural changes with local authoring tooling before release.
- Evaluate trigger boundaries with the canonical `eval/trigger-posture-cases.json` suite.
- Evaluate realistic review behavior with the canonical `eval/eval-cases.json` suite.
- Keep authoring-only assets such as `eval/` out of runtime projections.
Caveats
- Reviews changes for quality — does not execute tests or run the app
- Large diffs (>50 files) split across multiple passes
- Domain detection is heuristic — tell the reviewer your stack if it gets it wrong
- Read-only by default; fixes in Phase 4 need write permission
- Assesses artifact quality and change risk, not open-ended solution ideation — that belongs in brainstorming or design discussion