review
Code Forge — Review
Comprehensive code review against reference documents and engineering best practices. Covers functional correctness, security, resource management, code quality, architecture, performance, testing, error handling, observability, maintainability, backward compatibility, and dependency safety.
Supports four modes:
- Feature mode: Review a single feature against its
plan.md - Project mode: Review the entire project against planning documents or upstream docs
- Feedback mode: Evaluate and respond to incoming code review comments (
--feedback) - GitHub PR mode: Post a 14-dimension review as a comment on a GitHub PR (
--github-pr)
When to Use
- Feature implementation is complete or nearly complete
- Want to verify code quality before creating a PR
- Need a structured review against the original plan or documentation
- Want a holistic project-level quality check
- Received code review feedback and need to evaluate/respond to it (
--feedback) - Want to post a code review directly to a GitHub PR for team visibility (
--github-pr)
Workflow
Config → Determine Mode → Locate Reference → Collect Scope → Multi-Dimension Review (sub-agent) → Display Report → Update State → Summary
Context Management
The review analysis is offloaded to a sub-agent to handle large diffs without exhausting the main context.
Review Severity Levels
All issues use a 4-tier severity system, ordered by merge-blocking priority:
| Severity | Symbol | Meaning | Merge Policy |
|---|---|---|---|
blocker |
:no_entry: | Production risk. Data loss, security breach, crash. | Must fix before merge |
critical |
:warning: | Significant quality/correctness concern. | Must fix before merge |
warning |
:large_orange_diamond: | Recommended fix. Could cause issues over time. | Should fix |
suggestion |
:blue_book: | Nice-to-have improvement. Can address later. | Nice-to-have |
Review Dimensions Reference
The following dimensions are used in both Feature Mode and Project Mode reviews. They are ordered by priority tier.
Tier 1 — Must-Fix Before Merge (★★★★★)
D1: Functional Correctness & Business Logic
Does the code actually implement what it should? This is the highest-priority dimension.
Check items:
- Requirements fulfillment: Does the code implement the specified behavior correctly?
- Boundary conditions: Off-by-one errors, empty collections, zero/negative values, max values, null/undefined
- Concurrency & race conditions: Shared mutable state, missing locks/synchronization, TOCTOU bugs
- Idempotency: Are operations safe to retry? Are duplicate requests handled?
- State transitions: Are all states reachable? Are invalid transitions prevented?
- Data consistency: Transactions boundaries, partial failure handling, eventual consistency gaps
- Type correctness: Type coercion surprises, implicit conversions, generic type safety
- Edge cases in business rules: Negative amounts, timezone handling, leap years, Unicode, locale-specific logic
D2: Security Vulnerabilities
Does the code introduce any security risk?
Check items:
- Input validation: All external input (HTTP params, form data, file uploads, user-provided env vars) must be validated before use — prefer schema-based validation over scattered manual checks for complex input
- Injection: SQL injection (string concatenation), command injection, LDAP injection, template injection — never concatenate strings into SQL, shell commands, or log messages; use parameterized queries and safe APIs
- XSS: Reflected, stored, DOM-based — unescaped user content in HTML/JS; dynamic frontend content must use framework-native safe rendering
- Authentication & authorization: Missing auth checks, privilege escalation, insecure session management
- Secrets management: Hardcoded credentials, API keys in code, secrets in logs,
.envcommitted — use environment variables + secret manager - CSRF / SSRF: Missing tokens, unvalidated redirect URLs, internal network access
- Deserialization: Unsafe deserialization of untrusted data (pickle, Java serialization, JSON.parse with eval)
- Cryptography: Weak algorithms (MD5/SHA1 for passwords), ECB mode, predictable random, custom crypto
- Path traversal: Unsanitized file paths from user input
- Log forging / information disclosure: Sensitive data in logs, verbose error messages to users; structured logging with request context recommended for service code
- Dependency vulnerabilities: Known CVEs in direct or transitive dependencies
D3: Resource Management & Lifecycle
Are all acquired resources properly released? This is especially critical for long-running services.
Check items:
- Event listeners:
addEventListenerwithoutremoveEventListeneron cleanup - Timers:
setInterval/setTimeoutwithoutclearInterval/clearTimeout - Subscriptions: Observables, pub/sub, WebSocket connections not unsubscribed on teardown
- File handles / DB connections: Opened but not closed, missing
finally/defer/using/with - Goroutine / thread / fiber leaks: Spawned without termination condition or cancellation
- Memory: Unbounded caches/maps, closures capturing large scopes, circular references preventing GC
- Stream / iterator: Not consumed or not closed, backpressure not handled
- Framework lifecycle: React
useEffectcleanup, AngularOnDestroy, VueonUnmounted, iOSdeinit
Tier 2 — Should-Fix (★★★★☆)
D4: Code Quality & Readability
Is the code clear, maintainable, and following project conventions?
Check items:
- Naming: Variables, functions, classes use descriptive, intention-revealing names; no vague standalone names (
data,temp,obj,item,info,val,process(),handle(),doIt(),Manager,Util,Helper) — qualified forms likeuserData,handleClick(),ConnectionManagerare fine; follow language ecosystem conventions - Magic values: No unexplained literals — use named constants
- Function length: Functions > 50 lines should be scrutinized; > 100 lines likely needs splitting (defer to project
CLAUDE.mdfor team-specific thresholds) - Side effects: I/O and state mutations should be isolated to boundaries where practical; keep core logic predictable and testable
- Control flow: Prefer guard clauses (early return) over deeply nested
if/else - DRY: No copy-pasted logic blocks; shared behavior extracted appropriately
- Dead code: No unused functions, unreachable branches, commented-out code, unused imports
- Comments quality: Present only where logic isn't self-evident (complex algorithms, performance trade-offs, business rules, counter-intuitive code); no obvious/redundant comments;
TODO/HACK/FIXMEshould include enough context to be actionable later - Code structure: Appropriate abstractions, no unnecessary complexity or premature optimization
- Consistent style: Follows project's existing patterns for formatting, file organization, module structure
D5: Architecture & Design
Does the change fit the project's architectural conventions?
Check items:
- Layer boundaries: Respects existing architectural layers (controller/service/repo, MVC, hexagonal, etc.)
- Dependency direction: No circular dependencies, lower layers don't depend on higher layers
- SOLID principles: Single responsibility, open-closed, interface segregation violations
- Coupling: New code not tightly coupled to implementation details of other modules
- Abstraction level: Not introducing a parallel system alongside an existing one
- API surface: Public interfaces are clean, minimal, consistent, and well-defined
- Module cohesion: Related functionality grouped together; no God Class / God Function
- New abstractions justified: If new patterns/frameworks/base classes are introduced, are they warranted?
D6: Performance & Efficiency
Are there obvious performance problems on hot paths?
Check items:
- N+1 queries: Database queries inside loops
- Missing indexes: Frequent queries on unindexed columns
- Unnecessary allocations: Creating objects inside tight loops, large object copies on hot paths
- Blocking in async context: Synchronous I/O in async code,
awaitin loops whenPromise.allis appropriate - Lock granularity: Oversized critical sections, lock contention on hot paths
- Cache misuse: Cache stampede / thundering herd, unbounded cache growth, no TTL
- Algorithmic complexity: O(n²) or worse where O(n log n) or O(n) is feasible
- Payload size: Fetching all columns when only a few needed, unbounded result sets, no pagination
- Frontend: Unnecessary re-renders, missing memoization, layout thrashing, large bundle imports
D7: Test Coverage & Verifiability
Are critical paths tested? Are tests meaningful?
Check items:
- Coverage of critical paths: Core business logic, state transitions, and data transformations have tests
- Happy path: Normal/expected flow is tested
- Sad path: Error conditions, invalid inputs, failure scenarios are tested
- Edge cases: Boundary values, empty inputs, concurrent access, large inputs
- Test independence: Tests don't depend on execution order or shared mutable state
- Determinism: No flaky tests relying on timing, network, or random data without seeding
- Meaningful assertions: Tests assert behavior, not implementation; not just "no error thrown"
- Test naming: Test names describe the scenario and expected behavior
- Mock appropriateness: External dependencies mocked; internal logic not over-mocked
- Missing test files: Source modules without any corresponding test coverage
Tier 3 — Recommended Fix (★★★☆☆)
D8: Error Handling & Robustness
Are errors properly caught, classified, reported, and recovered from?
Check items:
- Swallowed exceptions: Catch blocks that silently ignore errors (empty catch, catch-and-log-only for critical ops)
- Over-broad catch: Catching
Exception/Error/objectinstead of specific types - Error propagation: Errors from downstream services/APIs properly surfaced or wrapped
- User-facing errors: Error messages are user-friendly, no stack traces or internal details leaked
- Timeout handling: Network calls, DB queries, external APIs have timeouts configured
- Retry logic: Retries have backoff, jitter, and max-retry limits; not infinite retry loops
- Fallback / degradation: Critical paths have fallback behavior when dependencies fail
- Promise / async errors: Unhandled promise rejections, missing
.catch(), missing error boundaries (React)
D9: Observability (Logging / Metrics / Tracing)
Can you debug and monitor this code in production?
Check items:
- Structured logging: Key business operations emit structured logs with context (user ID, request ID, operation)
- Log levels: Appropriate use of debug/info/warn/error levels
- Error logging: Exceptions logged with stack traces and context; not swallowed silently
- Sensitive data in logs: No passwords, tokens, PII, or credit card numbers in log output
- Request tracing: Trace ID / correlation ID propagated across service boundaries
- Business metrics: Key business events have counters/gauges (orders placed, payments processed, errors)
- Health/readiness signals: Service exposes health checks if applicable
- Alertability: Can an on-call engineer understand and act on the logs/metrics this code produces?
D10: Standards & Conventions
Does the code follow team and project conventions?
Check items:
- Lint compliance: Code passes project linter configuration
- File/directory structure: Follows project's established organization patterns
- Import ordering: Follows project convention for import grouping/ordering
- Dependency management: New dependencies declared properly, version pinned, justified
- Naming conventions: Files, classes, functions follow project naming patterns (camelCase, snake_case, etc.)
- Configuration: New config via environment variables or config files, not hardcoded
- No surprise technology: New frameworks, libraries, or patterns introduced without team discussion
Tier 4 — Nice-to-Have / Track as Tech Debt (★★☆☆☆ / ★☆☆☆☆)
D11: Backward Compatibility & Ops-Friendliness
Will this change break existing consumers or complicate deployment?
Check items:
- API contract: Existing API fields/endpoints not removed or semantically changed without versioning
- Database schema: Column renames, type changes, or drops have migration + backward-compat strategy
- Configuration changes: New required config keys have defaults or migration docs
- Cache/queue keys: Key format changes won't corrupt existing cached data
- Enum/constant changes: Value semantics preserved; new values don't break existing consumers
- Rollback safety: Can this change be rolled back without data loss or corruption?
- Feature flags / gradual rollout: High-risk changes gated behind feature flags
D12: Maintainability & Tech Debt
Does this change leave the codebase better or worse?
Check items:
- Copy-paste debt: Large duplicated blocks that should be extracted
- Deep inheritance: Inheritance depth > 3 levels; prefer composition
- Magic configuration: Behavior controlled by non-obvious environment variables or config
- Over-engineering: Abstractions, extension points, or patterns for hypothetical future needs
- Under-engineering: Quick hacks that will clearly need rework soon (TODO/FIXME/HACK comments)
- Coupling to internals: Depending on internal implementation details of libraries or other modules
D13: Dependencies & Supply Chain Security
Are new or updated dependencies safe and justified?
Check items:
- Known CVEs: Dependencies scanned for known vulnerabilities
- Version pinning: Versions locked (lockfile present and updated); not using
latestor* - Minimal footprint: Not pulling in a large library for a small utility
- Maintenance status: Dependency actively maintained, not abandoned/archived
- License compatibility: License compatible with project requirements
- Transitive risk: Major transitive dependencies checked for known issues
D14: Accessibility / i18n (Frontend & Mobile Only)
Is the UI usable by all users? (Skip this dimension for backend-only projects.)
Check items:
- Semantic HTML: Proper use of heading levels, landmarks, form labels
- ARIA attributes: Interactive elements have appropriate
aria-label,role, states - Keyboard navigation: All interactive elements reachable and operable via keyboard
- Color contrast: Text meets WCAG AA contrast ratio (4.5:1 for normal text)
- Hardcoded strings: User-visible text uses i18n/l10n framework, not hardcoded
- RTL support: Layout not broken in right-to-left languages (if applicable)
- Screen reader: Dynamic content changes announced; focus management correct
Dimension Application Rules
These rules apply to both Feature Mode and Project Mode reviews:
- D1–D3 (Tier 1): Always apply. These are potential merge blockers.
- D4–D7 (Tier 2): Always apply. These are should-fix items.
- D8–D10 (Tier 3): Always apply. Flag as warnings/suggestions.
- D11–D13 (Tier 4): Always apply but expect mostly suggestions.
- D14 (Accessibility/i18n): Apply ONLY if
project_typeis"frontend"or"fullstack".
Detailed Steps
@../shared/configuration.md
Step 1: Determine Review Mode
Parse the user's arguments to determine which mode to use.
1.0a --github-pr Flag Provided
If the user passed --github-pr (e.g., /code-forge:review --github-pr or /code-forge:review --github-pr 123):
→ GitHub PR Mode — Read and follow skills/review/github-pr-workflow.md. Do NOT continue with the steps below.
1.0b --feedback Flag Provided
If the user passed --feedback (e.g., /code-forge:review --feedback or /code-forge:review --feedback #123):
→ Feedback Mode — Read and follow skills/review/feedback-workflow.md. Do NOT continue with the steps below.
1.1 Feature Name Provided
If the user provided a feature name (e.g., /code-forge:review user-auth):
→ Feature Mode — go to Step 2F
1.2 --project Flag Provided
If the user passed --project (e.g., /code-forge:review --project):
→ Project Mode with scope = "full" — go to Step 2P
1.3 No Arguments
If no arguments provided:
- Scan both
{output_dir}/*/state.jsonand.code-forge-tmp/*/state.jsonfor all features - Filter to features with at least one
"completed"task - Build choice list:
- If completed features exist: include each as an option, plus "Review entire project" as the last option
- If no completed features: go to Project Mode with
scope = "changes"automatically
- If only one option (project review): go to Project Mode with
scope = "changes"automatically - If multiple options: use
AskUserQuestionto let user select- If user selects "Review entire project": go to Project Mode with
scope = "changes"
- If user selects "Review entire project": go to Project Mode with
Step 2F: Feature Mode — Locate Feature
2F.1 Find Feature
- Look for
{output_dir}/{feature_name}/state.json - If not found, also check
.code-forge-tmp/{feature_name}/state.json - If still not found: show error, list available features
2F.2 Load Feature Context
- Read
state.json - Read
plan.md(for acceptance criteria and architecture) - Note completed task count and overall progress
→ Go to Step 3F
Step 2P: Project Mode — Locate Reference
Determine the reference level using a fallback chain.
2P.1 Check for Planning Documents (Level 1: Planning-backed)
Scan {output_dir}/*/plan.md:
- If one or more
plan.mdfiles found → planning-backed - Read all
plan.mdfiles and aggregate:- Acceptance criteria from each feature
- Architecture decisions
- Technology stack
- Read corresponding
state.jsonfiles for progress context - Record:
reference_level = "planning" - Record: list of plan file paths and aggregated criteria
- → Go to Step 3P
2P.2 Check for Documentation (Level 2: Docs-backed)
If no planning documents found, scan for upstream documentation:
Search paths (in order):
{input_dir}/*.md— feature specsdocs/directory — PRD, SRS, tech-design, test-plan files
Look for files matching patterns:
**/prd.md,**/srs.md,**/tech-design.md,**/test-plan.md**/features/*.md- Any
.mdfiles directly underdocs/
If documentation files found → docs-backed:
- Read all found docs
- Extract: requirements, architecture decisions, acceptance criteria, scope definitions
- Record:
reference_level = "docs" - Record: list of doc file paths and extracted criteria
- → Go to Step 3P
2P.3 No Reference (Level 3: Bare)
If neither planning nor docs found → bare:
- Record:
reference_level = "bare" - → Go to Step 3P
Step 3F: Feature Mode — Collect Changes and Review
3F.1 Collect Change Scope
From Commits:
Extract all commit hashes from state.json → tasks[].commits:
- Flatten all commit arrays into a single list
- If commits are recorded, use
git diffbetween the earliest and latest commits - If no commits recorded, fall back to scanning files involved in tasks
From Task Files:
Read all tasks/*.md files and collect their "Files Involved" sections:
- Build a complete list of files created/modified by this feature
- Read current state of each file
Summary:
- Total files changed
- Total lines added/removed (from git diff)
- List of all affected files
3F.2 Detect Project Type
Before launching the sub-agent, detect the project type to guide dimension selection:
- Has frontend? Check for:
*.tsx,*.jsx,*.vue,*.svelte, HTML templates, CSS/SCSS files, or frontend framework config (next.config.*,vite.config.*,angular.json) - Has backend/service? Check for: server entry points, API route definitions, database models, middleware
- Language ecosystem: Detect primary language(s) from file extensions and package manifests
Record: project_type = "frontend" | "backend" | "fullstack" | "library" | "cli" | "unknown"
3F.3 Multi-Dimension Review (via Sub-agent)
Offload to sub-agent to handle the full diff analysis.
Spawn a Task tool call with:
subagent_type:"general-purpose"description:"Review feature: {feature_name}"
Sub-agent prompt must include:
- Feature name and
plan.mdfile path - List of all affected files (sub-agent reads them)
- The acceptance criteria from
plan.md - Detected project type
- Instructions to review across all applicable dimensions below
- The severity level definitions (blocker / critical / warning / suggestion)
- Instruction: "For each issue, specify severity, file path, line number/range, what's wrong, and how to fix it. Use the Review Comment Formula: Problem → Why it matters → Suggested fix."
Review dimensions to apply: Follow Dimension Application Rules.
Additionally, always check Plan Consistency (feature mode specific):
- All acceptance criteria from
plan.mdare met - Architecture matches the design in
plan.md - No unplanned features added (scope creep)
- All planned tasks are implemented
Sub-agent must return the following structured format:
REVIEW_SUMMARY:
overall_rating: <pass | pass_with_notes | needs_changes>
total_issues: <number>
blocker_count: <number>
critical_count: <number>
warning_count: <number>
suggestion_count: <number>
merge_readiness: <ready | fix_required | rework_required>
dimensions_reviewed: <list of dimension IDs reviewed>
FUNCTIONAL_CORRECTNESS: # D1
rating: <pass | warning | critical>
issues:
- severity: <blocker | critical | warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
SECURITY: # D2
rating: <pass | warning | critical>
issues:
- severity: <blocker | critical | warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
RESOURCE_MANAGEMENT: # D3
rating: <pass | warning | critical>
issues:
- severity: <blocker | critical | warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
CODE_QUALITY: # D4
rating: <good | acceptable | needs_work>
issues:
- severity: <critical | warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
ARCHITECTURE: # D5
rating: <good | acceptable | needs_work>
issues:
- severity: <critical | warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
PERFORMANCE: # D6
rating: <good | acceptable | needs_work>
issues:
- severity: <critical | warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
TEST_COVERAGE: # D7
rating: <good | acceptable | needs_work>
coverage_gaps:
- severity: <critical | warning | suggestion>
file: path/to/source.ext
description: <what scenario is untested>
ERROR_HANDLING_AND_OBSERVABILITY: # D8 + D9
rating: <good | acceptable | needs_work>
issues:
- severity: <warning | suggestion>
file: path/to/file.ext
line: <number or range>
category: <error_handling | logging | metrics | tracing>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
MAINTAINABILITY_AND_COMPATIBILITY: # D10 + D11 + D12 + D13
rating: <good | acceptable | needs_work>
issues:
- severity: <warning | suggestion>
file: path/to/file.ext
line: <number or range>
category: <standards | backward_compat | tech_debt | dependencies>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
ACCESSIBILITY: # D14 (frontend/fullstack only)
rating: <good | acceptable | needs_work | skipped>
issues:
- severity: <warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
PLAN_CONSISTENCY:
criteria_met: <X/Y>
unmet_criteria:
- <criterion not met>
scope_issues:
- <unplanned additions or missing planned features>
→ Go to Step 4F
Step 3P: Project Mode — Collect Source Code and Review
The primary subject of review is the source code itself. Reference documents (plans, specs) serve only as criteria to check against — the sub-agent must deeply read and analyze the actual implementation.
3P.1 Collect Source Code
Identify and collect project source files for deep code review. The collection strategy depends on scope (set in Step 1):
If scope = "changes" (default — no arguments or auto-selected):
-
Identify changed files (primary scope):
- If on a non-main branch:
git diff main...HEAD --name-only - If on main branch with uncommitted changes:
git diff HEAD --name-only+git diff --cached --name-only(staged + unstaged) - If on main branch with no uncommitted changes:
git diff HEAD~1 --name-only(last commit) - Exclude non-source directories:
node_modules/,dist/,build/,.git/,vendor/,__pycache__/, the output directory itself
- If on a non-main branch:
-
Expand to impact zone (1 level): For each changed file, also include:
- Files that import or depend on the changed file (direct dependents — use
Grepto find import/require/use statements referencing the changed file) - Files that the changed file imports from (direct dependencies — read the changed file's import statements)
- Test files corresponding to the changed files (e.g.,
foo.test.tsforfoo.ts)
- Files that import or depend on the changed file (direct dependents — use
-
Fallback to full scan: Only if no changed files are found (clean repo, no recent commits), fall through to the
scope = "full"strategy below.
If scope = "full" (--project flag):
- Use project root markers to find source directories (e.g.,
src/,lib/,app/,pkg/, or language-specific patterns) - Exclude non-source directories:
node_modules/,dist/,build/,.git/,vendor/,__pycache__/, the output directory itself - Scan all source files
- If the project is large (>50 source files), prioritize:
- Core modules (entry points, main logic, business logic)
- Test files
- Configuration and infrastructure files
Both modes also collect:
- Package manifests (
package.json,Cargo.toml,pyproject.toml, etc.) for dependency review - Build/CI configuration if present
3P.2 Detect Project Type
Same as Step 3F.2 — detect project_type to guide dimension selection.
3P.3 Multi-Dimension Code Review (via Sub-agent)
Offload to sub-agent to handle deep source code analysis.
Spawn a Task tool call with:
subagent_type:"general-purpose"description:"Project code review: {project_name}"
Sub-agent prompt must include:
- Project name and root path
- List of all source files to review — sub-agent MUST read and analyze each file's actual implementation
- Reference level (
planning/docs/bare) and associated criteria (if any) - Detected project type
- If planning-backed: aggregated acceptance criteria (as checklist for consistency dimension only)
- If docs-backed: extracted requirements (as checklist for consistency dimension only)
- The severity level definitions (blocker / critical / warning / suggestion)
- Explicit instruction: "Read every source file. Review the code itself — its logic, structure, correctness, and quality. Reference documents are only used as criteria for the consistency check, not as the subject of review."
- Instruction: "For each issue, specify severity, file path, line number/range, what's wrong, and how to fix it. Use the Review Comment Formula: Problem → Why it matters → Suggested fix."
Review dimensions to apply: Follow Dimension Application Rules.
Additionally, apply the appropriate Consistency check based on reference level:
-
planning-backed → Plan Consistency:
- Aggregated acceptance criteria from all plans are met in the code
- Implemented architecture matches the designs in plan files
- No unplanned features added (scope creep)
- All planned features have corresponding code
-
docs-backed → Documentation Consistency:
- Code implements the requirements described in documentation
- Architecture aligns with tech design (if present)
- Feature scope in code matches what specs describe
- No undocumented major functionality in the code
-
bare → Skip this dimension. Note in the report: "No reference documents found — consistency check skipped."
Sub-agent must return the following structured format:
All issues MUST reference specific source files and line numbers/ranges.
REVIEW_SUMMARY:
overall_rating: <pass | pass_with_notes | needs_changes>
total_issues: <number>
blocker_count: <number>
critical_count: <number>
warning_count: <number>
suggestion_count: <number>
merge_readiness: <ready | fix_required | rework_required>
reference_level: <planning | docs | bare>
dimensions_reviewed: <list of dimension IDs reviewed>
FUNCTIONAL_CORRECTNESS: # D1
rating: <pass | warning | critical>
issues:
- severity: <blocker | critical | warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
SECURITY: # D2
rating: <pass | warning | critical>
issues: [same structure as D1]
RESOURCE_MANAGEMENT: # D3
rating: <pass | warning | critical>
issues: [same structure as D1]
CODE_QUALITY: # D4
rating: <good | acceptable | needs_work>
issues:
- severity: <critical | warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
ARCHITECTURE: # D5
rating: <good | acceptable | needs_work>
issues: [same structure as D4]
PERFORMANCE: # D6
rating: <good | acceptable | needs_work>
issues: [same structure as D4]
TEST_COVERAGE: # D7
rating: <good | acceptable | needs_work>
coverage_gaps:
- severity: <critical | warning | suggestion>
file: path/to/source.ext
description: <what scenario is untested>
ERROR_HANDLING_AND_OBSERVABILITY: # D8 + D9
rating: <good | acceptable | needs_work>
issues:
- severity: <warning | suggestion>
file: path/to/file.ext
line: <number or range>
category: <error_handling | logging | metrics | tracing>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
MAINTAINABILITY_AND_COMPATIBILITY: # D10 + D11 + D12 + D13
rating: <good | acceptable | needs_work>
issues:
- severity: <warning | suggestion>
file: path/to/file.ext
line: <number or range>
category: <standards | backward_compat | tech_debt | dependencies>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
ACCESSIBILITY: # D14 (frontend/fullstack only)
rating: <good | acceptable | needs_work | skipped>
issues:
- severity: <warning | suggestion>
file: path/to/file.ext
line: <number or range>
title: <short title>
description: <what's wrong and why it matters>
suggestion: <how to fix>
CONSISTENCY:
type: <plan_consistency | doc_consistency | skipped>
rating: <good | acceptable | needs_work | N/A>
criteria_met: <X/Y> (if applicable)
unmet_criteria:
- <criterion not met>
scope_issues:
- <unplanned additions or missing documented features>
→ Go to Step 4P
Step 4F: Feature Mode — Display Report
Review results are displayed in the terminal by default — no file is written. This reflects that reviews are iterative, intermediate checks rather than permanent artifacts.
Display the following report directly in the terminal using markdown:
# Code Review: {feature_name}
**Date:** {ISO date}
**Reviewer:** code-forge
**Overall Rating:** {pass | pass_with_notes | needs_changes}
**Merge Readiness:** {ready | fix_required | rework_required}
## Summary
{1-2 paragraph summary of the review findings}
**Issue Breakdown:** {blocker_count} blockers · {critical_count} critical · {warning_count} warnings · {suggestion_count} suggestions
---
## Tier 1 — Must-Fix Before Merge
### Functional Correctness (D1)
**Rating:** {rating}
{issues table with severity/file/line/title/description/suggestion, or "No issues found"}
### Security (D2)
**Rating:** {rating}
{issues or "No security concerns"}
### Resource Management (D3)
**Rating:** {rating}
{issues or "No resource management issues"}
---
## Tier 2 — Should-Fix
### Code Quality (D4)
**Rating:** {rating}
{issues or "No issues found"}
### Architecture & Design (D5)
**Rating:** {rating}
{issues or "No issues found"}
### Performance (D6)
**Rating:** {rating}
{issues or "No issues found"}
### Test Coverage (D7)
**Rating:** {rating}
{coverage gaps or "All scenarios covered"}
---
## Tier 3 — Recommended
### Error Handling & Observability (D8/D9)
**Rating:** {rating}
{issues or "No issues found"}
---
## Tier 4 — Nice-to-Have
### Maintainability & Compatibility (D10–D13)
**Rating:** {rating}
{issues or "No issues found"}
{If frontend/fullstack:}
### Accessibility / i18n (D14)
**Rating:** {rating}
{issues or "Skipped (not a frontend project)"}
---
## Plan Consistency
**Criteria Met:** {X/Y}
{unmet criteria or "All criteria met"}
---
## Recommendations
{Prioritized list of changes, grouped by blocking status:}
**Must fix before merge:**
1. {highest priority fix with file:line reference}
2. ...
**Should fix:**
1. {recommended fix}
2. ...
**Consider for later:**
1. {nice-to-have improvement}
2. ...
## Verdict
{Final assessment: merge as-is, fix blockers/criticals then merge, or needs rework}
4F.1 Optional: Save to File (--save)
If the user passed --save in the arguments, also write the report to {output_dir}/{feature_name}/review.md. Otherwise, do NOT create the file.
→ Go to Step 5F
Step 4P: Project Mode — Display Report
Use the same report template as Feature Mode (see Step 4F), with these differences:
- Title:
# Project Review: {project_name}(instead of feature name) - Header adds:
**Scope:** {changes (N changed + M related files) | full (N source files)} - Header adds:
**Reference:** {planning-backed | docs-backed | bare (no reference documents)} - Consistency section adapts to reference level:
- planning-backed →
## Plan Consistencywith criteria met/unmet - docs-backed →
## Documentation Consistencywith criteria met/unmet - bare →
*No reference documents found — consistency check skipped.*
- planning-backed →
4P.1 Optional: Save to File (--save)
If the user passed --save in the arguments, also write the report to {output_dir}/project-review.md. Otherwise, do NOT create the file.
→ Go to Step 5P
Step 5F: Feature Mode — Update state.json
- Read
state.json - Add or update
reviewfield in metadata:{ "review": { "date": "ISO timestamp", "rating": "pass_with_notes", "merge_readiness": "fix_required", "total_issues": 12, "blockers": 0, "criticals": 2, "warnings": 6, "suggestions": 4 } }- If
--savewas used, also include"report": "review.md"in the review object
- If
- Update
state.jsonupdatedtimestamp
→ Go to Step 6
Step 5P: Project Mode — No State Update
Project mode does not update any state.json — there is no single feature state to track.
→ Go to Step 6
Step 6: Summary and Next Steps
6.1 Feature Mode
Display:
Code Review Complete: {feature_name}
Rating: {overall_rating}
Merge Readiness: {merge_readiness}
Issues: {total_issues} ({blocker_count} blockers, {critical_count} critical, {warning_count} warnings, {suggestion_count} suggestions)
{If --save was used:}
Report saved: {output_dir}/{feature_name}/review.md
{If needs_changes (blockers or criticals > 0):}
🚫 Merge blocked — fix these first:
1. {highest priority blocker/critical with file:line}
2. {next priority fix}
...
Fix all: /code-forge:fixbug --review
Re-review: /code-forge:review {feature_name}
{If pass_with_notes (warnings > 0, no blockers/criticals):}
⚠ Merge OK with notes — consider fixing:
1. {top warning}
2. ...
Fix all: /code-forge:fixbug --review
{If pass:}
✅ Ready for next steps:
/code-forge:status {feature_name} View final status
Create a Pull Request
Tip: use --save to persist the review report to disk
6.2 Project Mode
Display:
Project Review Complete: {project_name}
Rating: {overall_rating}
Merge Readiness: {merge_readiness}
Reference: {planning-backed (N plans) | docs-backed (N documents) | bare}
Issues: {total_issues} ({blocker_count} blockers, {critical_count} critical, {warning_count} warnings, {suggestion_count} suggestions)
{If --save was used:}
Report saved: {output_dir}/project-review.md
{If needs_changes (blockers or criticals > 0):}
🚫 Issues require attention:
1. {highest priority blocker/critical with file:line}
2. {next priority fix}
...
Fix all: /code-forge:fixbug --review
Re-review: /code-forge:review --project
{If pass_with_notes (warnings > 0, no blockers/criticals):}
⚠ Project quality acceptable with notes — consider fixing:
1. {top warning}
2. ...
Fix all: /code-forge:fixbug --review
{If pass:}
✅ Project quality looks good.
Tip: use --save to persist the review report to disk