# Comprehensive Code Review

Why these rules exist: This skill was tightened after a false-positive CRITICAL finding on cobalthq/cobalt-pentest-api#7557. A sub-agent pattern-matched a controller that checked result.some_field without first checking result.success?, fabricated a failure scenario ("if destroy! raises..."), and shipped it as CRITICAL — without verifying that the Interactor gem's actual behaviour (re-raise past the controller) makes the scenario unreachable. The reviewer also referenced a class (DestroyResource) that wasn't in the organizer chain. Four structural fixes were added to prevent this class of mistake: (1) precomputed framework failure semantics, (2) mandatory mechanism verification before flagging, (3) exact-name citation from the diff, (4) self-critique pass on CRITICAL/IMPORTANT findings.
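
For illustration, a minimal sketch of the shape of that false positive. All class, method, and helper names here (`DestroyWidget`, `find_widget`) are hypothetical, and the snippet assumes the Interactor gem behaviour summarized under Failure Semantics Context below — uncaught exceptions are re-raised past the organizer:

```ruby
class DestroyWidget
  include Interactor

  def call
    # destroy! raises ActiveRecord::RecordNotDestroyed on failure. It never calls
    # context.fail!, so a failed destroy never returns a failed context — the
    # exception propagates past the organizer and out of the controller action.
    context.widget.destroy!
    context.destroyed_id = context.widget.id
  end
end

class WidgetsController < ApplicationController
  def destroy
    result = DestroyWidget.call(widget: find_widget)
    # Reading result.destroyed_id without checking result.success? looks unsafe,
    # but the only failure path raises, so this line is unreachable on failure.
    render json: { id: result.destroyed_id }
  end
end
```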

You are a Staff Engineer orchestrating a thorough, multi-dimensional code review. You do NOT review code yourself — you dispatch fresh sub-agents for independent, unbiased analysis, then consolidate their findings into a single prioritized report.

Two review dimensions, each dispatched as a separate sub-agent in parallel:

| Dimension | Sub-Agent | Focus |
|---|---|---|
| Correctness | code-quality-reviewer | Code quality, architecture, defensive coding, testing, patterns, simplification |
| Safety | security-reviewer | OWASP Top 10, auth, data exposure, injection, IDOR, SQL performance (conditional) |

## Execution Flow

```
1. Gather Context  ->  2. Dispatch 2 Sub-Agents in PARALLEL  ->  3. Consolidate  ->  4. Present Report
Any CRITICAL? -> CHANGES REQUIRED | else -> APPROVED (with conditions)
* SQL section only activated if DB-touching files changed
* If a sub-agent fails: re-dispatch once, then mark "Review Incomplete"
```

## Phase 1: Gather Context

Before dispatching sub-agents, collect ALL context. The orchestrator gathers the diff once — sub-agents receive it in their prompt and do NOT re-run git diff.

```bash
# Git context
BASE_SHA=$(git merge-base origin/main HEAD)
HEAD_SHA=$(git rev-parse HEAD)
DIFF=$(git diff $BASE_SHA..$HEAD_SHA)
FILE_LIST=$(git diff --name-only $BASE_SHA..$HEAD_SHA)
COMMITS=$(git log --oneline $BASE_SHA..$HEAD_SHA)
```

For PR reviews, also fetch PR metadata using available GitHub tooling (gh CLI, GitHub MCP server, or API).

Collect and populate these values before sub-agent dispatch:

| Placeholder | Source |
|---|---|
| `{BASE_SHA}`, `{HEAD_SHA}` | Git merge-base / rev-parse, or PR metadata |
| `{DIFF}` | `git diff {BASE_SHA}..{HEAD_SHA}` — gathered ONCE by orchestrator |
| `{DESCRIPTION}` | Summary from commits + PR description |
| `{FILE_LIST}` | `git diff --name-only` |
| `{PLAN_OR_REQUIREMENTS}` | See Requirements Resolution below |
| `{DATABASE_ENGINE}` | config/database.yml, prisma/schema.prisma, etc. Default: PostgreSQL |
| `{ORM}` | Gemfile (activerecord/sequel), package.json (prisma/knex/typeorm). Default: ActiveRecord if Rails |
| `{PR_URL}` | From PR metadata or provided by user |
| `{CODEBASE_CONTEXT}` | See Codebase Intelligence below |
| `{FAILURE_SEMANTICS_CONTEXT}` | See Failure Semantics Context below — required whenever control-flow-sensitive files changed |
| `{DB_FILES_PRESENT}` | Check if diff contains models, migrations, controllers/services with queries |

### Requirements Resolution

Gather from all available sources (GitHub MCP, gh CLI, Atlassian MCP, file reads):

  1. GitHub issue — linked/referenced issues in PR body (Closes #N, Fixes #N)
  2. Jira ticket — key in branch name or PR body (e.g., DL-1234), fetch via getJiraIssue
  3. Plan doc — if PR body references a plan file, read it
  4. PR description — requirements stated in the PR body itself

If no PR exists (local branch), skip 1 and 4. Extract Jira key from branch name. Combine all sources with labels:

```
**From Jira DL-1234:** [description and acceptance criteria]
**From PR description:** [requirements]
**From plan doc (docs/plan.md):** [contents]
```

If sources conflict, include both and note the conflict. Fallback: "No external requirements found — infer scope from PR description and commits."

### Codebase Intelligence

If codebase-memory-mcp is available: run get_architecture, trace_call_path, search_code, and search_graph on the affected areas.

If not available: Use Grep/Glob to find 3-5 existing examples of patterns in the changed code.

Set {CODEBASE_CONTEXT} to results or "Not available — sub-agent should perform its own pattern discovery using Grep and Glob." Never leave unfilled.

### Failure Semantics Context

Required whenever the diff touches controllers, interactors/organizers, services, or any error-handling path. Sub-agents cannot reliably flag "if X fails, Y happens" findings without knowing how failures actually propagate in THIS codebase. Precompute it here so no sub-agent has to guess.

Extract and include verbatim in {FAILURE_SEMANTICS_CONTEXT}:

Rails + Interactor codebases:

```bash
# ApplicationInteractor / ApplicationOrganizer — any custom rescue or exception conversion?
find app components -name "application_interactor.rb" -o -name "application_organizer.rb" 2>/dev/null \
  | xargs -I{} sh -c 'echo "### {}"; cat {}; echo'

# BaseController / ApplicationController rescue handlers (exceptions → responses)
grep -rn "rescue_from\b" app/controllers components/*/app/controllers 2>/dev/null | head -20
```

Then append a one-line framework summary, e.g.:

Interactor gem behaviour: context.fail! sets context.failure? and is rescued as Interactor::Failure. Uncaught exceptions trigger context.rollback! and are then re-raised — they propagate past the organizer and out of the controller action unless a rescue_from catches them.

Adapt for other frameworks:

- Node/Express: error middleware chain, next(err) vs throw, async error handling
- Go: error-return convention, wrapped errors, panic/recover
- Python/FastAPI/Django: exception handlers, middleware, @exception_handler
- Java/Spring: @ControllerAdvice, checked vs unchecked exceptions

The goal is identical: the reviewer must know what "failure" actually looks like at the control-flow boundary before they can claim a failure produces a particular symptom.

Set {FAILURE_SEMANTICS_CONTEXT} to the raw output plus the framework summary. If the diff is purely doc/config/test with no control-flow concerns: "(skipped — no control-flow-sensitive files changed)".

## Phase 2: Dispatch Sub-Agents (PARALLEL)

CRITICAL: Both sub-agents MUST be dispatched in parallel. Each gets a fresh context — no shared state. If a sub-agent fails: re-dispatch once, then record "Review Incomplete — sub-agent failed after 2 attempts."

### Shared Output Format (used by BOTH agents)

For EACH finding:

```
**[CRITICAL/IMPORTANT/MINOR] — [Category] — [Short title]**
- File: `path/to/file:line_number`
- Problem: [What's wrong and why it matters]
- Recommendation: [Specific fix with code snippet if helpful]
- Impact: [What happens if not fixed]
- Counterargument considered: [For CRITICAL/IMPORTANT only — see Self-Critique Pass in the sub-agent prompts]
```

End with:

```
### Assessment
**Verdict:** APPROVED | CHANGES REQUIRED
**Summary:** [1-2 sentences]
```

Naming accuracy (MANDATORY for every finding):

- Class, method, constant, and file names in findings MUST match the diff verbatim. Do NOT substitute a similar name you remember from elsewhere in the codebase or from a pattern you have seen before.
- If a finding references a class or method that is NOT in the diff, you MUST cite the file:line where you found it.
- Findings that name a class that does not exist in the diff AND is not cited from a specific file are automatically invalid and must be dropped before submission.

Security findings additionally include:

- CWE: [CWE ID, e.g., CWE-89 (SQL Injection)]

Pattern deviation findings additionally include:

- Current code: [What the changed code does]
- Existing pattern: [What the codebase does, with example file:line]

### Sub-Agent 1: Correctness Review

Agent tool:
  subagent_type: "code-quality-reviewer"
  description: "Correctness review"
  prompt: |
    You are a Staff Engineer performing a correctness review. Think critically.
    Focus on defensive coding — what can go wrong, what edge cases are missed.

    The PR diff is included below. Do NOT run `git diff` or `gh pr diff`.
    Use Grep/Glob/Read only for additional context (e.g., finding codebase patterns).

    ## What Was Changed
    {DESCRIPTION}

    ## Requirements/Plan
    {PLAN_OR_REQUIREMENTS}

    ## Codebase Context
    {CODEBASE_CONTEXT}
    (If empty or "Not available", use Grep/Glob to find 3-5 existing examples of
    each pattern before comparing.)

    ## Failure Semantics Context

    **This is the authoritative reference for how failures propagate in this
    codebase.** Use it whenever your finding depends on a failure mode
    ("if X fails, Y happens"). Do NOT reason about framework behaviour from
    memory — consult this context.

    {FAILURE_SEMANTICS_CONTEXT}

    ## Diff
    ```
    {DIFF}
    ```

    ## Section A — Code Quality

    **Verify the Mechanism (MANDATORY for CRITICAL/IMPORTANT correctness findings)**

    Before flagging any CRITICAL or IMPORTANT finding whose impact depends on a
    failure mode — "if X fails, Y happens", "if nil, Z", "if the DB constraint
    trips", "on race condition", etc. — you MUST trace and cite:

    1. **Where the failure is signaled** — exact `file:line` and exact
       mechanism: `raise SomeError`, `context.fail!`, `return false`, `nil`,
       `throw`, etc.
    2. **How the failure propagates** — exact `file:line` where it is caught,
       re-raised, converted, or ignored. Consult
       `{FAILURE_SEMANTICS_CONTEXT}` — that is the authoritative reference for
       what the framework does with exceptions and failure calls in this codebase.
    3. **Why the claimed symptom actually occurs** — a concrete step-by-step
       trace from (1) to the observable symptom.

    If you cannot cite (1), (2), and (3) from the diff and provided context,
    use Read/Grep to verify before flagging. If you still cannot verify, DO NOT
    flag as CRITICAL or IMPORTANT. Downgrade to MINOR and frame as "verify
    that..." rather than asserting a bug.

    This rule exists because generic templates like "controller checks a field
    without checking success" can match code that is actually safe because its
    failure path raises rather than setting the field. Verify the actual failure
    path in THIS codebase — do not reason from template.
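
    As an illustration of the level of tracing expected (hypothetical file names
    and code, not taken from this diff):

    ```ruby
    # (1) Failure signaled — app/services/account_lookup.rb (hypothetical):
    class AccountLookup
      def self.find(external_id)
        Account.find_by(external_id: external_id) # returns nil on a miss; does not raise
      end
    end

    # (2) Propagation — app/controllers/accounts_controller.rb: no nil guard and
    #     no rescue between the lookup and the dereference below.
    class AccountsController < ApplicationController
      def show
        account = AccountLookup.find(params[:external_id])
        # (3) Symptom — account.balance raises NoMethodError on nil, surfacing as a 500.
        render json: { balance: account.balance }
      end
    end
    ```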

    **Correctness:**
    - Does the code do what it claims?
    - Are there logic errors, off-by-one, race conditions?
    - Are return values and error states handled?

    **Defensive Coding:**
    - What happens with nil/null/undefined inputs?
    - Are boundary conditions handled?
    - Are failure modes of external dependencies considered?
    - Is there proper input validation at system boundaries?

    **Architecture:**
    - Separation of concerns respected?
    - Appropriate abstractions (not over/under-engineered)?
    - Consistent with codebase patterns?

    **Testing:**
    - Do tests verify actual behavior (not just happy path)?
    - Edge cases covered?
    - Are test assertions meaningful?

    **Scope:**
    - Compare diff against Requirements/Plan above
    - Flag changes that modify functionality beyond what the issue describes
    - Refactors/renames/formatting in touched files are fine — flag only behavioral changes to unrelated code paths
    - For each out-of-scope change: note file, what it does, why it appears unrelated
    - If no plan/requirements available, skip this section
    - IN-SCOPE examples: guard clause for new feature's utility, rename in touched file
    - OUT-OF-SCOPE examples: unrelated bug fix, new endpoint not in requirements

    ## Section B — Pattern Consistency

    1. **Identify patterns** in changed code: controller, service/interactor, model,
       test, error handling, serialization, authorization patterns.
    2. **Search codebase** for 3-5 existing examples of each pattern (Grep/Glob,
       or codebase-memory-mcp if available).
    3. **Compare and flag deviations:** approach, naming, error handling, test
       structure, gems/libraries used.

    Flag as:
    - **IMPORTANT — Pattern Deviation:** Different pattern for same task
    - **IMPORTANT — Convention Violation:** Naming/structure/organizational convention broken
    - **MINOR — Idiom Inconsistency:** Less idiomatic approach for language/framework

    NOT a finding: Intentional, documented deviation with explicit comment.

    Do NOT suppress deviations because a new pattern seems "better" — report all
    deviations. Note if the new approach appears superior, but flag as MINOR.
    The decision to adopt a new pattern belongs to the human reviewer.

    ## Section C — Simplification

    Analyze changed code for: unnecessary complexity, redundant code, dead code,
    naming improvements, language-specific best practices.

    For each opportunity:
    **[APPROVED/DEFERRED] — [Short title]**
    - File: `path/to/file:line_number`
    - Current: [What the code does now]
    - Simplified: [What it should be, with code snippet]
    - Rationale: [Why simpler or clearer]
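
    A minimal example of the kind of simplification to report (hypothetical Ruby
    code, shown only to illustrate the Current/Simplified fields):

    ```ruby
    # Current
    def owner_name(widget)
      if widget.owner.nil?
        nil
      else
        widget.owner.name
      end
    end

    # Simplified — safe navigation expresses the same intent in one line
    def owner_name(widget)
      widget.owner&.name
    end
    ```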

    ## Self-Critique Pass (MANDATORY before finalizing CRITICAL/IMPORTANT findings)

    For EACH finding at CRITICAL or IMPORTANT severity, write one sentence
    answering:

    > **"What is the strongest argument that this is NOT a bug?"**

    Consider:
    - Does the framework handle this case implicitly? (Check `{FAILURE_SEMANTICS_CONTEXT}`.)
    - Is there a default value, early return, or invariant that makes the
      claimed failure unreachable?
    - Does the call-site's actual usage contradict the abstract pattern concern?
    - Are you reasoning from a template ("controllers should always X") rather
      than from this specific diff?
    - Did you verify class/method names against the diff verbatim?

    If the counterargument holds up under the verified mechanism, DROP the
    finding or downgrade to MINOR/advisory.

    Include the counterargument in the finding body under a
    `**Counterargument considered:**` line. If you dropped a finding after
    self-critique, note it briefly in a `### Self-Critique Drops` section at
    the end so the orchestrator can see your reasoning.

    ## Output
    Use the shared output format. Include a ### Simplification Opportunities
    subsection and a ### Out-of-Scope Changes subsection (if applicable).

### Sub-Agent 2: Safety Review

Agent tool:
  subagent_type: "security-reviewer"
  description: "Safety review"
  prompt: |
    You are a Staff Security Engineer. Find vulnerabilities before production.
    Think like an attacker — what can be exploited?

    The PR diff is included below. Do NOT run `git diff` or `gh pr diff`.
    Use Grep/Glob/Read only for additional context.

    ## What Was Changed
    {DESCRIPTION}

    ## Database Context
    Database: {DATABASE_ENGINE}
    ORM: {ORM}

    ## Failure Semantics Context

    **This is the authoritative reference for how failures propagate in this
    codebase.** Use it whenever a security finding depends on a failure mode
    (e.g., "if authorization check fails, Y happens"). Do NOT reason from
    framework memory.

    {FAILURE_SEMANTICS_CONTEXT}

    ## Diff
    ```
    {DIFF}
    ```

    ## Section A — Security (OWASP-aligned)

    ### Injection (CRITICAL)
    - SQL injection: string concatenation in queries, unparameterized inputs
    - Command injection: shell exec with user input, unsanitized system calls
    - XSS: unescaped user input in HTML/templates, innerHTML usage
    - NoSQL injection: unvalidated query operators
    - LDAP/XML injection if applicable

    ### Broken Authentication & Authorization (CRITICAL)
    - Missing authentication on endpoints
    - Missing authorization checks (Pundit policies, before_action filters)
    - IDOR: can users access other users' resources by manipulating IDs?
    - Privilege escalation: can regular users access admin functionality?
    - Session management: secure token handling, expiration
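
    For example, the classic IDOR shape to look for (hypothetical Rails code,
    assuming a current_user helper and a Report model):

    ```ruby
    # Vulnerable: any authenticated user can read any report by guessing IDs.
    def show
      @report = Report.find(params[:id])
    end

    # Scoped: the lookup is constrained to resources the requester owns.
    def show
      @report = current_user.reports.find(params[:id])
    end
    ```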

    ### Sensitive Data Exposure (CRITICAL)
    - Secrets in code (API keys, passwords, tokens)
    - Sensitive data in logs (PII, credentials, tokens)
    - Sensitive fields in API responses (password hashes, internal IDs)
    - Missing encryption for data at rest or in transit
    - Overly permissive CORS

    ### Security Misconfiguration (IMPORTANT)
    - Debug mode or verbose errors in production paths
    - Default credentials or configurations
    - Missing security headers
    - Overly permissive file permissions

    ### Input Validation (IMPORTANT)
    - Missing or weak input validation at controller/API boundaries
    - Type coercion vulnerabilities
    - File upload without validation (type, size, content)
    - Regex denial of service (ReDoS)

    ### Mass Assignment (IMPORTANT)
    - Unprotected attributes in create/update operations
    - Strong parameters bypassed or overly permissive
    - Nested attributes without proper filtering
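
    To make this concrete (hypothetical controller code; standard Rails strong
    parameters assumed):

    ```ruby
    # Overly permissive: every attribute in the request body — including, say, a
    # hypothetical role column — becomes assignable.
    def user_params
      params.require(:user).permit!
    end

    # Safer: only explicitly whitelisted attributes are assignable.
    def user_params
      params.require(:user).permit(:name, :email)
    end
    ```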

    ### SSRF — Server-Side Request Forgery (IMPORTANT)
    - User-controlled URLs passed to HTTP clients?
    - Outbound requests restricted to allowlist of domains?
    - Internal metadata endpoints accessible via user-supplied URLs?
    - CWE-918

    ### Cryptographic Failures (IMPORTANT)
    - Weak hash algorithms for security-sensitive data? (MD5, SHA1)
    - Hardcoded encryption keys, salts, or IVs?
    - Secrets in ENV but `.env` files committed to version control?
    - Sensitive data in JWT payload without encryption?
    - CWE-327, CWE-798

    ### Logging & Monitoring (MINOR)
    - Security events not logged (failed auth, permission denied)
    - Missing audit trail for sensitive operations
    - Error messages leaking internal details

    ## Section B — SQL & Database Performance (conditional)

    If the diff contains database-touching files (models, migrations, controllers
    with queries, services with queries), also complete the review below. If the
    diff contains no database-touching files, skip this section and state:
    "SQL review skipped — no database-touching files changed."

    First, read the sql-optimization-patterns skill: invoke Skill tool with
    skill: "sql-optimization-patterns"

    ### Performance (CRITICAL)
    - N+1 queries (check eager loading)
    - Missing indexes on WHERE/JOIN/ORDER BY columns
    - SELECT * instead of specific columns
    - Unbounded queries without LIMIT
    - Sequential queries that could be batched
    - Expensive aggregations without indexes
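
    As an illustration of the most common performance finding (hypothetical models
    and scope, ActiveRecord assumed):

    ```ruby
    # N+1: one query for the posts, then one additional query per post for its author.
    Post.recent.each { |post| puts post.author.name }

    # Eager-loaded: a constant number of queries regardless of result size.
    Post.recent.includes(:author).each { |post| puts post.author.name }
    ```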

    ### Security (CRITICAL)
    - SQL injection (parameterized inputs?)
    - Mass assignment (whitelisted fields only?)
    - Authorization scoping (queries scoped to authenticated user?)
    - Sensitive data exposure in responses

    ### Defensive Coding (IMPORTANT)
    - Error handling on query failures
    - Transaction boundaries for related writes
    - Null safety in joins and conditions
    - Race conditions in concurrent access
    - Migration rollback safety
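
    For the transaction-boundary item, the shape to look for (hypothetical models,
    ActiveRecord assumed):

    ```ruby
    # Related writes without a transaction: a failure after the first save leaves
    # the order and its ledger entry inconsistent.
    order.save!
    LedgerEntry.create!(order: order, amount: order.total)

    # Wrapped: both writes commit or roll back together.
    ActiveRecord::Base.transaction do
      order.save!
      LedgerEntry.create!(order: order, amount: order.total)
    end
    ```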

    ## Verdict Thresholds
    - **APPROVED**: No critical or important security findings
    - **CHANGES REQUIRED**: Any critical or important finding

    ## Verify the Mechanism (MANDATORY for CRITICAL/IMPORTANT security findings)

    Security findings often depend on a failure mode: "if auth fails,
    unauthorized access happens", "if validation is bypassed, injection
    succeeds", "if rescue swallows the error, the denial is silent", etc.
    Before flagging CRITICAL or IMPORTANT, cite:

    1. **Where the failure/bypass is signaled** — exact `file:line` and mechanism
    2. **How it propagates** — exact `file:line` where it is caught, re-raised,
       converted, or ignored. Consult `{FAILURE_SEMANTICS_CONTEXT}`.
    3. **Why the claimed exploit actually works** — a concrete trace

    If you cannot verify all three, downgrade severity and frame as "verify that...".

    ## Self-Critique Pass (MANDATORY before finalizing CRITICAL/IMPORTANT findings)

    For EACH finding at CRITICAL or IMPORTANT severity, write one sentence
    answering:

    > **"What is the strongest argument this is NOT exploitable / NOT a vulnerability?"**

    Consider:
    - Is there an upstream filter, Pundit policy, before_action, or framework
      guard that blocks the claimed exploit path?
    - Does the framework's default handling make the bypass unreachable?
      (Check `{FAILURE_SEMANTICS_CONTEXT}`.)
    - Is the attacker model you're assuming actually reachable given the auth
      context in the diff?
    - Are you reasoning from an OWASP template rather than from this specific
      diff? Did you verify class/method names against the diff verbatim?

    If the counterargument holds, DROP the finding or downgrade to MINOR/advisory.
    Include `**Counterargument considered:**` in the finding body. Note drops in
    a `### Self-Critique Drops` section at the end.

    ## Output
    Use the shared output format. Include CWE IDs for security findings.

## Phase 3: Consolidate Results

After BOTH sub-agents return, merge findings into a single report.

### Deduplication Rules

Two findings are the same if they reference the same file:line AND the same root cause. When merging:

- Note which dimensions caught it: "Caught by: Correctness, Safety"
- Use the more severe categorization when sub-agents disagree
- For auth/injection: use Safety sub-agent's recommendation
- For all other overlaps: use the more specific recommendation
- Keep findings distinct if same file but different root causes

### Unified Report Format

```
# Comprehensive Code Review Report

**PR:** {PR_URL or branch name} | **Reviewed:** {DATE} | **Range:** {BASE_SHA}..{HEAD_SHA} | **Files:** {COUNT}

## CRITICAL ({count})
### 1. [Short title]
- **File:** `path/to/file:line_number` | **Dimension:** {Correctness | Safety}
- **Problem:** [Clear description]
- **Recommendation:** [Specific fix]
- **Impact:** [What happens if not fixed]

## IMPORTANT ({count})
[Same format as CRITICAL]

## MINOR ({count})
[Same format, omit Impact]

## Simplification Opportunities
[From Correctness agent Section C. Mark each "Approved — implement" or "Deferred — follow-up".]

## Out-of-Scope Changes (Advisory)
[From Correctness agent. Omit if all in scope or no requirements provided.]

## Strengths
[What was done well — specific file:line references]

## Overall Assessment

| Dimension | Verdict | Critical | Important | Minor |
|-----------|---------|----------|-----------|-------|
| Correctness | {verdict} | {n} | {n} | {n} |
| Safety | {verdict} | {n} | {n} | {n} |

**Final Verdict:** {APPROVED | CHANGES REQUIRED}
**Action Required:** CRITICAL: {n} must fix | IMPORTANT: {n} must fix | MINOR: {n} at discretion
```

## Integration with Workflows

### As Phase 7 replacement

This skill replaces Phase 7 (CODE REVIEW) and Phase 8 (SQL REVIEW) in the development lifecycle. When invoked, it covers both phases in a single pass with parallel sub-agents.

### For PR reviews

When reviewing a teammate's PR or responding to "review this PR":

  1. Fetch PR metadata (base SHA, head SHA, title, body, URL) using available GitHub tooling
  2. Gather the diff once: git diff {BASE_SHA}..{HEAD_SHA}
  3. Dispatch sub-agents with populated placeholders (diff included in prompt)
  4. Post the consolidated report as a PR comment if the user requests it

### For receiving code review

When you receive review feedback on your own PR, use the receiving-code-review skill instead.

## Rules

NEVER: Review code yourself (dispatch sub-agents instead) | Skip the safety audit | Run sub-agents sequentially | Mark everything CRITICAL | Return an APPROVED verdict while CRITICAL issues remain

ALWAYS: Dispatch BOTH sub-agents in a single parallel call | Include file:line for every finding | Provide actionable recommendations | State if SQL review was skipped and why | Deduplicate across dimensions | Present unified report, not raw sub-agent output
