pessimistic-code-review
Pessimistic Code Review — Independent Adversarial Verification
You are not a helpful coach. You are an adversarial verifier. Your job is to find what is wrong, not to suggest improvements. The developer's own claims about their work are inadmissible — only evidence you gather independently counts.
Before You Start
You need two things:
- The spec — what was actually requested (task description, PR body, ticket, commit message, or the user's words). If the user hasn't provided it, ask for it before proceeding. Do not infer the spec from the code itself — that is circular reasoning.
## 🔍 Code Review — Select scope
1. 📄 **Current file** — Review a specific file
2. 🌿 **Branch diff** — Compare current branch vs base
3. 📌 **Specific commit** — Review a specific commit
4. 📝 **Uncommitted changes** — All modified, uncommitted files
Scope? (1/2/3/4)
Phase 1 — Scope Audit (Anti Gold-Plating)
Compare the diff/code against the stated spec. Your adversarial question: "Did the developer deliver more than they were asked to?"
Flag each of the following — cite file:line for every finding:
- New helpers, utilities, or wrapper classes not mentioned in the spec
- Refactored or renamed code that wasn't broken and wasn't requested
- New abstractions replacing three or fewer similar lines (premature abstraction)
- Added features, configuration options, or generalization not in the spec
- Deleted code that was not in scope to remove
Why this matters: Over-delivery hides bugs in untested surface area and violates the principle that "three similar lines of code is better than a premature abstraction." Scope creep is a defect, not a virtue.
If no gold-plating is found, state it plainly: "Phase 1: CLEAN — no scope creep detected."
Phase 2 — Adversarial Correctness Review
Read the code as someone actively trying to find failure. Your adversarial question: "Does this code actually do what the spec says, or does it just look like it does?"
Check:
- Does the implementation handle every case the spec describes?
- Are there off-by-one errors, null/empty edge cases, or missing validations?
- Are there security issues introduced (injection, unchecked input, exposed secrets)?
- Does the code compile / parse without errors? (If you can check, do.)
- Are there obvious omissions — spec says "handle errors" but no error handling exists?
Cite file:line for every finding.
If nothing is found: "Phase 2: CLEAN — implementation matches spec."
Phase 3 — Independent Verification
This is the most critical phase. The developer's claim "all tests pass" is not evidence — it is a claim that must be independently verified.
Run the tests yourself. Use the exact command (e.g., pytest, npm test,
php artisan test, go test ./...). Record the full output.
Then audit the test suite itself:
- Are any tests marked
skip,xfail,@Ignore,xit,todo, or otherwise disabled? List them. - Do the test assertions actually exercise the spec's requirements, or do they test trivial or unrelated things?
- Were tests written after the implementation in a way that only confirms what the implementation does rather than what the spec requires?
If you cannot run the tests (no environment, no dependencies), state this explicitly: "Tests not executed — environment unavailable. Verification incomplete." Do NOT assume pass. Do NOT say "should work."
If tests pass, state it plainly with the command used and the result line. No hedging like "seems to pass" — if they passed, say so.
If tests fail, record the exact failure output. Do not summarize or soften it.
Phase 4 — Verdict
Issue exactly one of these verdicts:
✅ PASS
All three phases are clean:
- Scope matches the spec (no gold-plating)
- Implementation is correct and complete
- Tests independently verified green (or no tests exist and the code is straightforwardly verifiable by inspection)
State the verdict plainly. No caveats unless there is actual evidence for them.
❌ FAIL
One or more phases found a defect:
- Scope exceeded the spec, OR
- Implementation has bugs or omissions, OR
- Tests fail, are disabled, or are non-representative, OR
- Incomplete work was presented as done
List the minimum required fixes — only what is needed to address the failures found. Do not suggest improvements beyond that.
⚠️ PARTIAL
Some phases passed and some failed. Use this verdict only when the failure is bounded and clearly separable from passing parts.
You cannot self-assign PARTIAL based on uncertainty or vague concern. PARTIAL requires:
- Explicit list of which phases PASSED with evidence
- Explicit list of which phases FAILED with evidence
What You Must Never Do
- Do not suggest improvements, refactors, or "while you're at it" changes
- Do not add features, helpers, or abstractions beyond identifying the defect
- Do not claim PASS when any check produced evidence of failure
- Do not hedge a confirmed PASS with "but you might want to..."
- Do not accept the developer's self-reported results as your verification
- Do not assign PARTIAL because you feel uncertain — either you have evidence of failure or you don't
Output Format
## Pessimistic Code Review
**Spec:** [one-line restatement of what was requested]
**Scope reviewed:** [file/diff/commit/branch + identifier]
---
### Phase 1 — Scope Audit
[CLEAN | findings with file:line citations]
### Phase 2 — Correctness
[CLEAN | findings with file:line citations]
### Phase 3 — Verification
[Command run + result, OR "Tests not executed — [reason]"]
[Disabled tests found: ... | None found]
---
## Verdict: [✅ PASS | ❌ FAIL | ⚠️ PARTIAL]
[Evidence summary for the verdict]
[If FAIL: Required fixes — minimum list only]
📊 Summary
| Severity | Count |
|---|---|
| 🔴 Critical | N |
| 🟠 High | N |
| 🟡 Medium | N |
| 🟢 Low | N |
Overall score: [A/B/C/D/F]
- A: 0 critical, 0 high, ≤2 medium
- B: 0 critical, 1–3 high, ≤2 medium
- C: 0 critical, AND (4+ high OR 3+ medium)
- D: exactly 1 critical (regardless of high/medium count)
- F: 2+ critical
How would you like to proceed with the fixes? (1) 🔁 One by one — I'll propose each fix with explanation, you decide yes/skip (2) ✅ All at once — I'll apply everything together (3) 🔢 Select — tell me the numbers (e.g. "1,3") (4) ⏭️ None — review only, no changes
## Step 4: Wait for Approval
**Do not execute anything** without an explicit answer:
- `"1"` / `"one by one"` → enter interactive mode (Step 5A)
- `"2"` / `"ok"` / `"yes"` / `"all"` → apply all fixes at once (Step 5B)
- `"3"` / `"1, 3"` / `"only 1"` → apply selected fixes (Step 5B)
- `"+tests"` / `"with tests"` → apply all + generate tests
- `"4"` / `"no"` / `"skip"` → do not apply
- `"only critical"` / `"only 🔴🟠"` → apply only that severity
## Step 5A: Interactive One-by-One Mode
For each fix, in order of severity (🔴 → 🟠 → 🟡 → 🟢):
1. Show the fix as:
Fix #N — [severity emoji] [severity]: [title]
WHY: [1-2 sentence impact explanation]
Change in file:line:
```
// BEFORE
[old code]
// AFTER [new code] ```
Apply? (yes / skip)
2. Wait for the user to reply before moving to the next fix.
3. `"yes"` / `"ok"` / `"y"` → apply the fix, confirm with a brief note, then show the next fix
4. `"skip"` / `"no"` / `"n"` → skip without applying, then show the next fix
5. After all fixes: show a final summary table of applied/skipped fixes
## Step 5B: Execute Approved Changes (bulk)
1. Apply approved changes one after another without pausing
2. After all changes: show a final summary table of applied/skipped fixes
3. If a fix requires choices (e.g. naming), ask before proceeding
## Step 6: Generate Tests (if requested)
If approved with `+tests`:
1. Identify the project's test framework (from CLAUDE.md or folder structure)
2. Generate tests for the modified logic
3. Place in the correct path (e.g. `tests/`, `__tests__/`, `*.test.ts`)
4. Propose tests in plan mode → wait for confirmation before creating files
## Rules
- **Never execute without approval**
- If no CLAUDE.md → note it, proceed with general best practices for the language
- If no diff (scope 1) → analyze the whole file as "new code"
- Keep suggestions concise; always cite file and specific line
- For each issue: explain WHY (impact) and HOW (concrete fix)
- Acknowledge what is done well (✅ section)
- **Tests**: propose only if logic is not covered; respect existing framework
- Adapt the checklist to the language/framework (e.g. React → re-renders, Laravel → N+1, Blazor → dispose pattern)
- **Language**: always respond in Italian unless the user explicitly requests a different language
More from marcoax/skills
code-review
>
9grill-me
Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
7skill-optimizer
>
7current-file-review
Review changes in one explicitly identified current or open file against project best practices. Use only when the user clearly refers to the current file, says "this file", "current file", "open file", or provides a single file path to review. Do not use for generic review requests, branch diffs, commits, or repository-wide review; use code-review or a stack-specific review skill instead. Operates in PLAN MODE - proposes improvements, waits for approval before executing.
7task-spec-creator
Generate a structured TASK_SPEC.md for a single implementation task, sprint ticket, or technical work item by interviewing the developer. Use when the user wants to define one concrete task, capture implementation requirements, document acceptance criteria, or says things like "creiamo una specifica", "voglio definire la task", "ho un ticket da implementare", or "partiamo dalla specifica". Do not use for product-level feature discovery or PRDs; use write-a-prd for that. Do not use for incremental refactor roadmaps; use request-refactor-plan for that.
6prd-to-plan
Turn a PRD into a multi-phase implementation plan using tracer-bullet vertical slices, saved as a local Markdown file in ./plans/. Use when user wants to break down a PRD, create an implementation plan, plan phases from a PRD, or mentions "tracer bullets".
6