# Reviewing Code
Adversarial review loop: reviewer finds issues → fixer resolves them → fresh reviewer verifies → repeat until clean. All findings and fixes happen locally — nothing is posted to GitHub.
## Reference Files

| File | Read When |
|---|---|
| `references/review-perspectives.md` | Spawning reviewer subagents — describes each review angle and what to look for |
| `references/confidence-scoring.md` | Scoring findings — rubric, thresholds, and false positive exclusions |
## How It Works
This is a fully autonomous process — the user is not involved until the review is complete or the manager needs help with something it can't resolve on its own.
The manager (you) orchestrates the loop between two roles:
- Reviewer subagents — spawned with fresh context, no knowledge of how the code was written. They return scored findings.
- Fixer subagents — receive the filtered findings and make targeted fixes.
The manager decides everything dynamically: how many reviewers to spawn, how many fixers to spawn, whether to loop again, and when to escalate to the user. No numbers are hardcoded — scale to the size and complexity of the change.
## Workflow

### Phase 1: Scope the Review
- Determine what to review — default: all changed files vs the base branch
  - `git diff --name-only origin/{base_branch}...HEAD` for the file list
  - `git diff origin/{base_branch}...HEAD` for the full diff
- If the user specifies a narrower scope (specific files, a directory), use that instead
- Read CLAUDE.md and AGENTS.md if they exist — these inform the compliance review agent
- Read the diff to understand the scope and nature of changes
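The scoping step can be sketched as follows. This is a minimal illustration, not part of the skill itself; it assumes a git checkout with an `origin` remote and a known base branch, and the function names are hypothetical:

```python
import subprocess

def diff_commands(base_branch: str) -> tuple[list[str], list[str]]:
    """Build the two git commands: changed-file list and full diff."""
    # Triple-dot compares HEAD against the merge base with the base branch
    base = f"origin/{base_branch}...HEAD"
    return (["git", "diff", "--name-only", base], ["git", "diff", base])

def scope_review(base_branch: str = "main") -> tuple[list[str], str]:
    """Return (changed files, full diff text) for the default review scope."""
    names_cmd, diff_cmd = diff_commands(base_branch)
    files = subprocess.run(
        names_cmd, capture_output=True, text=True, check=True
    ).stdout.splitlines()
    diff = subprocess.run(
        diff_cmd, capture_output=True, text=True, check=True
    ).stdout
    return files, diff
```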
### Phase 2: Review
Load references/review-perspectives.md for the detailed instructions to give each agent.
Decide how many reviewers to spawn based on the scope of changes:
- Small change (1-3 files, < 100 lines) — 1 subagent covering all perspectives (bug scan, compliance, readability) in a single pass
- Medium change (4-10 files, 100-500 lines) — 2 parallel subagents (one for bugs/logic, one for compliance/readability)
- Large change (10+ files, 500+ lines) — up to 3 parallel subagents, each with a dedicated perspective:
  - Bug & Logic Scanner
  - Standards & Compliance Auditor
  - Fresh-Eyes Readability Reviewer
These are guidelines, not rules — use judgment. A 2-file change touching critical auth code may warrant 3 reviewers. A 20-file rename refactor may only need 1.
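The size tiers above can be expressed as a starting-point heuristic. A hypothetical sketch only — the thresholds mirror the guidelines, and the manager overrides them based on risk, exactly as the text says:

```python
def reviewer_count(files_changed: int, lines_changed: int) -> int:
    """Baseline reviewer count from change size; the manager may override."""
    if files_changed <= 3 and lines_changed < 100:
        return 1  # small: one subagent covers all perspectives
    if files_changed <= 10 and lines_changed <= 500:
        return 2  # medium: bugs/logic + compliance/readability
    return 3      # large: one dedicated reviewer per perspective
```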
Each subagent receives the diff, the list of changed files, and any relevant CLAUDE.md content. Subagents must not see each other's findings — independent review is the point.
Each agent returns a list of findings, each with:
- File and line range
- Category: bug, security, logic, style, readability, compliance, performance
- Severity: critical, major, minor, nitpick
- Confidence score: 0-100 (see `references/confidence-scoring.md`)
- Description: what's wrong and why it matters
- Suggested fix: concrete description of what to change
### Phase 3: Score and Filter
Load references/confidence-scoring.md for the rubric and false positive list.
- Collect all findings from every reviewer subagent
- Deduplicate — if multiple agents flagged the same issue (same file, within 3 lines), keep the highest-confidence version
- Filter — discard findings with confidence below 70
- Classify remaining findings:
  - All nitpicks or empty → stop, review is clean → go to Phase 6
  - Any critical/major findings → must fix → go to Phase 4
  - Only minor findings → use judgment: fix if straightforward, otherwise stop
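The dedupe-filter-classify pipeline above can be sketched in a few lines. This is an illustration under stated assumptions: the `Finding` shape and the `triage` return values are hypothetical names, and the 3-line window and 70-point threshold come from the rules above:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    line: int
    severity: str    # critical | major | minor | nitpick
    confidence: int  # 0-100
    description: str

def dedupe(findings: list[Finding]) -> list[Finding]:
    """Same file within 3 lines → keep the highest-confidence version."""
    kept: list[Finding] = []
    for f in sorted(findings, key=lambda f: -f.confidence):
        if not any(k.file == f.file and abs(k.line - f.line) <= 3 for k in kept):
            kept.append(f)
    return kept

def triage(findings: list[Finding], threshold: int = 70) -> str:
    """Classify the round: 'fix' → Phase 4, 'judgment' → minor-only, 'clean' → Phase 6."""
    actionable = [f for f in dedupe(findings) if f.confidence >= threshold]
    if any(f.severity in ("critical", "major") for f in actionable):
        return "fix"
    if any(f.severity == "minor" for f in actionable):
        return "judgment"
    return "clean"  # empty or nitpicks only
```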
### Phase 4: Fix
- Group findings by file proximity and logical relationship into fix batches
- Decide how many fixer subagents to spawn:
  - 1-2 simple findings in the same area → 1 fixer subagent
  - Multiple findings across independent files → parallel fixer subagents, one per file group
  - Complex findings requiring careful thought → fewer fixers working sequentially so each can verify before the next starts
- Each fixer receives:
  - The specific findings to address (file, line, description, suggested fix)
  - The current file contents
  - Instructions to make minimal, targeted changes — fix exactly what was flagged, nothing more
- After fixes are applied, verify:
  - Run project lint/test commands if available
  - If tests fail, the fixer should address the failure before returning
### Phase 5: Re-Review
Spawn new reviewer subagent(s) with fresh context — no knowledge of the previous review or fixes.
Scale the re-review the same way as Phase 2: if the fixes were small and localized, one reviewer is enough. If the fixes were extensive or touched many files, use more reviewers.
Each agent receives:
- The updated diff (`git diff origin/{base_branch}...HEAD`)
- CLAUDE.md content
- A focused brief: "Review this diff for bugs, logic issues, compliance, and readability. Return scored findings."
Evaluate the results:
- No findings above threshold → done → go to Phase 6
- New findings exist → return to Phase 4 (fix) then Phase 5 (re-review)
- Same findings keep recurring (agent is going in circles) → escalate to user
There is no hard iteration cap. Stop when findings degrade to nitpicks. If the loop isn't converging (same issues reappearing, or fixes introducing new issues of similar severity), ask the user for guidance rather than looping forever.
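The "going in circles" check can be sketched as a recurrence test across rounds. A hypothetical helper, assuming findings are reduced to `(file, line)` pairs and reusing the 3-line proximity window from Phase 3:

```python
def is_recurring(
    previous_rounds: list[list[tuple[str, int]]],
    new_findings: list[tuple[str, int]],
    window: int = 3,
) -> bool:
    """True if any new finding matches a finding from an earlier round
    (same file, within `window` lines) — a signal to escalate to the user."""
    prior = [p for rnd in previous_rounds for p in rnd]
    return any(
        f == pf and abs(l - pl) <= window
        for (f, l) in new_findings
        for (pf, pl) in prior
    )
```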
### Phase 6: Report
Present a summary to the user:
```markdown
## Review Complete

**Rounds:** {N} review cycles
**Findings:** {total} found, {fixed} fixed, {filtered} filtered as false positives
**Changes:** {files_changed} files modified

### Fixed Issues

1. [{severity}] {description} — {file}:{line}
2. ...

### Remaining (nitpicks, not fixed)

1. [{severity}] {description} — {file}:{line}
2. ...
```
If the user wants to proceed, they can commit and push. The skill does not commit or push automatically.
## When the Manager Should Escalate
Ask the user for help when:
- A finding is ambiguous — could be intentional or a bug, and there's no way to tell from the code alone
- The fix for an issue would require architectural changes or touching many files outside the diff scope
- Two reviewer rounds flagged the same issue and the fixer couldn't resolve it
- The reviewer and fixer disagree (fix introduced a new issue of equal or higher severity)
- A finding touches business logic where the "correct" behavior isn't clear from code or docs
## Anti-patterns
- Reviewer subagents must not see each other's findings — independent review is the whole point
- Fixer must not add features, refactor, or "improve" code beyond the flagged issues
- Do not fix nitpicks — they are informational only
- Do not commit automatically — the user decides when to commit
- Do not post to GitHub — this is an internal review
- Do not involve the user during the loop — only escalate when genuinely stuck (see escalation criteria above)
- Do not hardcode agent counts — scale reviewers and fixers to the size and complexity of the work
- Do not loop more than needed — if the re-reviewer returns only nitpicks, stop