code-review-and-quality
Code Review and Quality
Overview
Multi-dimensional code review with quality gates. Every change gets reviewed before merge — no exceptions. Review covers five axes: correctness, readability, architecture, security, and performance.
The approval standard: Approve a change when it definitely improves overall code health, even if it isn't perfect. Perfect code doesn't exist — the goal is continuous improvement. Don't block a change because it isn't exactly how you would have written it. If it improves the codebase and follows the project's conventions, approve it.
When to Use
- Before merging any PR or change
- After completing a feature implementation
- When another agent or model produced code you need to evaluate
- When refactoring existing code
- After any bug fix (review both the fix and the regression test)
The Five-Axis Review
Every review evaluates code across these dimensions:
1. Correctness
Does the code do what it claims to do?
- Does it match the spec or task requirements?
- Are edge cases handled (null, empty, boundary values)?
- Are error paths handled (not just the happy path)?
- Does it pass all tests? Are the tests actually testing the right things?
- Are there off-by-one errors, race conditions, or state inconsistencies?
2. Readability & Simplicity
Can another engineer (or agent) understand this code without the author explaining it?
- Are names descriptive and consistent with project conventions? (No
temp,data,resultwithout context) - Is the control flow straightforward (avoid nested ternaries, deep callbacks)?
- Is the code organized logically (related code grouped, clear module boundaries)?
- Are there any "clever" tricks that should be simplified?
- Could this be done in fewer lines? (1000 lines where 100 suffice is a failure)
- Are abstractions earning their complexity? (Don't generalize until the third use case)
- Would comments help clarify non-obvious intent? (But don't comment obvious code.)
- Are there dead code artifacts: no-op variables (
_unused), backwards-compat shims, or// removedcomments?
3. Architecture
Does the change fit the system's design?
- Does it follow existing patterns or introduce a new one? If new, is it justified?
- Does it maintain clean module boundaries?
- Is there code duplication that should be shared?
- Are dependencies flowing in the right direction (no circular dependencies)?
- Is the abstraction level appropriate (not over-engineered, not too coupled)?
4. Security
For detailed security guidance, see security-and-hardening. Does the change introduce vulnerabilities?
- Is user input validated and sanitized?
- Are secrets kept out of code, logs, and version control?
- Is authentication/authorization checked where needed?
- Are SQL queries parameterized (no string concatenation)?
- Are outputs encoded to prevent XSS?
- Are dependencies from trusted sources with no known vulnerabilities?
- Is data from external sources (APIs, logs, user content, config files) treated as untrusted?
- Are external data flows validated at system boundaries before use in logic or rendering?
5. Performance
For detailed profiling and optimization, see performance-optimization. Does the change introduce performance problems?
- Any N+1 query patterns?
- Any unbounded loops or unconstrained data fetching?
- Any synchronous operations that should be async?
- Any unnecessary re-renders in UI components?
- Any missing pagination on list endpoints?
- Any large objects created in hot paths?
Change Sizing
Small, focused changes are easier to review, faster to merge, and safer to deploy. Target these sizes:
~100 lines changed → Good. Reviewable in one sitting.
~300 lines changed → Acceptable if it's a single logical change.
~1000 lines changed → Too large. Split it.
What counts as "one change": A single self-contained modification that addresses one thing, includes related tests, and keeps the system functional after submission. One part of a feature — not the whole feature.
Splitting strategies when a change is too large:
| Strategy | How | When |
|---|---|---|
| Stack | Submit a small change, start the next one based on it | Sequential dependencies |
| By file group | Separate changes for groups needing different reviewers | Cross-cutting concerns |
| Horizontal | Create shared code/stubs first, then consumers | Layered architecture |
| Vertical | Break into smaller full-stack slices of the feature | Feature work |
When large changes are acceptable: Complete file deletions and automated refactoring where the reviewer only needs to verify intent, not every line.
Separate refactoring from feature work. A change that refactors existing code and adds new behavior is two changes — submit them separately. Small cleanups (variable renaming) can be included at reviewer discretion.
Change Descriptions
Every change needs a description that stands alone in version control history.
First line: Short, imperative, standalone. "Delete the FizzBuzz RPC" not "Deleting the FizzBuzz RPC." Must be informative enough that someone searching history can understand the change without reading the diff.
Body: What is changing and why. Include context, decisions, and reasoning not visible in the code itself. Link to bug numbers, benchmark results, or design docs where relevant. Acknowledge approach shortcomings when they exist.
Anti-patterns: "Fix bug," "Fix build," "Add patch," "Moving code from A to B," "Phase 1," "Add convenience functions."
Review Process
Step 1: Understand the Context
Before looking at code, understand the intent:
- What is this change trying to accomplish?
- What spec or task does it implement?
- What is the expected behavior change?
Step 2: Review the Tests First
Tests reveal intent and coverage:
- Do tests exist for the change?
- Do they test behavior (not implementation details)?
- Are edge cases covered?
- Do tests have descriptive names?
- Would the tests catch a regression if the code changed?
Step 3: Review the Implementation
Walk through the code with the five axes in mind:
For each file changed:
1. Correctness: Does this code do what the test says it should?
2. Readability: Can I understand this without help?
3. Architecture: Does this fit the system?
4. Security: Any vulnerabilities?
5. Performance: Any bottlenecks?
Step 4: Categorize Findings
Label every comment with its severity so the author knows what's required vs optional:
| Prefix | Meaning | Author Action |
|---|---|---|
| (no prefix) | Required change | Must address before merge |
| Critical: | Blocks merge | Security vulnerability, data loss, broken functionality |
| Nit: | Minor, optional | Author may ignore — formatting, style preferences |
| Optional: / Consider: | Suggestion | Worth considering but not required |
| FYI | Informational only | No action needed — context for future reference |
This prevents authors from treating all feedback as mandatory and wasting time on optional suggestions.
Step 5: Verify the Verification
Check the author's verification story:
- What tests were run?
- Did the build pass?
- Was the change tested manually?
- Are there screenshots for UI changes?
- Is there a before/after comparison?
Multi-Model Review Pattern
Use different models for different review perspectives:
Model A writes the code
│
▼
Model B reviews for correctness and architecture
│
▼
Model A addresses the feedback
│
▼
Human makes the final call
This catches issues that a single model might miss — different models have different blind spots.
Example prompt for a review agent:
Review this code change for correctness, security, and adherence to
our project conventions. The spec says [X]. The change should [Y].
Flag any issues as Critical, Important, or Suggestion.
Dead Code Hygiene
After any refactoring or implementation change, check for orphaned code:
- Identify code that is now unreachable or unused
- List it explicitly
- Ask before deleting: "Should I remove these now-unused elements: [list]?"
Don't leave dead code lying around — it confuses future readers and agents. But don't silently delete things you're not sure about. When in doubt, ask.
DEAD CODE IDENTIFIED:
- formatLegacyDate() in src/utils/date.ts — replaced by formatDate()
- OldTaskCard component in src/components/ — replaced by TaskCard
- LEGACY_API_URL constant in src/config.ts — no remaining references
→ Safe to remove these?
Review Speed
Slow reviews block entire teams. The cost of context-switching to review is less than the waiting cost imposed on others.
- Respond within one business day — this is the maximum, not the target
- Ideal cadence: Respond shortly after a review request arrives, unless deep in focused coding. A typical change should complete multiple review rounds in a single day
- Prioritize fast individual responses over quick final approval. Quick feedback reduces frustration even if multiple rounds are needed
- Large changes: Ask the author to split them rather than reviewing one massive changeset
Handling Disagreements
When resolving review disputes, apply this hierarchy:
- Technical facts and data override opinions and preferences
- Style guides are the absolute authority on style matters
- Software design must be evaluated on engineering principles, not personal preference
- Codebase consistency is acceptable if it doesn't degrade overall health
Don't accept "I'll clean it up later." Experience shows deferred cleanup rarely happens. Require cleanup before submission unless it's a genuine emergency. If surrounding issues can't be addressed in this change, require filing a bug with self-assignment.
Honesty in Review
When reviewing code — whether written by you, another agent, or a human:
- Don't rubber-stamp. "LGTM" without evidence of review helps no one.
- Don't soften real issues. "This might be a minor concern" when it's a bug that will hit production is dishonest.
- Quantify problems when possible. "This N+1 query will add ~50ms per item in the list" is better than "this could be slow."
- Push back on approaches with clear problems. Sycophancy is a failure mode in reviews. If the implementation has issues, say so directly and propose alternatives.
- Accept override gracefully. If the author has full context and disagrees, defer to their judgment. Comment on code, not people — reframe personal critiques to focus on the code itself.
Dependency Discipline
Part of code review is dependency review:
Before adding any dependency:
- Does the existing stack solve this? (Often it does.)
- How large is the dependency? (Check bundle impact.)
- Is it actively maintained? (Check last commit, open issues.)
- Does it have known vulnerabilities? (
npm audit) - What's the license? (Must be compatible with the project.)
Rule: Prefer standard library and existing utilities over new dependencies. Every dependency is a liability.
The Review Checklist
## Review: [PR/Change title]
### Context
- [ ] I understand what this change does and why
### Correctness
- [ ] Change matches spec/task requirements
- [ ] Edge cases handled
- [ ] Error paths handled
- [ ] Tests cover the change adequately
### Readability
- [ ] Names are clear and consistent
- [ ] Logic is straightforward
- [ ] No unnecessary complexity
### Architecture
- [ ] Follows existing patterns
- [ ] No unnecessary coupling or dependencies
- [ ] Appropriate abstraction level
### Security
- [ ] No secrets in code
- [ ] Input validated at boundaries
- [ ] No injection vulnerabilities
- [ ] Auth checks in place
- [ ] External data sources treated as untrusted
### Performance
- [ ] No N+1 patterns
- [ ] No unbounded operations
- [ ] Pagination on list endpoints
### Verification
- [ ] Tests pass
- [ ] Build succeeds
- [ ] Manual verification done (if applicable)
### Verdict
- [ ] **Approve** — Ready to merge
- [ ] **Request changes** — Issues must be addressed
Common Rationalizations
| Rationalization | Reality |
|---|---|
| "It works, that's good enough" | Working code that's unreadable, insecure, or architecturally wrong creates debt that compounds. |
| "I wrote it, so I know it's correct" | Authors are blind to their own assumptions. Every change benefits from another set of eyes. |
| "We'll clean it up later" | Later never comes. The review is the quality gate — use it. Require cleanup before merge, not after. |
| "AI-generated code is probably fine" | AI code needs more scrutiny, not less. It's confident and plausible, even when wrong. |
| "The tests pass, so it's good" | Tests are necessary but not sufficient. They don't catch architecture problems, security issues, or readability concerns. |
Red Flags
- PRs merged without any review
- Review that only checks if tests pass (ignoring other axes)
- "LGTM" without evidence of actual review
- Security-sensitive changes without security-focused review
- Large PRs that are "too big to review properly" (split them)
- No regression tests with bug fix PRs
- Review comments without severity labels — makes it unclear what's required vs optional
- Accepting "I'll fix it later" — it never happens
Verification
After review is complete:
- All Critical issues are resolved
- All Important issues are resolved or explicitly deferred with justification
- Tests pass
- Build succeeds
- The verification story is documented (what changed, how it was verified)