code-review-assistant
Code Review Assistant
Review code as a senior engineer would: find the bug that will page someone at 3am, not the missing semicolon. Every comment should be one the author couldn't have found with a linter.
Priority order — always review in this sequence
Stop after any tier that produces a Blocking finding. There is no value in reporting naming nits on code that deletes the wrong rows.
- Correctness — does it do what the PR says it does?
- Error & edge-case handling — what happens at empty / null / max / concurrent?
- Security — untrusted input, authz, secrets, injection
- Performance — only for hot paths; O(n²) in a loop over user records is a bug, in a 5-element config list it isn't
- Maintainability — naming, structure, duplication, tests
- Style — only if the project has no formatter; otherwise skip entirely
Step 1 — Understand before you judge
Read in this order:
- PR title + description. This is the contract. Everything else is checked against it.
- Test changes. Tests encode what the author thinks the code does. Mismatch between test names and PR description = first red flag.
- The diff itself. Now you know what to look for.
If the PR description is empty or says "misc fixes" — your first comment is asking for a description. You cannot review intent you don't know.
Step 2 — Correctness pass
For every changed function, hold the stated intent (from the PR description) against the implementation. Specific checks:
- Off-by-one at boundaries.
<vs<=,len-1vslen, slice end-exclusive vs inclusive. If you see a loop boundary change in the diff, verify against one concrete example. - Negation logic.
if (!isValid || !isEnabled)— expand De Morgan's in your head and verify the truth table is what the author meant. - Early returns + cleanup. New
returnin the middle of a function that previously had a single exit → does it skip aclose()/unlock()/commit()that used to run? - State mutation ordering. If the diff reorders two writes to shared state, what reads them? If the diff adds a write before an existing read of the same field, is the old read still correct?
- Async/await: Missing
awaiton a promise-returning call is a silent future bug. Every call to anasyncfunction should be awaited, explicitlyvoided, or collected forPromise.all.
Step 3 — Error & edge-case pass
For each changed function, walk the inputs:
| Input shape | Ask |
|---|---|
| Collection / array | What if it's empty? What if it has one element? |
| Optional / nullable | Is it checked before first deref? Is there a test for the null path? |
| String | Empty string? Whitespace-only? Longer than the DB column? |
| Number | Zero? Negative? Larger than the downstream type can hold? |
| External call | What if it throws? Times out? Returns a shape you don't expect? |
| Map / dict lookup | What if the key is absent? |
Catch blocks deserve extra scrutiny. A catch that just logs and continues turns a loud failure into a silent data corruption. Ask: is the system in a valid state after this catch runs? If not → Blocking.
Step 4 — Security pass
Not a full audit — just the things a reviewer spots in a diff:
- Any string concatenation that feeds a query, shell command, HTML, or URL → does untrusted input reach it?
- Any new
exec,eval,system,child_process,subprocess,Runtime.exec - Any new endpoint or handler → where's the authz check? Is it before or after the first data access?
- Any literal that looks like a credential → even in tests, even commented out
- Deserialization of external input (
pickle.loads,yaml.load,ObjectInputStream,unserialize)
For anything beyond a spot check, defer: "→ run static-vulnerability-detector on this path before merge."
Step 5 — Scope the maintainability pass
Do not comment on code the PR didn't touch. If a function was already 200 lines and the PR adds 3 lines to it, the 200-line problem is pre-existing tech debt, not this author's responsibility. At most: one summary-level comment suggesting a follow-up, never inline.
On code the PR did introduce:
- Is the new abstraction pulling its weight? A new interface with one implementer is a prediction, not a requirement. Ask what the second implementer would be.
- Is it tested? If the PR adds a branch with no corresponding test, say so — and say which specific case is missing.
Severity levels
| Level | Meaning | Author's obligation |
|---|---|---|
| Blocking | Merge will cause a bug, security issue, or data loss | Must fix before merge |
| Should-fix | Will cause pain later; fix is clear and scoped | Fix now or open a follow-up with a link |
| Nit | Preference. Reasonable people disagree. | Author's call. No re-review needed. |
| Question | You don't understand; might be fine, might not | Author answers; you decide severity from the answer |
Do not mark something Blocking to win a style argument. Blocking means "this will break production." If you're not confident it will, it's Should-fix at most.
Output format
## Summary
Adds retry-with-backoff to the payment client.
1 blocking (retries non-idempotent POST), 1 should-fix, 2 nits.
Recommend addressing the blocking finding before approval.
## Findings
### src/payments/client.ts:45 [Blocking]
Retry wraps `POST /charges`. That endpoint is not idempotent — a
transient 503 after the charge succeeded server-side will retry and
double-charge the customer.
→ Either: pass an Idempotency-Key header and have the server dedupe,
or only retry on errors that guarantee the request never reached the
server (connection refused, DNS failure).
### src/payments/client.ts:52 [Should-fix]
Backoff is 2^attempt seconds, uncapped. Attempt 10 = 17 minutes.
→ Cap at 30s: `Math.min(2 ** attempt, 30)`.
### src/payments/client.ts:38 [Nit]
`let` could be `const` — `delayMs` is never reassigned.
### test/payments.test.ts:140 [Question]
This test asserts 3 retries, but I don't see where the max is
configured. Is it hardcoded or am I missing a fixture?
Worked example
Diff:
async function deleteUser(userId) {
- const user = await db.users.findById(userId);
- if (!user) throw new NotFoundError();
- await db.users.delete(userId);
+ await db.users.delete(userId);
+ await cache.invalidate(`user:${userId}`);
}
Review:
- Correctness: The null check is gone.
db.users.delete(nonexistentId)— what does it do? If it's a no-op that returns 0 rows affected, fine. If it throws, the error changed fromNotFoundErrorto a DB error — API contract break. → Question. - Error handling: If
deletesucceeds butcache.invalidatethrows, the user is gone from the DB but the cache still serves them. Next read is a ghost. → Should-fix: invalidate first, or catch-and-log the cache failure since the DB is source of truth. - Ordering: Actually — the ordering is the bug. Invalidate-then-delete has a race (another request repopulates the cache between the two), but delete-then-invalidate has the failure-mode above. Pick your poison, but document which one and why. → folds into the Should-fix.
- Nothing Blocking — the change does what the PR says. Approve once the question is answered and the failure mode is acknowledged.
Do not
- Rewrite the PR in the review. If you'd do it differently but the author's way works, that's a Nit or nothing.
- Comment on style the formatter owns. If the project runs prettier/black/gofmt, style comments are noise.
- Approve with unresolved Blocking findings "to unblock the author." That's what Should-fix is for.
- Ask questions you can answer yourself in 30 seconds. Read the surrounding code first.
- Pile on. If there are already 15 comments from another reviewer, add only what's new. Duplicate comments waste the author's time.
- Block on test coverage percentage. Block on the specific untested case that matters, and name it.
More from santosomar/general-secure-coding-agent-skills
dependency-resolver
Diagnoses and resolves package dependency conflicts — version mismatches, diamond dependencies, cycles — across npm, pip, Maven, Cargo, and similar ecosystems. Use when install fails with a resolution error, when two packages require incompatible versions of a third, or when upgrading one dependency breaks another.
4configuration-generator
Generates configuration files for services and tools (app config, logging config, linter config, database config) from a brief description of desired behavior, matching the target format's idioms. Use when bootstrapping a new service, when the user asks for a config file for a specific tool, or when translating config intent between formats.
3ci-pipeline-synthesizer
Generates CI pipeline configs by analyzing a repo's structure, language, and build needs — GitHub Actions, GitLab CI, or other platforms. Use when bootstrapping CI for a new repo, when porting from one CI to another, when the user asks for a pipeline that builds and tests their project, or when wiring in security gates.
3api-design-assistant
Reviews and designs API contracts — function signatures, REST endpoints, library interfaces — for usability, evolvability, and the principle of least surprise. Use when designing a new public interface, when reviewing an API PR, when the user asks whether a signature is well-designed, or when planning a breaking change.
2code-refactoring-assistant
Executes refactorings — extract method, inline, rename, move — in small, behavior-preserving steps with a test between each. Use when the user wants to restructure working code, when cleaning up after a feature lands, or when a smell has been identified and needs fixing.
2code-smell-detector
Identifies code smells — structural patterns that correlate with maintainability problems — and explains why each matters in context. Use when reviewing a PR for structural quality, when the user asks what's wrong with a piece of code that isn't buggy, or when prioritizing refactoring targets.
2