# Multi-Model Code Review
Run code changes through multiple AI models to catch issues before human review. Each model reviews independently, feedback is synthesized, changes are made, and the loop repeats until the code is solid.
## Prerequisites
At least one agentic CLI must be installed and authenticated:
| CLI | Verify | Print-mode invocation |
|---|---|---|
| Cursor Agent | `agent --version` | `agent --model <model> --print "<prompt>"` |
| Claude Code | `claude --version` | `claude --model <model> -p "<prompt>"` |
| OpenAI Codex | `codex --version` | `codex --model <model> -q "<prompt>"` |
If Cursor Agent is installed, prefer it and vary the model per invocation rather than mixing and matching CLI utilities.
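A quick availability check before choosing reviewers; only the `--version` flags from the table above are assumed:

```bash
# Report which agentic CLIs are installed (and their versions) before
# deciding which providers and models to use for the review.
for cli in agent claude codex; do
  if command -v "$cli" >/dev/null 2>&1; then
    printf '%s: %s\n' "$cli" "$("$cli" --version 2>&1 | head -n 1)"
  else
    printf '%s: not installed\n' "$cli"
  fi
done
```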
## Workflow

### 1. Prepare the review context
Create a branch-scoped working directory so artifacts from different reviews never collide:
```bash
BRANCH=$(git rev-parse --abbrev-ref HEAD)
REVIEW_DIR="/tmp/code-review-${BRANCH}"
mkdir -p "$REVIEW_DIR"
```
Gather the diff and any relevant context for the reviewer. Write it to `$REVIEW_DIR/` using the template in `assets/code-review-prompt-template.md`.
```bash
git diff <parent-branch>...HEAD > "$REVIEW_DIR/diff.txt"
```
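A concrete instance of the command above, assuming the work branched off `main` (substitute whatever the actual parent branch is):

```bash
# Three-dot diff: everything on this branch since it diverged from main.
git diff main...HEAD > "$REVIEW_DIR/diff.txt"
```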
Build the review prompt at `$REVIEW_DIR/review.md` (a sketch follows this list). Include in it:
- Origin — where the work came from (ticket link, user conversation, incident, spec/RFC). This gives the reviewer the "why."
- What changed and why — a brief summary of the goal.
- The diff — the actual code changes.
- Key files — full contents of new or heavily modified files (diffs alone can lack context for new classes).
- Relevant conventions — project-specific patterns the reviewer should enforce (e.g. naming, logging, testing conventions).
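A minimal sketch of assembling that prompt from the pieces listed above; the section headings here are illustrative placeholders, and the real structure lives in `assets/code-review-prompt-template.md`:

```bash
# Assemble the review prompt (headings and placeholder text are illustrative).
{
  echo "## Origin"
  echo "<ticket link, user conversation, incident, or spec/RFC>"
  echo
  echo "## What changed and why"
  echo "<one-paragraph summary of the goal>"
  echo
  echo "## Diff"
  cat "$REVIEW_DIR/diff.txt"
  echo
  echo "## Key files"
  echo "<full contents of new or heavily modified files>"
  echo
  echo "## Conventions to enforce"
  echo "<project-specific naming, logging, and testing conventions>"
} > "$REVIEW_DIR/review.md"
```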
### 2. Send to models in parallel
Fire both review requests in a single message so they run concurrently. Use whichever CLI(s) and models the user prefers. If the user does not specify, default to the latest high-reasoning thinking model from each of two different providers.
Do not assume which models are available; always check with the CLI tool first.
For example, capture each model's output into `$REVIEW_DIR/` so the results are preserved alongside the prompt that produced them:
```bash
# Cursor Agent
agent --model gpt-5.5-extra-high --print "$(cat "$REVIEW_DIR/review.md")" 2>&1 | tee "$REVIEW_DIR/response-gpt.md"
agent --model claude-opus-4-7-thinking-xhigh --print "$(cat "$REVIEW_DIR/review.md")" 2>&1 | tee "$REVIEW_DIR/response-opus.md"

# Claude Code
claude --model claude-sonnet-4-20250514 -p "$(cat "$REVIEW_DIR/review.md")" 2>&1 | tee "$REVIEW_DIR/response-sonnet.md"

# OpenAI Codex
codex --model o3 -q "$(cat "$REVIEW_DIR/review.md")" 2>&1 | tee "$REVIEW_DIR/response-o3.md"
```
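If you are driving the CLIs from a plain shell rather than through parallel tool calls, the two requests can be backgrounded so they genuinely run concurrently (model names are the same illustrative ones as above):

```bash
# Launch both reviews in the background, then wait for both to finish.
agent --model gpt-5.5-extra-high --print "$(cat "$REVIEW_DIR/review.md")" \
  > "$REVIEW_DIR/response-gpt.md" 2>&1 &
agent --model claude-opus-4-7-thinking-xhigh --print "$(cat "$REVIEW_DIR/review.md")" \
  > "$REVIEW_DIR/response-opus.md" 2>&1 &
wait
```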
### 3. Synthesize feedback
After both responses return:
- Categorize each piece of feedback as critical, recommended, or trivial.
- Identify points of agreement — these are high-confidence issues.
- Identify disagreements — evaluate which model is correct by checking the actual codebase.
- Challenge feedback that is wrong (see Guidelines below).
- Summarize the synthesis to the user.
### 4. Implement fixes
Address critical and recommended feedback. For each fix:
- Make the change.
- Note which reviewer's feedback drove it (e.g. in a commit message trailer, as sketched below).
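One lightweight way to capture that attribution, assuming each fix lands as its own commit (the subject line and trailer name are made up for illustration, not a required convention):

```bash
# Commit a fix with a trailer noting which reviewer's feedback prompted it.
git commit -m "Guard against empty report payloads" \
  -m "Review-Feedback: opus, round 1 (critical)"
```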
### 5. Re-review if needed

Send the updated code back for another round (see the sketch below). Skip re-review for trivial fixes (typos, comment tweaks).
Repeat until:
- No unresolved critical issues remain.
- Both models approve or approve-with-minor-changes.
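A sketch of kicking off a second round; the round-scoped file names are illustrative, and the follow-up prompt should carry forward responses to round-1 feedback, including any pushback on points you believe were wrong:

```bash
# Regenerate the diff after fixes and build a round-2 prompt.
git diff <parent-branch>...HEAD > "$REVIEW_DIR/diff-round2.txt"
{
  echo "Follow-up review. Round-1 feedback and how each point was addressed:"
  echo "<point-by-point responses, including pushback on incorrect feedback>"
  echo
  echo "Updated diff:"
  cat "$REVIEW_DIR/diff-round2.txt"
} > "$REVIEW_DIR/review-round2.md"

agent --model gpt-5.5-extra-high --print "$(cat "$REVIEW_DIR/review-round2.md")" 2>&1 | tee "$REVIEW_DIR/response-gpt-round2.md"
```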
### 6. Push for human review
Once AI reviewers are satisfied, push the branch and create the MR/PR. The AI review rounds serve as a quality gate, not a replacement for human review.
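A sketch of the hand-off, assuming GitHub and the `gh` CLI (use your forge's equivalent otherwise):

```bash
# Push the reviewed branch and open a PR; the artifacts in $REVIEW_DIR can
# inform the PR description if useful.
git push -u origin "$BRANCH"
gh pr create --fill
```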
## Guidelines
- Always use thinking models. Code review requires deep reasoning about behavior, edge cases, and sequencing. Never use non-thinking models (e.g. `gpt-4o`, `claude-sonnet`). Use models with extended thinking (e.g. `gpt-5.3-codex-high`, `composer-2`, `claude-sonnet-4-thinking`, `o3`).
- Use different providers. Models from the same provider share training biases. Use at least two different providers to get genuinely diverse perspectives.
- Send reviews with no prior context. Do not include your own assessment or opinions in the review prompt. Each model must form an independent judgment from the code alone.
- Challenge feedback that is wrong. Consensus does not mean capitulation. If a reviewer suggests a change that contradicts project conventions, misreads the code, or would introduce a regression, push back with evidence. Include the correction in the next review round so the model can learn from the context. A model being confident does not make it correct.
- Let reviewers see the codebase. When using CLIs that support workspace access (e.g. `agent` with `--print`), run from the repo root so the reviewer can read files for context beyond the diff (see the sketch after this list).
- Don't rubber-stamp. If both models say "looks good" but you see an issue, raise it. The models are reviewers, not authorities.
- Preserve existing test coverage. If a reviewer suggests removing or weakening tests, reject it unless there is a clear justification (e.g. the test was asserting on an implementation detail that changed).
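A minimal sketch of the repo-root point above; nothing beyond standard git is assumed, and the model name is the same illustrative one used earlier:

```bash
# Run reviewers from the repository root so workspace-aware CLIs can open
# files referenced in the diff, not just the diff text itself.
cd "$(git rev-parse --show-toplevel)"
agent --model gpt-5.5-extra-high --print "$(cat "$REVIEW_DIR/review.md")"
```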