multi-review-aggregation
Multi-Review Aggregation
Dispatch N independent code reviews and aggregate findings. Each reviewer catches different bugs -- union preserves the long tail that single-shot misses.
Research basis: SWR-Bench (arXiv 2509.01494) -- N independent reviews: 43.67% F1 improvement, 118.83% recall improvement. Diminishing returns past N=5; N=3 captures most improvement.
Core principle: Independence via separate Task dispatches -- same base prompt, no shared context.
N Selection by Tier
| Tier | N Reviews | Rationale |
|---|---|---|
| max-20x | 3 | Quality priority -- full aggregation |
| max-5x | 3 | Balanced -- same recall benefit |
| pro/api | 1 | Budget priority -- single review |
When N=1, skip this skill -- use standard single code review.
Parallel Dispatch Pattern
After spec review passes, dispatch N independent reviews (run_in_background=True, each gets "Reviewer i of N"). If all approve with 0 Critical/Important: fast path, skip aggregation. Otherwise dispatch aggregator (haiku model).
Full dispatch code: see references/dispatch-code.md. Aggregator prompt: see ./aggregator-prompt.md.
Aggregation Algorithm
Fast Path
ALL N reviewers return VERDICT: APPROVE with CRITICAL: 0 and IMPORTANT: 0 -> skip aggregation. Unanimous clean = review output.
Severity Voting (When Any Reviewer Raises Issues)
| Condition | Result |
|---|---|
| All reviewers agree on severity | Keep that severity |
| Reviewers disagree | Use highest severity |
| Lone finding Critical or Important | Keep original severity (no downgrade) |
| Lone finding Minor | Downgrade to Suggestion |
| Lone finding BUT security or data-loss | Keep original severity (no downgrade) |
Severity levels: Critical > Important > Minor > Suggestion
Verdict
- "Ready to merge: Yes" -- zero Critical AND zero Important AND majority approved
- "Ready to merge: With fixes" -- only Minor/Suggestion after aggregation
- "Ready to merge: No" -- any Critical or Important remain
Full deduplication/merging rules: see references/aggregation-details.md.
Output Format
## Strengths
- [strength] [Reviewers: 1, 2, 3]
## Issues
### Critical / Important / Minor / Suggestion
- [issue] [Reviewers: N, N] -- file:line
(note downgrade/security provenance as applicable)
## Assessment
Ready to merge: [Yes/With fixes/No]
Reviewers: X/N approved, Y requested changes
Full format spec: see references/output-provenance.md.
Red Flags
Never:
- Share context between reviewers (defeats independence)
- Use N>1 for pro/api tier (budget constraint)
- Skip aggregation when reviewers disagree
- Downgrade security findings even as lone findings
Always:
- Dispatch all N reviews in parallel
- Include reviewer number in each dispatch prompt
- Use haiku for aggregation
- Record per-reviewer metrics separately
Reference Files
references/dispatch-code.md: Full dispatch flow with on_spec_review_pass handlerreferences/aggregation-details.md: Deduplication, strengths merging, malformed output, timeout recoveryreferences/output-provenance.md: Provenance annotation rules and full output format specreferences/metrics-and-cost.md: Per-reviewer metric keys, cost impact, per-tier breakdownaggregator-prompt.md: Aggregator Task dispatch prompt template