# Multi-AI Code Review

## Overview

`multi-ai-code-review` provides comprehensive code review using multiple AI models as specialized agents, each analyzing the code from a different perspective. It is based on 2024-2025 best practices for AI-assisted code review.

**Purpose:** Multi-perspective code quality assessment using an AI ensemble with human oversight

**Pattern:** Task-based (5 independent review dimensions + orchestration)

**Key Principles** (validated by tri-AI research):

  1. Multi-Agent Architecture - Specialized agents for each review dimension
  2. LLM-as-Judge Consensus - Flag issues only when 2+ models agree
  3. Progressive Severity - Critical → High → Medium → Low prioritization
  4. Human-in-Loop - AI suggests, human decides
  5. Quality Gates - Block merges for critical unresolved issues
  6. Actionable Feedback - Every comment has What/Where/Why/How

**Quality Targets:**

  • False Positive Rate: <15%
  • Fix Acceptance Rate: >40%
  • Review Turnaround: <5 minutes
  • Bug Catch Rate: >30% pre-production

## When to Use

Use `multi-ai-code-review` when:

  • Reviewing pull requests (any size)
  • Auditing code quality before release
  • Establishing consistent code review standards
  • Security auditing code changes
  • Performance profiling changes
  • Technical debt assessment
  • Onboarding reviews (mentorship mode)

**When NOT to Use:**

  • Trivial changes (typos, comments only)
  • Automated dependency updates (use dependabot labels)
  • Generated code (migrations, scaffolds)

## Prerequisites

### Required

  • Code to review (diff, file, or directory)
  • At least one AI available (Claude required, Gemini/Codex optional)

### Recommended

  • Gemini CLI for web research and fast analysis
  • Codex CLI for deep code reasoning
  • Git repository context

### Integration

  • GitHub Actions (optional, for CI/CD)
  • Pre-commit hooks (optional, for local checks)

## Review Dimensions

### 5-Dimensional Analysis

| Dimension | Agent | Focus | Weight |
|-----------|-------|-------|--------|
| Security | Security Specialist | OWASP Top 10, secrets, injection | 25% |
| Performance | Performance Engineer | Complexity, memory, latency | 20% |
| Maintainability | Architect | Patterns, modularity, DRY | 25% |
| Correctness | QA Engineer | Logic, edge cases, tests | 20% |
| Style | Nitpicker | Naming, formatting, conventions | 10% |

### Severity Levels

| Level | Action | Examples |
|-------|--------|----------|
| Critical | Block merge | SQL injection, exposed secrets, data loss |
| High | Require fix | Race conditions, missing auth, memory leaks |
| Medium | Suggest fix | Code duplication, missing tests, complexity |
| Low | Optional | Style issues, naming, minor refactors |

## Operations

### Operation 1: Quick Security Scan

**Time:** 2-5 minutes | **Automation:** 80% | **Purpose:** Fast security-focused review

Process:

1. **Scan for Critical Issues:**

   ```
   Review this code for security vulnerabilities:
   - SQL injection
   - XSS vulnerabilities
   - Hardcoded secrets/API keys
   - Authentication bypasses
   - Authorization flaws
   - Input validation gaps
   - Insecure dependencies

   Code:
   [PASTE CODE OR DIFF]

   For each issue found, provide:
   - Severity (Critical/High/Medium)
   - Location (file:line)
   - Description (what's wrong)
   - Fix (specific code change)
   ```

2. **Validate with Gemini (optional):**

   ```bash
   gemini -p "Verify these security findings. Are any false positives?
   [PASTE CLAUDE FINDINGS]

   Code context:
   [PASTE RELEVANT CODE]"
   ```

3. **Output:** Security report with consensus findings
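If you prefer to script the optional Gemini cross-check rather than paste findings by hand, a minimal sketch follows. It assumes only that the `gemini` CLI from the Recommended prerequisites is installed and authenticated; the file paths and function name are hypothetical.

```python
import subprocess

def cross_check_findings(findings_path: str, code_path: str) -> str:
    """Ask the Gemini CLI whether Claude's findings contain false positives.

    Assumes the `gemini` CLI is installed and authenticated (see Prerequisites).
    """
    findings = open(findings_path).read()
    code = open(code_path).read()
    prompt = (
        "Verify these security findings. Are any false positives?\n"
        f"{findings}\n\nCode context:\n{code}"
    )
    result = subprocess.run(
        ["gemini", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout

# Example usage (hypothetical paths):
# print(cross_check_findings("/tmp/claude_findings.md", "/tmp/pr_diff.txt"))
```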

### Operation 2: Comprehensive PR Review

**Time:** 10-30 minutes | **Automation:** 60% | **Purpose:** Full multi-dimensional review

Process:

**Step 1: Gather Context**

```bash
# Get PR diff
git diff main...HEAD > /tmp/pr_diff.txt

# Identify affected areas
grep -E "^(\+\+\+|---)" /tmp/pr_diff.txt | head -20
```

**Step 2: Run Parallel Agent Reviews**

Use the Task tool to launch parallel agents:

```
Launch 3 parallel review agents:

Agent 1 (Security):
"Review this diff for security issues. Focus on:
- OWASP Top 10 vulnerabilities
- Authentication/authorization
- Input validation
- Secrets exposure
Diff: [DIFF]"

Agent 2 (Maintainability):
"Review this diff for maintainability. Focus on:
- Design patterns used correctly
- Code duplication (DRY)
- Modularity and cohesion
- Documentation quality
Diff: [DIFF]"

Agent 3 (Correctness):
"Review this diff for correctness. Focus on:
- Logic errors
- Edge cases not handled
- Test coverage gaps
- Error handling
Diff: [DIFF]"
```
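The same fan-out can be scripted when CLI reviewers are available. A minimal sketch, using the `gemini` CLI for all three agents for brevity (substitute `codex` per the Agent Assignment Strategy below); the prompt templates are abbreviated placeholders for the full prompts above:

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# Abbreviated placeholders; use the full agent prompts shown above.
AGENT_PROMPTS = {
    "security": "Review this diff for security issues: {diff}",
    "maintainability": "Review this diff for maintainability: {diff}",
    "correctness": "Review this diff for correctness: {diff}",
}

def run_agent(cli: list[str], prompt: str) -> str:
    """Run one review agent through an external CLI and return its report."""
    result = subprocess.run(cli + [prompt], capture_output=True, text=True)
    return result.stdout

def parallel_review(diff: str) -> dict[str, str]:
    """Fan the diff out to independent reviewers and collect all reports."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            name: pool.submit(run_agent, ["gemini", "-p"], tmpl.format(diff=diff))
            for name, tmpl in AGENT_PROMPTS.items()
        }
        return {name: f.result() for name, f in futures.items()}
```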

**Step 3: Orchestrate & Deduplicate**

```
Synthesize findings from all agents:
[PASTE ALL AGENT OUTPUTS]

Tasks:
1. Remove duplicate findings
2. Rank by severity (Critical > High > Medium > Low)
3. Group by file
4. Generate summary table
5. Create final report with consensus issues only
```
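A minimal sketch of the deduplicate-and-rank step, assuming each agent's output has already been parsed into `(file, line, title, severity)` tuples (the prompt above leaves that parsing to the orchestrating model):

```python
from collections import defaultdict

SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def synthesize(findings: list[tuple[str, int, str, str]]) -> dict[str, list]:
    """Deduplicate findings, rank by severity, and group by file."""
    # Deduplicate on (file, line, title); keep the most severe duplicate.
    best = {}
    for file, line, title, severity in findings:
        key = (file, line, title.lower())
        if key not in best or SEVERITY_RANK[severity] < SEVERITY_RANK[best[key][3]]:
            best[key] = (file, line, title, severity)

    # Rank by severity, then group by file for the report.
    ranked = sorted(best.values(), key=lambda f: (SEVERITY_RANK[f[3]], f[0], f[1]))
    by_file = defaultdict(list)
    for f in ranked:
        by_file[f[0]].append(f)
    return dict(by_file)
```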

**Step 4: Generate Report**

Output format:

```markdown
## PR Review Summary

| File | Risk | Issues | Critical | High | Medium |
|------|------|--------|----------|------|--------|
| auth.py | High | 3 | 1 | 2 | 0 |
| api.py | Medium | 2 | 0 | 1 | 1 |

### Critical Issues (Block Merge)
1. **[auth.py:45]** SQL Injection vulnerability
   - Why: User input directly in query
   - Fix: Use parameterized queries

### High Issues (Require Fix)
...

### Consensus Score: 72/100
- Security: 65/100
- Performance: 80/100
- Maintainability: 70/100
- Correctness: 75/100
- Style: 85/100
```

### Operation 3: LLM-as-Judge Tribunal

**Time:** 5-15 minutes | **Automation:** 70% | **Purpose:** High-confidence findings through consensus

Process:

1. **Run Code Through Multiple Models:**

   Claude analysis:

   ```
   Analyze this code for issues. Rate severity 1-10 for each:
   [CODE]
   ```

   Gemini analysis (via CLI):

   ```bash
   gemini -p "Analyze this code for issues. Rate severity 1-10 for each:
   [CODE]"
   ```

   Codex analysis (via CLI):

   ```bash
   codex "Analyze this code for issues. Rate severity 1-10 for each:
   [CODE]"
   ```

2. **Calculate Consensus:**

   ```
   Given these analyses from 3 AI models:

   Claude: [FINDINGS]
   Gemini: [FINDINGS]
   Codex: [FINDINGS]

   Identify issues where at least 2 models agree:
   1. List consensus findings
   2. Average severity scores
   3. Note any disagreements
   4. Final verdict for each issue
   ```

3. **Output:** High-confidence issue list (≥67% agreement, i.e. at least 2 of 3 models)
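The 2-of-3 rule is simple to state in code. A minimal sketch, assuming findings have already been normalized to shared issue keys (a hypothetical helper, not part of the skill itself):

```python
def consensus(findings_by_model: dict[str, dict[str, int]], quorum: int = 2):
    """Keep issues reported by at least `quorum` models; average their severities.

    `findings_by_model` maps model name -> {issue_key: severity 1-10}.
    """
    votes: dict[str, list[int]] = {}
    for model, findings in findings_by_model.items():
        for issue, severity in findings.items():
            votes.setdefault(issue, []).append(severity)

    return {
        issue: round(sum(scores) / len(scores), 1)
        for issue, scores in votes.items()
        if len(scores) >= quorum  # 2 of 3 models = ≥67% agreement
    }

# Example (hypothetical findings):
# consensus({
#     "claude": {"sql-injection auth.py:45": 9, "naming api.py:12": 3},
#     "gemini": {"sql-injection auth.py:45": 8},
#     "codex":  {"sql-injection auth.py:45": 9, "naming api.py:12": 2},
# })
# -> {"sql-injection auth.py:45": 8.7, "naming api.py:12": 2.5}
```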

### Operation 4: Mentorship Review

**Time:** 15-30 minutes | **Automation:** 40% | **Purpose:** Educational code review for learning

Process:

```
Review this code in mentorship mode. For a developer learning [LANGUAGE/FRAMEWORK]:

Code: [CODE]

For each finding:
1. **What's the issue** (be encouraging, not critical)
2. **Why it matters** (explain the underlying concept)
3. **How to improve** (show before/after with explanation)
4. **Learn more** (link to relevant documentation)

Also highlight:
- What was done well
- Good patterns to continue using
- Growth opportunities

Tone: Supportive and educational, never condescending.
```

### Operation 5: Pre-Release Audit

**Time:** 30-60 minutes | **Automation:** 50% | **Purpose:** Comprehensive review before production

Process:

1. **Full Codebase Scan:**

   ```bash
   # Identify all changes since last release
   git diff v1.0.0...HEAD --stat
   git log v1.0.0...HEAD --oneline
   ```

2. **Security Deep Dive:**
   - Run all security checks
   - Verify no new vulnerabilities
   - Check dependency updates
   - Audit secrets management

3. **Performance Review:**
   - Identify potential bottlenecks
   - Review database queries
   - Check for N+1 problems
   - Validate caching strategies

4. **Test Coverage:**
   - Verify test coverage targets
   - Check critical-path coverage
   - Validate edge-case tests

5. **Generate Release Report:**

   ```markdown
   ## Pre-Release Audit: v1.1.0

   ### Security Clearance: PASS ✓
   - No critical vulnerabilities
   - All high issues resolved
   - Secrets audit: Clean

   ### Performance Assessment: PASS ✓
   - No new N+1 queries
   - Response time within SLA
   - Memory usage stable

   ### Test Coverage: 82% (target: 80%)
   - Critical paths: 95%
   - Edge cases: 78%

   ### Release Recommendation: APPROVED
   ```

## Multi-AI Coordination

### Agent Assignment Strategy

| Task | Primary | Verification | Speed |
|------|---------|--------------|-------|
| Security scan | Claude | Gemini | Fast |
| Architecture review | Claude | Codex | Medium |
| Logic validation | Codex | Claude | Medium |
| Style checking | Gemini | Claude | Fast |
| Performance analysis | Claude | Codex | Medium |
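If the orchestrator is scripted, the assignment table can be carried as data. A small sketch; the `Route` type and key names are hypothetical, not part of the skill's API:

```python
from typing import NamedTuple

class Route(NamedTuple):
    primary: str   # model that produces the review
    verifier: str  # model that double-checks the findings
    speed: str     # rough latency class

# Mirrors the Agent Assignment Strategy table above.
ROUTES: dict[str, Route] = {
    "security_scan":        Route("claude", "gemini", "fast"),
    "architecture_review":  Route("claude", "codex",  "medium"),
    "logic_validation":     Route("codex",  "claude", "medium"),
    "style_checking":       Route("gemini", "claude", "fast"),
    "performance_analysis": Route("claude", "codex",  "medium"),
}
```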

### Coordination Commands

**Launch Multi-Agent Review:**

```bash
# Using the Task tool for parallel execution
# Each agent reviews independently; the orchestrator synthesizes
```

**Gemini Quick Check:**

```bash
gemini -p "Quick security scan of this code: [CODE]"
```

**Codex Deep Analysis:**

```bash
codex "Analyze this code architecture and suggest improvements: [CODE]"
```

## CI/CD Integration

### GitHub Actions Workflow

```yaml
# .github/workflows/ai-review.yml
name: Multi-AI Code Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get PR Diff
        run: |
          git diff origin/main...HEAD > pr_diff.txt

      - name: Claude Review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          model: "claude-sonnet-4-5-20250929"
          review_level: "detailed"
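          # NOTE: the "model" and "review_level" inputs above are illustrative;
          # verify which inputs your version of claude-code-action supports.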

      - name: Post Summary
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## AI Review Summary\n${process.env.REVIEW_SUMMARY}`
            })
```

### Quality Gate Configuration

```yaml
# Block merge for critical issues
quality_gates:
  critical_issues: 0      # Must be zero
  high_issues: 3          # Max allowed
  coverage_minimum: 80    # Percent
  score_minimum: 70       # Out of 100
```
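A sketch of how a CI step might enforce these gates; the `ReviewResult` shape is hypothetical and would need to be populated from the review report:

```python
from dataclasses import dataclass

@dataclass
class ReviewResult:
    critical_issues: int
    high_issues: int
    coverage: float  # percent
    score: float     # 0-100

def passes_gates(r: ReviewResult) -> bool:
    """Apply the quality_gates thresholds above; block merge on any failure."""
    return (
        r.critical_issues == 0
        and r.high_issues <= 3
        and r.coverage >= 80
        and r.score >= 70
    )

# Example: the failing review in the session below would be blocked.
# passes_gates(ReviewResult(critical_issues=1, high_issues=2, coverage=82, score=59))
# -> False
```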

## Quality Scoring

### Scoring Formula

```
Overall = (Security × 0.25) + (Performance × 0.20) +
          (Maintainability × 0.25) + (Correctness × 0.20) +
          (Style × 0.10)
```
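The same formula transcribed into code, together with the grade mapping from the table that follows (function names are illustrative):

```python
WEIGHTS = {
    "security": 0.25,
    "performance": 0.20,
    "maintainability": 0.25,
    "correctness": 0.20,
    "style": 0.10,
}

def overall_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of the five dimension scores (each 0-100)."""
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)

def grade(score: float) -> str:
    """Map an overall score to the letter grades in the table below."""
    for threshold, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= threshold:
            return letter
    return "F"

# Example session below: 35*0.25 + 70*0.20 + 65*0.25 + 60*0.20 + 80*0.10 = 59.0
# grade(59.0) -> "F" (just below the 60-point D cutoff)
```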

### Grade Mapping

| Score | Grade | Status |
|-------|-------|--------|
| ≥90 | A | Excellent - Ship it |
| 80-89 | B | Good - Minor fixes |
| 70-79 | C | Acceptable - Address issues |
| 60-69 | D | Needs work - Significant fixes |
| <60 | F | Failing - Major revision needed |

## Anti-Patterns to Detect

  1. Hardcoded Secrets - API keys, passwords in code
  2. SQL Injection - Unparameterized queries
  3. XSS Vulnerabilities - Unsanitized output
  4. Race Conditions - Unprotected shared state
  5. Memory Leaks - Unclosed resources
  6. N+1 Queries - Database calls issued inside a loop (see the sketch after this list)
  7. Dead Code - Unreachable branches
  8. God Objects - Classes doing too much
  9. Copy-Paste Code - Duplicated logic
  10. Missing Error Handling - Unhandled exceptions
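As an example of the feedback format reviewers should aim for, here is a before/after sketch for item 6, the N+1 query anti-pattern; `db.query` is a hypothetical parameterized-query helper:

```python
# Before: one query for the user list, then one query per user (N+1 total).
def order_totals_n_plus_one(db, user_ids):
    totals = {}
    for user_id in user_ids:
        orders = db.query("SELECT amount FROM orders WHERE user_id = ?", (user_id,))
        totals[user_id] = sum(row[0] for row in orders)
    return totals

# After: a single aggregated query does the same work in one round trip.
def order_totals_single_query(db, user_ids):
    placeholders = ",".join("?" * len(user_ids))
    rows = db.query(
        f"SELECT user_id, SUM(amount) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        tuple(user_ids),
    )
    return {user_id: total for user_id, total in rows}
```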

## Example Review Session

**User:** Review this PR for my authentication module

**Claude:** I'll perform a comprehensive multi-dimensional review.

*[Launches parallel agents for security, maintainability, correctness]*

## PR Review: Authentication Module

### Critical Issues (1)
1. **[auth.py:67]** Password stored in plaintext
   - Severity: Critical
   - Consensus: 3/3 models agree
   - Fix: Use bcrypt hashing
   ```python
   # Before: the plaintext password is written straight to the user record
   user.password = request.password

   # After: store only a bcrypt hash of the password
   import bcrypt
   user.password = bcrypt.hashpw(request.password.encode(), bcrypt.gensalt())
   ```

### High Issues (2)
1. **[auth.py:45]** No rate limiting on login endpoint
2. **[auth.py:89]** JWT secret hardcoded

### Quality Score: 59/100 (Grade: F)
- Security: 35/100 (critical issues)
- Performance: 70/100
- Maintainability: 65/100
- Correctness: 60/100
- Style: 80/100

### Recommendation: BLOCK MERGE

Resolve critical security issues before merging.


---

## Related Skills

- **multi-ai-testing**: Generate tests for reviewed code
- **multi-ai-verification**: Validate fixes
- **multi-ai-implementation**: Implement suggested fixes
- **codex-review**: Codex-specific review patterns
- **review-multi**: Skill-specific reviews

---

## References

- `references/security-checklist.md` - OWASP Top 10 checklist
- `references/performance-patterns.md` - Performance anti-patterns
- `references/ci-cd-integration.md` - Full CI/CD setup guide