autistic-code-review
Autistic Code Review
Goal
Audit an implementation end-to-end, with or without a formal plan, and produce a defensible review with evidence from code, diffs, tests, and manual UI verification.
When to use
Use this skill when the user asks for a broad post-implementation review such as:
- comparing implementation to an attached plan or handoff
- reviewing uncommitted or committed changes for regressions and bugs
- manually verifying front-end behavior with Playwright and/or agent-browser
- assessing strategic implementation quality, not only local correctness
- identifying test coverage gaps, adding tests, and running suites across application and database layers
Entry criteria
Check these preconditions before deep review:
- repo scope is clear (
cwd, target project, and base branch/range known) - change scope is available (
git status/git diffor explicit commit range) - runnable environment exists for intended checks (tests/build/dev server as needed)
- UI verification prerequisites are known (auth path, test user/role, seed state)
- DB review prerequisites are known when relevant (local DB state, migration order, reset/test commands)
- test command set is known (
npm test/vitest,supabase test db, and any targeted commands)
If any criterion fails, continue with available lanes and clearly report blocked coverage.
Inputs
Gather the following before review:
- Intention source (preferred in this order):
.plan.mdfile path, or- pasted implementation/handoff text in the prompt, or
- no-plan mode (derive expected behavior from changed files, tests, docs, and commit/diff context)
- Change scope:
- uncommitted (
git status,git diff), or - committed range (
git diff <base>...HEAD)
- UI scope:
- routes/pages to verify, pulled from plan, tests, docs, and changed files
- Test scope:
- app-layer test framework/commands
- DB-layer test framework/commands (for example pgTAP via
supabase test db)
If any item is missing and blocks execution, ask one short question. Otherwise, state assumptions and proceed.
Review modes
Select one mode explicitly at the start of the review:
planmode
- Use when a
.plan.mdis available. - Evaluate strict plan-to-implementation alignment.
handoffmode
- Use when only prompt/handoff intent is available.
- Evaluate claim-to-implementation alignment.
no-planmode
- Use when no plan/handoff is provided.
- Skip strict alignment claims and focus on correctness, regressions, UX behavior, coverage, and strategy quality.
self-reviewmode
- Use when the same agent that implemented changes performs the review.
- Treat prior assumptions as untrusted and require diff/test/UI evidence for every claim.
Parallel subagents
Run parallel subagents with explicit, non-overlapping responsibilities:
plan-alignment-reviewer
- Build an intention-to-evidence matrix from plan/handoff claims.
- Verify each claim against actual file diffs.
- Flag missing, partial, or extra implementation.
ui-verification-reviewer
- Perform manual UI checks using Playwright or agent-browser.
- Validate key user paths and permissions/role gating.
- Record pass/fail with exact route and observed behavior.
technical-risk-reviewer
- Perform code review on changed files.
- Prioritize bugs, regressions, data/permission risks, and design-level defects.
- Include file references and concrete failure modes.
strategic-reviewer
- Evaluate architecture and implementation strategy.
- Identify coupling, migration safety gaps, maintainability risks, and scalability concerns.
- Suggest alternatives only when they materially reduce risk.
test-coverage-reviewer
- Determine test coverage for changed behavior across app and DB layers.
- Identify missing tests and high-risk untested paths.
- Suggest and/or create targeted tests to close gaps.
- Run relevant suites and report results with command evidence.
Subagent output contract
Require each subagent to return this exact structure:
findings: severity-ranked items with file references when applicableevidence: concrete observations (diff snippet summary, command result, UI observation)confidence:high | medium | lowper findingunverified_assumptions: assumptions that could change conclusionsblocked_items: what could not be validated and why
Reject subagent output that is opinion-only or lacks evidence.
UI coverage matrix
Build and execute a minimal matrix:
- persona/role x route/page x key action x expected result
- include at least one happy path and one negative/permission-boundary path per protected area
- include a navigation/gating check (route guard, menu visibility, or access denial behavior)
- record each matrix row as
pass,fail, orblocked
When blocked, capture exact blocker and the attempted step.
Test coverage matrix
Build and execute a minimal matrix:
- changed component/module/function/table/function/RPC x existing tests x gap x action
- app layer: unit/integration tests for changed behavior and boundary cases
- DB layer: pgTAP (or equivalent) coverage for changed tables, policies, functions, and permissions
- include at least one negative path for each changed permission-sensitive behavior
Action values:
covered(existing tests already sufficient)add-tests(write targeted tests)deferred(cannot safely add in scope; justify)
When add-tests is chosen, create focused tests and run affected suites.
Workflow
- Establish scope and evidence
- Determine review mode (
plan,handoff, orno-plan) and whether review isself-review. - Read plan/handoff text when provided.
- Enumerate changed files and classify by area (DB/schema, server, client, tests/docs).
- Derive expected outcomes from the best available intention source for the selected mode.
- Validate entry criteria and set timebox
- Confirm entry criteria; note any missing prerequisites.
- Set a review timebox and prioritize critical paths first (permissions, data integrity, primary UI flows, high-risk untested changes).
- Dispatch the five subagents in parallel
- Provide each subagent only the context needed for its lane.
- Require each subagent to return contract-compliant output.
- Run UI verification explicitly
- Start from user-visible flows (routes, nav, forms, role-conditional UI).
- Verify both happy path and at least one negative/permission boundary path.
- When blocked (auth, env, seed data), report blocker and partial coverage.
- Run DB/migration checklist when schema or SQL changed
- check RLS/policy behavior against intended access model
- check migration safety (ordering, idempotency where relevant, rollback feasibility)
- check grants/privileges drift and RPC exposure changes
- check seed/test/type-generation consistency with schema changes
- Close test coverage gaps
- map changed behaviors to existing tests (app + DB)
- create targeted tests for high-risk uncovered behavior where feasible
- run relevant app-layer and DB-layer suites
- capture exact commands and pass/fail output summary
- Consolidate findings
- de-duplicate overlaps across subagents
- convert raw notes into severity-ranked findings
- separate confirmed defects from open questions
- Deliver review result
- findings first (highest severity first)
- then alignment/reconstruction matrix, UI status, coverage status, technical analysis, strategic analysis, artifacts, and verdict
- if timebox expires or blockers remain, provide partial verdict with explicit coverage gaps
Severity model
Use this priority scale:
P0: release-blocking correctness or security issueP1: high-risk bug/regression likely to affect production behaviorP2: meaningful correctness/maintainability/test gapP3: minor issue or improvement opportunity
Sign-off gates
Apply these gates before issuing the final verdict:
- do not return
alignedif any openP0orP1exists - do not return
alignedwhen critical UI flows areblockedwithout mitigation evidence - do not return
alignedwhen DB/migration changes were made but DB checklist was skipped - do not return
alignedwhen high-risk changed behavior has unresolved coverage gaps or failing tests - in
no-planmode, returnno-plan reviewed(never strictaligned)
Output template
Review target: `<plan path or prompt summary>`
Review mode: `<plan | handoff | no-plan>` (+ `self-review` when applicable)
Change scope: `<uncommitted | commit range>`
Findings:
1. [P1] <title> — `<file:line>`
Evidence: <what was observed>
Impact: <user/system impact>
Recommendation: <concrete fix>
1. [P2] <title> — `<file:line>`
Evidence: <what was observed>
Impact: <user/system impact>
Recommendation: <concrete fix>
Plan alignment matrix (for `plan`/`handoff` modes):
1. `<planned item>` -> `<implemented evidence>` -> `<aligned | partial | missing | extra>`
1. `<planned item>` -> `<implemented evidence>` -> `<aligned | partial | missing | extra>`
Intent reconstruction matrix (for `no-plan` mode):
1. `<inferred expected behavior>` -> `<implemented evidence>` -> `<confirmed | partial | contradicted>`
1. `<inferred expected behavior>` -> `<implemented evidence>` -> `<confirmed | partial | contradicted>`
UI verification:
1. `<route + area + action>` -> `<pass/fail/blocked>` -> `<observed result>`
1. `<route + area + action>` -> `<pass/fail/blocked>` -> `<observed result>`
Blockers: <none or list>
Test coverage:
1. `<changed behavior>` -> `<existing coverage>` -> `<gap>` -> `<covered | add-tests | deferred>`
1. `<changed behavior>` -> `<existing coverage>` -> `<gap>` -> `<covered | add-tests | deferred>`
Test execution:
- `<command>` -> `<pass/fail>` -> `<key result>`
- `<command>` -> `<pass/fail>` -> `<key result>`
Technical analysis:
- `<top technical risk or confirmation>`
- `<top technical risk or confirmation>`
Strategic analysis:
- `<strategy strength/weakness>`
- `<strategy strength/weakness>`
Review artifacts:
- `<commands run and key outcomes>`
- `<ui evidence: screenshots/log notes or blocker proof>`
- `<coverage summary: tested vs blocked vs deferred>`
Verdict: `<aligned | partially aligned | not aligned | no-plan reviewed>`
Recommended next steps:
1. <step>
1. <step>
Guardrails
- do not mark
alignedunless plan claims are evidenced in diffs/tests/UI checks - in
no-planmode, do not claim strict alignment; use verdictno-plan reviewed - do not bury critical defects under summary text; findings must appear first
- if UI cannot be fully executed, provide exact blocker and what was still validated
- if tests cannot be executed, list exact missing prerequisites and impacted confidence
- prefer concrete, falsifiable statements over broad judgments
- in
self-reviewmode, call out reviewer/implementer overlap and keep evidence thresholds strict - enforce subagent output contract; request retries for incomplete outputs
- if review is partial due to blockers/timebox, say so explicitly in verdict context
Subagent prompt pack
Use these prompts as-is, replacing placeholders.
Parent orchestration prompt
Run autistic-code-review.
Context:
- Review target: <plan path OR handoff summary OR "none">
- Review mode: <plan | handoff | no-plan>
- Self-review: <yes | no>
- Change scope: <uncommitted | commit range>
- Repo/project path: <path>
- UI routes in scope: <route list>
- Test commands in scope: <app commands + DB commands>
- Timebox: <minutes>
Execution requirements:
1) Spawn five parallel subagents:
- plan-alignment-reviewer
- ui-verification-reviewer
- technical-risk-reviewer
- strategic-reviewer
- test-coverage-reviewer
2) Enforce this output contract for every subagent:
- findings
- evidence
- confidence
- unverified_assumptions
- blocked_items
3) Reject and retry any subagent output that lacks evidence.
4) Require the test-coverage-reviewer to suggest/create tests for uncovered high-risk changes and run relevant suites.
5) Consolidate results into one findings-first report with severity ordering.
6) Apply sign-off gates from the skill and produce a final verdict.
Prompt: plan-alignment-reviewer
You are the plan-alignment-reviewer.
Inputs:
- Review mode: <plan | handoff | no-plan>
- Intention source: <plan path or handoff text; can be empty in no-plan mode>
- Change scope: <uncommitted | commit range>
- Changed file list/diff summary: <insert>
Tasks:
1) Build an intention-to-evidence matrix from intention claims and actual diffs.
2) For each claim, classify as aligned, partial, missing, or extra.
3) In no-plan mode, produce an intent reconstruction matrix:
- inferred expected behavior -> implemented evidence -> confirmed/partial/contradicted
4) Flag any claimed work not evidenced in code/tests/docs.
Return exactly:
- findings: severity-ranked issues with file refs
- evidence: specific diff/test/doc observations
- confidence: high/medium/low per finding
- unverified_assumptions: assumptions and why
- blocked_items: what prevented validation
Prompt: ui-verification-reviewer
You are the ui-verification-reviewer.
Inputs:
- UI scope routes/pages: <insert>
- Personas/roles: <insert>
- Environment/access constraints: <insert>
- Change scope summary: <insert>
Tasks:
1) Use Playwright and/or agent-browser to manually verify UI behavior.
2) Build and execute a coverage matrix:
- role x route/page x key action x expected result
3) Include at least:
- one happy path per protected area
- one negative/permission-boundary path per protected area
- one gating/navigation check (route guard/menu visibility/access denial)
4) Record each row as pass/fail/blocked with observed result.
5) Capture evidence artifacts (screenshots/log notes) for failures or blockers.
Return exactly:
- findings: severity-ranked UI defects/regressions
- evidence: route-level observations and artifact references
- confidence: high/medium/low per finding
- unverified_assumptions: missing env/auth/data assumptions
- blocked_items: exact blocker + attempted step
Prompt: technical-risk-reviewer
You are the technical-risk-reviewer.
Inputs:
- Changed files and diff: <insert>
- Related tests/docs/commands run: <insert>
- Review mode and constraints: <insert>
Tasks:
1) Perform a code review focused on:
- correctness bugs
- behavioral regressions
- data integrity and permission risks
- missing or weak tests
2) If SQL/schema changed, run DB/migration checklist:
- RLS/policy behavior vs intended access model
- migration safety, ordering, rollback feasibility
- grants/privileges/RPC exposure drift
- seed/test/type-generation consistency
3) Prioritize findings by P0-P3 and include file references.
Return exactly:
- findings: severity-ranked technical issues with file refs
- evidence: concrete code/diff/test command observations
- confidence: high/medium/low per finding
- unverified_assumptions: what is assumed but unproven
- blocked_items: checks that could not be completed
Prompt: strategic-reviewer
You are the strategic-reviewer.
Inputs:
- Implementation summary: <insert>
- Changed areas by layer (db/server/client/tests/docs): <insert>
- Review mode: <insert>
Tasks:
1) Evaluate implementation strategy quality:
- architecture cohesion and coupling
- migration/cutover safety and operability
- maintainability and future change cost
- scalability and team workflow implications
2) Identify strategic weaknesses and practical alternatives.
3) Recommend only changes that materially reduce risk or complexity.
Return exactly:
- findings: severity-ranked strategic risks/anti-patterns
- evidence: concrete repo or diff observations
- confidence: high/medium/low per finding
- unverified_assumptions: strategic assumptions needing confirmation
- blocked_items: missing context that limits confidence
Prompt: test-coverage-reviewer
You are the test-coverage-reviewer.
Inputs:
- Changed files and diff: <insert>
- Existing tests in scope: <insert>
- Test commands:
- app layer: <insert>
- DB layer (pgTAP or equivalent): <insert>
- Review mode and constraints: <insert>
Tasks:
1) Build a coverage matrix:
- changed behavior -> existing tests -> gap -> action
2) Identify high-risk untested behavior in app and DB layers.
3) Suggest and create targeted tests to close feasible gaps.
- app layer: unit/integration tests for changed behavior and boundaries
- DB layer: pgTAP tests for changed tables/functions/policies/permissions
4) Run relevant test suites after test additions/updates.
5) Report pass/fail and any remaining uncovered high-risk behavior.
Return exactly:
- findings: severity-ranked coverage and test-quality issues
- evidence: coverage matrix + test diffs + command results
- confidence: high/medium/low per finding
- unverified_assumptions: assumptions about environment/data/setup
- blocked_items: tests not run or not creatable and why
Consolidation prompt (optional)
Consolidate five subagent outputs into one final review.
Rules:
1) Findings first, highest severity first, deduplicated across lanes.
2) Keep only evidence-backed findings.
3) Include mode-appropriate matrix:
- plan/handoff -> plan alignment matrix
- no-plan -> intent reconstruction matrix
4) Include UI verification status, blockers, and coverage summary.
5) Include test coverage matrix, tests added/suggested, and execution results.
6) Apply sign-off gates before verdict.
7) Verdict allowed values:
- aligned
- partially aligned
- not aligned
- no-plan reviewed