khuym:reviewing
Reviewing
If .khuym/onboarding.json is missing or stale for the current repo, stop and invoke khuym:using-khuym before continuing.
Post-execution quality verification. You are the last automated gate before a feature ships. Your job is to catch what escaped execution — not just confirm tasks are closed, but verify that the work is correct, safe, and complete.
Research confirms this is not optional: removing the verification agent degrades fix precision most sharply (Multi-Agent Bug Detection, IJRASET 2025). Multi-perspective review covers 7.8 dimensions vs. 1–3 for a single reviewer (Hydra-Reviewer, IEEE TSE 2025).
Communication Standard
Reviewing is where terse technical shorthand is most dangerous. The default tone here is:
- explain the bug in plain language first
- then show the evidence
- then give one concrete failure scenario
- then give the smallest credible fix direction
If a finding makes sense only to someone who already read the diff carefully, it is not written well enough yet.
When to Invoke
- After `khuym:swarming` reports the final phase is complete
- Manually: when spot-checking any branch or set of changes
- Flags: `--serial` (always serial), `--skip-uat` (auto mode only, skips Phase 3)
Prerequisites
Read before starting:
- `history/<feature>/CONTEXT.md` — locked decisions (D1, D2...) and testable deliverables
- `history/<feature>/approach.md` — planned approach and risk map from planning
- `.khuym/STATE.md` — current epic state
Phase 1: Automated Review (5 Specialist Agents)
Dispatch Rules
| Condition | Mode |
|---|---|
| ≤4 agents active | Parallel (default) |
| 5+ agents active | Serial (auto-switch — inform user) |
| `--serial` flag | Always serial |
With 5 agents, auto-switch to serial mode and tell the user: "Running review agents in serial mode (5 agents). Use --parallel to override."
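The dispatch rules above can be read as a small decision function. This is a minimal sketch, not the skill's actual implementation: `choose_dispatch_mode` and its parameter names are hypothetical, and letting `--serial` win when both flags are passed is an assumption drawn from the "Always serial" row.

```python
def choose_dispatch_mode(active_agents: int,
                         serial_flag: bool = False,
                         parallel_flag: bool = False) -> str:
    """Return "serial" or "parallel" for the review agents."""
    if serial_flag:
        # --serial is "always serial" in the table, so it is checked first.
        return "serial"
    if parallel_flag:
        # --parallel overrides the automatic switch for 5+ agents.
        return "parallel"
    # Default: parallel up to 4 active agents, serial from 5 upward.
    return "parallel" if active_agents <= 4 else "serial"
```

With the standard roster of 5 agents and no flags, this returns `"serial"`, which is the case where the user should be told about the auto-switch.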
Agent Roster
Dispatch agents 1–4 first (parallel or serial per rules above). Agent 5 always runs last regardless of mode.
| Agent | Focus |
|---|---|
| 1 `code-quality` | Simplicity, readability, DRY, error handling, type safety |
| 2 `architecture` | Design patterns, coupling, separation of concerns, API design |
| 3 `security` | OWASP top 10, injection, auth, secrets, data exposure |
| 4 `test-coverage` | Missing tests, edge cases, integration gaps |
| 5 `learnings-synthesizer` | Always last — cross-reference history/learnings/, flag known patterns, suggest compounding entries |
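The ordering constraint (agents 1–4 first, agent 5 last in either mode) can be sketched as a list of batches, where each batch runs to completion before the next starts. `dispatch_batches` is a hypothetical helper for illustration; only the agent names come from the roster.

```python
SPECIALISTS = ["code-quality", "architecture", "security", "test-coverage"]
SYNTHESIZER = "learnings-synthesizer"

def dispatch_batches(mode: str) -> list[list[str]]:
    """Return batches of agent names; agents in one batch run together."""
    if mode == "parallel":
        batches = [list(SPECIALISTS)]            # agents 1-4 in one batch
    elif mode == "serial":
        batches = [[name] for name in SPECIALISTS]  # one agent per batch
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # The synthesizer is always its own final batch, regardless of mode.
    return batches + [[SYNTHESIZER]]
```

In both modes the last batch is `["learnings-synthesizer"]`, so synthesis never starts before the specialist findings exist.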
Isolated Context Per Agent — CRITICAL
Each agent receives only:
- The git diff (or worktree diff): `git diff <base>..<head>`
- `history/<feature>/CONTEXT.md`
- `history/<feature>/approach.md`
Do not pass session history, implementation notes, or agent communication logs. Reviewer objectivity depends on seeing only the work product, not the implementer's thought process (Superpowers code-reviewer pattern).
See references/review-agent-prompts.md for the exact prompt for each agent.
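One way to enforce the isolation rule is an allow-list applied to whatever the orchestrator has in hand. The sketch below assumes the inputs are held in a plain dict keyed by name; `ALLOWED_INPUTS` and `build_agent_context` are hypothetical names, not part of khuym.

```python
ALLOWED_INPUTS = ("diff", "CONTEXT.md", "approach.md")

def build_agent_context(available: dict[str, str]) -> dict[str, str]:
    """Keep only the three inputs a review agent is allowed to see."""
    missing = [name for name in ALLOWED_INPUTS if name not in available]
    if missing:
        raise KeyError(f"missing required review inputs: {missing}")
    # Anything else (session history, notes, agent logs) is dropped here.
    return {name: available[name] for name in ALLOWED_INPUTS}
```

Using an allow-list rather than a deny-list means a new kind of session artifact is excluded by default instead of leaking into the reviewer's context.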
Review Beads
Each distinct review issue becomes a bead. Per-finding markdown files are no longer the primary review artifact.
Use the bead contract from references/review-bead-template.md.
Creation rules:
- P1 → create a blocking fix bead on the current review / epic-close path
- P2 → create a non-blocking follow-up bead
- P3 → create a non-blocking follow-up bead
Non-negotiable linkage rules:
- `P1` review beads may stay in the current epic-close path because they are blocking work
- `P2`/`P3` review beads must not be children of the current epic
- `P2`/`P3` traceability must use `external_ref=<source-epic-id>` plus labels such as `review`, `review-p2`/`review-p3`, and the source reviewer label
Title pattern:
Resolve Review P1: <problem title>
Resolve Review P2: <problem title>
Resolve Review P3: <problem title>
The full review write-up lives in the bead body itself: plain-language summary, current behavior, why it matters, concrete failure scenario, evidence, proposed solutions, and acceptance criteria.
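The creation and linkage rules can be summarized as a mapping from severity to bead fields. This is an illustrative sketch only: the dict keys (`parent`, `blocking`, `external_ref`, `labels`) are assumptions about how a bead might be modeled, and the authoritative contract remains references/review-bead-template.md.

```python
def review_bead_spec(severity: str, problem_title: str,
                     source_epic_id: str, reviewer: str) -> dict:
    """Describe the bead to create for one review finding."""
    if severity not in ("P1", "P2", "P3"):
        raise ValueError(f"unknown severity: {severity!r}")
    spec = {
        "title": f"Resolve Review {severity}: {problem_title}",
        "blocking": severity == "P1",
    }
    if severity == "P1":
        # Blocking work may stay on the current epic-close path.
        spec["parent"] = source_epic_id
    else:
        # P2/P3 must not be children of the epic; trace via external_ref + labels.
        spec["parent"] = None
        spec["external_ref"] = source_epic_id
        spec["labels"] = ["review", f"review-{severity.lower()}", reviewer]
    return spec
```

For example, a `P2` finding from the `security` agent yields no parent, `external_ref` set to the source epic, and the labels `review`, `review-p2`, and `security`.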
Severity Rules
| Priority | Label | Criteria | Gate |
|---|---|---|---|
| P1 | CRITICAL | Security vulns, data corruption, breaking changes | Blocks merge — always |
| P2 | IMPORTANT | Performance, architecture, reliability | Should fix before merge |
| P3 | NICE-TO-HAVE | Minor improvements, cleanup, docs | Record for future |
Calibration rule: Not everything is P1. Severity inflation wastes cycles and trains reviewers to ignore findings. When in doubt, P2.
Synthesis (After All Agents Complete)
- Collect the review beads created by agents 1-4
- Deduplicate overlapping issues
  - prefer one surviving review bead per distinct problem
  - close redundant duplicates with a reason such as `Duplicate of <bead-id>`
- Surface `learnings-synthesizer` matches with known-pattern notes on the relevant review bead
- Count: N P1, N P2, N P3 review beads
- Present a summary table to user with bead IDs by severity
When presenting serious findings to the user, do not stop at terse reviewer shorthand. Translate the finding into:
- what the code does today
- why that breaks the intended behavior
- one concrete scenario showing the failure
- the smallest credible fix direction
If P1 review beads exist: HARD-GATE — stop and present. Do not proceed to Phase 2 until user acknowledges. Even in go mode, P1 is always human-gated.
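The deduplicate, count, and gate steps can be sketched as follows. Assumptions are labeled: each bead is a dict with `id`, `severity`, and a `problem_key` that identifies the distinct problem (how that key is derived is left open), and when duplicates disagree on severity the most severe bead survives, which is a choice made for this sketch rather than a stated rule.

```python
from collections import Counter

def synthesize(beads: list[dict]) -> dict:
    """Deduplicate review beads, count by severity, and evaluate the P1 gate."""
    groups: dict[str, list[dict]] = {}
    for bead in beads:
        groups.setdefault(bead["problem_key"], []).append(bead)

    survivors, closed = [], []
    for group in groups.values():
        # "P1" < "P2" < "P3" as strings, so min() picks the most severe bead.
        winner = min(group, key=lambda b: b["severity"])
        survivors.append(winner)
        closed += [(b["id"], f"Duplicate of {winner['id']}")
                   for b in group if b is not winner]

    counts = Counter(b["severity"] for b in survivors)
    return {
        "survivors": survivors,
        "closed": closed,
        "counts": {p: counts.get(p, 0) for p in ("P1", "P2", "P3")},
        "hard_gate": counts.get("P1", 0) > 0,  # stop and present if True
    }
```

When `hard_gate` is true, the caller stops and waits for user acknowledgment before Phase 2, including in go mode.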
Phase 2: 3-Level Artifact Verification
Goal-backward check on every artifact named in CONTEXT.md and approach.md. Task completion ≠ goal achievement — a file existing is not evidence the feature works.
Run this as a subagent with isolated context (diff + CONTEXT.md + approach.md). Use the live bead graph and bead files when you need to verify acceptance criteria coverage.
The 3 Levels
Level 1 — EXISTS: Does the file/component/route exist?
# Check example
ls src/components/PaymentForm.tsx
Level 2 — SUBSTANTIVE: Is it real, not a stub?
Scan for anti-patterns:
return null / return {} / return []
Empty handlers: onClick={() => {}}
TODO / FIXME / PLACEHOLDER comments
console.log-only implementations
API routes returning static data without DB queries
Components with state that never renders state
Level 3 — WIRED: Is it imported and used in the integration layer?
# Check example
grep -r "PaymentForm" src/pages/ src/app/
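Part of the Level 2 scan can be automated with a few regular expressions. The sketch below covers three of the anti-patterns listed (empty returns, empty handlers, marker comments); the patterns and the name `scan_for_stubs` are illustrative assumptions, and hits are heuristics to inspect, since a `return null` can be legitimate.

```python
import re

STUB_PATTERNS = {
    # return null / return {} / return [] as the whole return statement
    "empty return": re.compile(r"\breturn\s+(null|\{\s*\}|\[\s*\])\s*;?\s*$"),
    # JSX-style empty handler such as onClick={() => {}}
    "empty handler": re.compile(r"=\{\s*\(\s*\)\s*=>\s*\{\s*\}\s*\}"),
    "marker comment": re.compile(r"\b(TODO|FIXME|PLACEHOLDER)\b"),
}

def scan_for_stubs(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in STUB_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

An artifact with no hits is not automatically substantive. The remaining anti-patterns (static API responses, state that never renders) need a reviewer to read the code.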
Reporting
For each artifact:
- ✅ L1+L2+L3: fully wired
- ⚠️ L1+L2 only: created but not integrated — create a `P2` review bead
- 🛑 L1 only (stub): exists but empty — create a `P1` review bead
- 🛑 Missing: not found — create a `P1` review bead
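The reporting rules reduce to a lookup on the three level checks. A minimal sketch, assuming each level has already been evaluated to a boolean; `classify_artifact` is a hypothetical name.

```python
from typing import Optional

def classify_artifact(exists: bool, substantive: bool,
                      wired: bool) -> tuple[str, Optional[str]]:
    """Map L1/L2/L3 results to (status, severity of the bead to create)."""
    if not exists:
        return ("missing", "P1")
    if not substantive:
        return ("stub", "P1")
    if not wired:
        return ("not integrated", "P2")
    return ("fully wired", None)  # no review bead needed
```

The checks are ordered so that a later level is only consulted once the earlier one passes: a missing file is reported as missing, not as unwired.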
Phase 3: Human UAT
Scope: Walk the user through every testable deliverable from CONTEXT.md.
Protocol:
- Extract all decisions with `SEE` (visual), `CALL` (API), or `RUN` (execution) verification from CONTEXT.md
- For each deliverable, present: "Does [X] work as decided in [D-id]?"
- Reference the exact decision ID so the user can verify against their original intent
- One item at a time — HARD-GATE between each
Example prompt:
UAT Item 3 of 5 — Decision D4:
"Users can reset their password via email link (D4)."
Can you navigate to /forgot-password, enter an email, and confirm the reset email arrives?
[Pass / Fail / Skip]
On failure:
- Invoke `khuym:debugging` skill → root-cause the failure
- Create a fix bead: `br create "Fix: <description>" -t task -p 0 --parent <epic-id>`
- Execute the fix bead (invoke `khuym:executing` skill)
- Re-verify the specific UAT item
- Do not proceed until the item passes or user explicitly accepts the failure
On skip: Record in .khuym/STATE.md with reason. Do not count as pass.
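The pass/fail/skip bookkeeping can be sketched as a small tally in which a skip is tracked separately and never counted as a pass. The function name, the result strings, and the `accepted_failures` parameter (decision IDs whose failure the user explicitly accepted) are assumptions made for illustration.

```python
VALID_RESULTS = {"pass", "fail", "skip"}

def uat_summary(results: dict[str, str],
                accepted_failures: frozenset[str] = frozenset()) -> dict:
    """results maps a decision ID (e.g. "D4") to "pass", "fail", or "skip"."""
    unknown = {r for r in results.values() if r not in VALID_RESULTS}
    if unknown:
        raise ValueError(f"unknown UAT result(s): {sorted(unknown)}")
    passed = [d for d, r in results.items() if r == "pass"]
    failed = [d for d, r in results.items() if r == "fail"]
    skipped = [d for d, r in results.items() if r == "skip"]  # not a pass
    # A failure blocks unless the user explicitly accepted it.
    blocking = [d for d in failed if d not in accepted_failures]
    return {"passed": passed, "failed": failed,
            "skipped": skipped, "may_proceed": not blocking}
```

Skipped items do not block here, but they stay in their own list so they can be recorded in .khuym/STATE.md with a reason.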
Phase 4: Finishing
You are the last step before compounding. Close the loop completely.
Checklist
[ ] All beads in epic are closed
→ bv --robot-triage --graph-root <epic-id>
→ Any open beads? Create final fix tasks or explicitly defer with br update --defer
[ ] Final build/test/lint passes
→ Run project's standard commands (npm test / pytest / cargo test / etc.)
→ If fails: create a `P1` review bead, fix before continuing
[ ] Present merge options to user:
1. Create PR (recommended)
2. Merge directly
3. Keep branch for further work
4. Discard branch
[ ] Clean up worktree (if used)
→ git worktree remove .worktrees/<feature>
[ ] Close epic bead
→ br close <epic-id> --reason "Feature complete: <summary>"
[ ] Clear working state
→ Archive STATE.md: cp .khuym/STATE.md history/<feature>/STATE-final.md
→ Clear: echo "" > .khuym/STATE.md
Merge Options Detail
Create PR:
# $'...' is bash/zsh ANSI-C quoting: \n becomes a real newline (inside plain double quotes it would stay a literal backslash-n)
gh pr create \
  --title "<feature title>" \
  --body $'## Summary\n<description>\n\n## Verified\n- [ ] All UAT items passed\n- [ ] No P1 review beads remain open\n- P2 follow-up beads: <list or \'none\'>\n- P3 follow-up beads: <list or \'none\'>\n\n## Review Follow-up\n- Blocking review beads: <list or \'none\'>\n- Non-blocking review beads: <list or \'none\'>' \
  --draft  # remove if ready for immediate merge
If P2 review beads exist: Include them in the PR body. Recommend fixing before merge, but user decides.
If P3 review beads exist: Add them to the PR body under "Future Work." Do not block merge.
Handoff
After Phase 4 completes:
"Feature complete. Epic [id] closed. [N] learnings flagged by learnings-synthesizer. Invoke
`khuym:compounding` skill to capture patterns, decisions, and failures for future planning cycles."
Update .khuym/STATE.md:
STATUS: reviewing-complete
EPIC: <id>
HANDOFF: compounding
FLAGGED_LEARNINGS: <count> (see .khuym/findings/learnings-candidates.md)
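If the handoff state is written programmatically, the four fields above can be rendered from the epic ID and the learnings count. A minimal sketch; `render_handoff_state` is a hypothetical helper, and writing the result to .khuym/STATE.md is left to the caller.

```python
def render_handoff_state(epic_id: str, flagged_learnings: int) -> str:
    """Render the STATE.md content for the handoff to compounding."""
    if flagged_learnings < 0:
        raise ValueError("flagged_learnings must be non-negative")
    lines = [
        "STATUS: reviewing-complete",
        f"EPIC: {epic_id}",
        "HANDOFF: compounding",
        f"FLAGGED_LEARNINGS: {flagged_learnings} "
        "(see .khuym/findings/learnings-candidates.md)",
    ]
    return "\n".join(lines) + "\n"
```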
Red Flags
Stop and surface to user immediately if you see:
- P1 findings and no user acknowledgment — never silently continue past P1
- P1 review beads created but gate not stopped — invalid; P1 still blocks
- UAT failures or skips marked as "pass" — neither counts as a pass; record a skip as a skip, with its reason
- Artifact verification skipped — Phase 2 is not optional; stubs ship to production this way
- Epic closed with open beads — verify with `bv` before closing
- `learnings-synthesizer` flagging a known failure pattern — this means the team already hit this problem. Surface explicitly: "Known pattern from [date]: [link]"
- Agent 5 running before agents 1–4 complete — synthesis without findings is meaningless; enforce ordering
- P2/P3 review beads attached as children of the current epic — this breaks the non-blocking contract; use `external_ref` + labels instead
Files Written
.khuym/findings/
learnings-candidates.md ← Session-level compounding suggestions only
history/<feature>/
STATE-final.md ← Archived state at close
References
- `references/review-agent-prompts.md` — Exact prompts for all 5 agents (load when dispatching)
- `references/review-bead-template.md` — Review bead format and creation contract
- `references/finding-template.md` — Deprecation pointer for the retired file-based finding contract