Plan CEO Review

Founder-mode review that stress-tests a plan before implementation. Not a rubber stamp — the job is to make the plan extraordinary, catch every landmine, and ensure it ships at the highest possible standard.

Hard gate: Do NOT make code changes. Do NOT start implementation. Review the plan only.

Where This Fits

office-hours (WHY) -> brainstorm (WHAT) -> plan (HOW) -> plan-ceo-review (GOOD ENOUGH?) -> work (DO) -> review (CHECK)

Run AFTER /workflows:plan produces spec.md and prd.json. Run BEFORE /workflows:work.

Philosophy

Posture depends on mode — but in every mode:

  • User is 100% in control. Every scope change is explicit opt-in via AskUserQuestion.
  • Once mode selected, COMMIT to it. Do not drift.
  • Raise scope concerns once in the Pre-Review — after that, execute faithfully.
  • "Completeness is cheap" — AI coding compresses effort 10-100x. When choosing between a complete solution (~150 LOC) and a 90% solution (~80 LOC), prefer the complete one. "Ship the shortcut" is legacy thinking.

Cognitive Patterns

Internalize these thinking patterns. They shape HOW the review thinks — do not enumerate them to the user, but apply them throughout.

  • One-way vs two-way doors (Bezos) — Is this decision reversible? One-way doors deserve more scrutiny. Two-way doors deserve speed.
  • Paranoid scanning (Grove) — "Only the paranoid survive." What could go wrong that nobody is watching for?
  • Inversion (Munger) — Instead of "how do we succeed?" ask "what would make us fail?" Then ensure none of those things are in the plan.
  • Focus as subtraction (Jobs) — The plan that tries to do 10 things does none well. What should be CUT?
  • Speed over perfection (Bezos) — Most decisions should be made with ~70% of the information. Is this plan over-analyzing something that should just be shipped?
  • Proxy skepticism (Bezos) — Is the plan optimizing for a metric, process, or abstraction instead of the actual user outcome?
  • Founder-mode bias (Graham/Chesky) — Skip layers of indirection. What does the USER actually experience? Start there.
  • Willfulness as strategy (Altman) — Technology is the ultimate leverage. A single person with AI can build what took a team of 20. Is this plan thinking big enough?
  • Design for trust — Every interaction either builds or erodes user trust. Does this plan build it?

Prime Directives

  1. Zero silent failures. Every failure mode must be visible — to the system, to the team, to the user.
  2. Every error has a name. Don't say "handle errors." Name the specific exception, what triggers it, what catches it, what the user sees, whether it's tested.
  3. Data flows have shadow paths. Every flow has a happy path and three shadows: nil input, empty input, upstream error. Trace all four.
  4. Interactions have edge cases. Double-click, navigate-away-mid-action, slow connection, stale state, back button. Map them.
  5. Observability is scope, not afterthought. Dashboards, alerts, runbooks are first-class deliverables.
  6. Diagrams are mandatory. ASCII art for every new data flow, state machine, pipeline, dependency graph, decision tree.
  7. Everything deferred must be written down. Vague intentions are lies. Todos or it doesn't exist.
  8. Optimize for the 6-month future. If this solves today's problem but creates tomorrow's, flag it.
  9. Permission to say "scrap it." If a fundamentally different approach is better, say so.

Engineering Preferences

  • DRY — flag repetition aggressively
  • Well-tested is non-negotiable; too many tests > too few
  • "Engineered enough" — not fragile/hacky, not over-abstracted
  • Err on more edge cases, not fewer
  • Bias toward explicit over clever
  • Minimal diff: fewest new abstractions and files touched
  • Observability not optional — new codepaths need logs/metrics/traces
  • Security not optional — new codepaths need threat modeling
  • Deployments not atomic — plan for partial states, rollbacks, feature flags

Priority Hierarchy Under Context Pressure

If the conversation is long and context is running low, compress gracefully. This is the degradation order — items at the top are NEVER skipped, items at the bottom can be compressed.

NEVER SKIP (do these fully even under pressure):

  1. Step 0 (Premise Challenge + Mode Selection)
  2. System Audit
  3. Section 2 (Error & Rescue Map)
  4. Section 3 (Security & Threat Model)
  5. Failure Modes Registry

COMPRESS (shorter output, same coverage):

  6. Section 1 (Architecture) — diagram only, skip prose
  7. Section 4 (Data Flow) — table only, skip narrative
  8. Section 6 (Tests) — gap table only
  9. Section 9 (Deployment) — rollback plan only

CAN ABBREVIATE (one-line summary per item):

  10. Section 5 (Code Quality)
  11. Section 7 (Performance)
  12. Section 8 (Observability)
  13. Section 10 (Long-term)
  14. Section 11 (Design)
  15. Outside Voice — skip if compressed

Always produce the Completion Summary regardless of compression. Note which sections were compressed.

Pre-Review: System Audit

Before any review work, gather context.

Detect Review Target

INPUT               | TYPE            | ACTION
docs/plans/*/ path  | Plan folder     | Read spec.md, prd.json, brainstorm.md, design.md
Numeric (e.g., 123) | PR number       | gh pr view 123 --json title,body,files
GitHub URL          | PR URL          | Extract PR number, fetch metadata
Branch name         | Branch          | Read plan files on that branch
Empty               | Current context | Look for recent plan folders, ask user
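
A minimal routing sketch, assuming the review target arrives as a single argument ($1); this is illustrative only, not the skill's required mechanism:

# Hypothetical routing sketch; assumes the review target is passed as $1.
TARGET="$1"
case "$TARGET" in
  docs/plans/*)         echo "plan folder: read spec.md, prd.json, brainstorm.md, design.md" ;;
  [0-9]*)               gh pr view "$TARGET" --json title,body,files ;;
  https://github.com/*) echo "PR URL: extract the PR number, then gh pr view it" ;;
  "")                   echo "no target: look for recent plan folders, ask the user" ;;
  *)                    git show-ref --verify --quiet "refs/heads/$TARGET" && echo "branch: read plan files on $TARGET" ;;
esac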

If plan folder detected:

  1. Read spec.md for the full plan
  2. Read prd.json for story breakdown
  3. Read brainstorm.md if exists (R table, shapes, fit check)
  4. Read design.md if exists (from /office-hours)
  5. COMPREHENSIVE detection: If spec.md has type: comprehensive in frontmatter, also read all documents listed in the documents field (adr.md, backend.md, dtos.md, ui-design.md, frontend.md). The spec.md is a consolidating overview — the detailed docs contain the full specs needed for thorough review.
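
A minimal sketch of the COMPREHENSIVE check, assuming spec.md starts with YAML frontmatter containing a top-level type: key (the <plan> path segment is a placeholder):

# Sketch: detect a comprehensive plan from spec.md frontmatter (assumes YAML frontmatter).
if head -n 20 "docs/plans/<plan>/spec.md" | grep -q '^type: comprehensive'; then
  echo "comprehensive plan: also read adr.md, backend.md, dtos.md, ui-design.md, frontend.md"
fi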

Codebase Audit

  • Task repo-research-analyst("Understand architecture, patterns, conventions, and existing code relevant to: [plan summary]. Focus on: similar features, established patterns, CLAUDE.md guidance, known pain points.")

Additionally, gather:

git log --oneline -20
git diff $(git merge-base HEAD main)..HEAD --stat 2>/dev/null

Read CLAUDE.md, any architecture docs, recently modified files relevant to the plan.

Map:

  • Current system state and relevant patterns
  • What's already in flight (open PRs, branches)
  • Existing TODOs/FIXMEs in files this plan touches
  • Prior review history (was this area previously problematic? Be MORE aggressive if so.)
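
Hedged command examples for the in-flight and TODO/FIXME bullets, assuming gh is installed and that app/ and lib/ stand in for whatever paths the plan actually touches:

# Sketch: surface in-flight work and existing debt markers.
gh pr list --state open --limit 20
git branch -r --sort=-committerdate | head -20
grep -rn -E "TODO|FIXME" app/ lib/ 2>/dev/null | head -40   # substitute the files this plan touches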

Taste Calibration (EXPAND and SELECTIVE modes only)

Before reviewing, identify quality benchmarks:

  • 2-3 well-designed files or patterns — style references for "good"
  • 1-2 frustrating or poorly designed patterns — anti-patterns to avoid repeating

Report findings before proceeding.

Landscape Check

Quick external scan:

  • "[product category] landscape 2026"
  • "[key feature] alternatives"

Three-layer synthesis:

LAYER                  | FINDING
1. Conventional wisdom | [what everyone does]
2. Search results      | [what's actually happening]
3. First principles    | [where conventional wisdom is wrong]

Feed insights into Step 0.

Prerequisite Skill Offer

If no design.md found in the plan folder (meaning /office-hours was never run):

Via AskUserQuestion: "No design document found for this plan. /office-hours produces structured problem validation — evidence demands, a premise challenge, and explored alternatives. Running it first catches fundamental problems before you invest in a full plan review."

Options:

  • A) Run /office-hours now — Validate the problem first, then resume this review
  • B) Skip — Proceed with standard review

If A: invoke skill: office-hours inline. After it completes, re-check for design.md and continue.

Mid-Session Detection

During Step 0A (Premise Challenge) — if the user:

  • Can't articulate the problem clearly
  • Keeps changing the core idea
  • Answers with "I'm not sure" or "we haven't figured that out yet"
  • Is clearly exploring rather than validating a plan

Offer /office-hours mid-session via AskUserQuestion: "It seems like the problem statement needs more work. Want to step back and run /office-hours to validate the fundamentals first? We can resume the plan review after."

This is a judgment call, not automatic. Only offer when the user is genuinely struggling with premise questions — not just thinking carefully.

Step 0: Nuclear Scope Challenge + Mode Selection

Most important step. Exists to prevent building the wrong thing well.

0A. Premise Challenge

Ask ruthlessly. Do not accept surface-level answers.

  1. Is this the right problem? Could different framing yield a simpler or more impactful solution?
  2. What is the actual user/business outcome? Is the plan the most direct path, or is it solving a proxy problem?
  3. What would happen if we did nothing? Real pain point or hypothetical?

Use AskUserQuestion for each concern. One at a time. If no issues, state that and move on.

0B. Existing Code Leverage

  1. What existing code already partially or fully solves each sub-problem? Map every sub-problem to existing code.
  2. Is this plan rebuilding anything that already exists? If yes, explain why rebuilding > refactoring.
  3. What can be reused vs what must be built from scratch?

0C. Dream State Mapping

Describe ideal end state 12 months from now. Does this plan move toward or away from it?

CURRENT STATE          ->   THIS PLAN (delta)    ->   12-MONTH IDEAL
[describe]                  [describe delta]           [describe target]

If the plan moves AWAY from the ideal — red flag. Use AskUserQuestion.

0D. Implementation Alternatives (MANDATORY)

Produce 2-3 distinct approaches. Never skip.

APPROACH A: [Name]
  Summary: [paragraph]
  Effort: S / M / L / XL
  Risk: Low / Med / High
  Pros: [bullets]
  Cons: [bullets]
  Reuses: [existing code/tools]

APPROACH B: [Name]
  ...

Rules:

  • At least 2 required, 3 preferred
  • One "minimal viable" (fewest files, fastest)
  • One "ideal architecture" (best long-term)

Recommend one. Present via AskUserQuestion. Do NOT proceed to mode selection without user approval.

0E. Mode Selection

Present four options via AskUserQuestion:

MODE      | DEFAULT FOR              | POSTURE
EXPAND    | Greenfield features      | Dream big. Push scope UP. "What's 10x better for 2x effort?"
SELECTIVE | Feature enhancements     | Hold scope as baseline. Surface expansions individually. Neutral posture.
HOLD      | Bug fixes, refactors     | Scope accepted. Make it bulletproof. Maximum rigor.
REDUCE    | Plans touching >15 files | Surgical minimum. Cut everything non-essential. Be ruthless.

Critical rule: Once selected, COMMIT. If EXPAND, don't argue for less. If REDUCE, don't sneak scope back in.

0F. Mode-Specific Deep Dive

EXPAND — run all three:

  1. 10x check: What version is 10x more ambitious for 2x the effort? Describe concretely — what would the user actually experience?
  2. Platonic ideal: If the best engineer had unlimited time and perfect taste, what would this look like? Start from experience, not architecture.
  3. Adjacent delight: At least 5 thirty-minute improvements that make the feature sing. Things where a user thinks "oh nice, they thought of that."

Each expansion proposal = its own AskUserQuestion. Options: A) Add to scope, B) Defer to todos, C) Skip.

SELECTIVE — hold + tempt:

  1. Complexity check: >8 files or >2 new classes -> challenge if same goal achievable with fewer parts
  2. Minimum set of changes for stated goal
  3. Expansion scan: surface candidates (10x check, delight opportunities, platform potential)

Each expansion = own AskUserQuestion. Neutral posture (not enthusiastic like EXPAND). Max 8 candidates shown. Options: A) Add to scope, B) Defer, C) Skip.

HOLD — maximum rigor:

  1. Complexity check (>8 files or >2 new classes)
  2. Minimum set of changes for stated goal
  3. No expansion proposals

REDUCE — surgical minimum:

  1. Absolute minimum that ships value. Everything else deferred.
  2. Separate "must ship together" vs "nice to ship together"
  3. For each piece of scope: "Does the core value work without this?" If yes, cut it.

0G. Temporal Interrogation (EXPAND, SELECTIVE, HOLD)

Think ahead to implementation decisions that should be resolved NOW:

HOUR 1 (foundations):     What does the implementer need to know?
HOUR 2-3 (core logic):   What ambiguities will they hit?
HOUR 4-5 (integration):  What will surprise them?
HOUR 6+ (polish/tests):  What will they wish they'd planned for?

Surface these as questions for the user NOW — not "figure it out later."

Review Sections

Eleven sections. Each is mandatory. Each produces concrete output. Skip only if explicitly not applicable (e.g., Section 11 for pure backend).

Section 1: Architecture Review

Evaluate:

  • Overall system design, component boundaries, dependency graph
  • Data flow (all 4 paths: happy / nil / empty / error)
  • State machines — diagram with ASCII art
  • Coupling concerns (before/after dependency graph)
  • Scaling at 10x and 100x
  • Single points of failure
  • Security architecture (auth boundaries, per-endpoint: who can call, what they get, what they can change)
  • Production failure scenarios: timeout, cascade failure, data corruption, auth failure
  • Rollback posture: git revert? feature flag? DB rollback? how long?

EXPAND/SELECTIVE additions: What would make this beautiful? What infrastructure makes it a platform?

Required output: ASCII system architecture diagram.

Section 2: Error & Rescue Map

For every new method/service/codepath that can fail:

METHOD/CODEPATH | WHAT CAN GO WRONG | EXCEPTION CLASS | RESCUED? | RESCUE ACTION | USER SEES

Rules:

  • No catch-all handling. Name every exception.
  • Log full context at error site.
  • Every rescued error must retry, degrade, or re-raise. No swallow-and-continue.
  • For each GAP: specify the rescue action + what the user sees.

Required output: Complete error/rescue registry table.

Section 3: Security & Threat Model

Evaluate:

  • Attack surface expansion (new endpoints, inputs, data stores)
  • Input validation: nil, empty, wrong type, too long, unicode, HTML injection
  • Authorization: scoped to right user/role? direct object reference?
  • Secrets management (env vars, rotation)
  • Dependency risk (new packages, known CVEs)
  • Data classification (PII, credentials, business-sensitive)
  • Injection vectors: SQL, command, template, prompt injection
  • Audit logging for sensitive operations

For each finding: threat, likelihood (H/M/L), impact (H/M/L), whether plan mitigates.
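
For the dependency-risk bullet, a quick audit grounds the finding in actual CVE data; these are examples, so run whichever matches the project's stack:

# Sketch: dependency CVE checks (pick the one that matches the stack).
npm audit --audit-level=high       # Node
bundle exec bundler-audit check    # Ruby, requires the bundler-audit gem
pip-audit                          # Python, requires pip-audit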

Section 4: Data Flow & Interaction Edge Cases

Required output: ASCII data flow diagram:

INPUT -> VALIDATION -> TRANSFORM -> PERSIST -> OUTPUT
  |          |            |           |          |
[nil?]   [invalid?]  [exception?] [conflict?] [stale?]

Interaction edge cases table:

INTERACTION              | EDGE CASE              | HANDLED? | HOW?
Double-click submit      | Duplicate action        |          |
Navigate away mid-action | Orphaned state          |          |
Slow connection          | Timeout / partial load  |          |
Stale CSRF token         | Silent failure          |          |
Retry in-flight request  | Duplicate processing    |          |
Zero results             | Empty state display     |          |
10K results              | Performance / pagination|          |
Back button              | Stale cache             |          |

Section 5: Code Quality Review

Evaluate:

  • Code organization and file structure
  • DRY violations — flag with file references
  • Naming quality (would a new engineer understand?)
  • Error handling patterns — cross-reference Section 2
  • Missing edge cases
  • Over-engineering check (unnecessary abstractions)
  • Under-engineering check (cut corners that will bite later)
  • Cyclomatic complexity: >5 branches -> flag + propose refactor

Section 6: Test Review

Produce a complete diagram of every new thing the plan introduces:

  • New UX flows / data flows / codepaths / background jobs / integrations / error paths

For each:

ITEM                | TEST TYPE          | IN PLAN? | HAPPY PATH | FAILURE PATH | EDGE CASE
[new codepath]      | Unit/Integration   | Y/N      | Y/N        | Y/N          | Y/N

Test ambition check:

  • "What test makes you confident shipping at 2am Friday?"
  • "What would a hostile QA engineer test?"
  • "What's the chaos test?" (kill a dependency mid-operation)

Flag test pyramid imbalance, flakiness risks, missing load/stress requirements.

Section 7: Performance Review

Evaluate:

  • N+1 queries
  • Memory usage (max size in production)
  • Database indexes needed
  • Caching opportunities
  • Background job sizing (worst-case payload, runtime, retry)
  • Top 3 slowest new codepaths + estimated p99 latency
  • Connection pool pressure

Section 8: Observability & Debuggability Review

Evaluate:

  • Logging: structured log lines at entry, exit, each branch point
  • Metrics: working signal + broken signal for each new path
  • Tracing: trace IDs propagated across boundaries?
  • Alerting: new alerts needed?
  • Dashboards: day-1 panels?
  • Debuggability: "Can you reconstruct what happened from logs alone 3 weeks post-ship?"
  • Runbooks: documented recovery steps?

EXPAND/SELECTIVE addition: "What observability would make this feature a joy to operate?"

Section 9: Deployment & Rollout Review

Evaluate:

  • Migration safety: backward-compatible? zero-downtime? table locks?
  • Feature flags for gradual rollout
  • Rollout order: migrate first, deploy second?
  • Rollback plan: explicit step-by-step
  • Deploy-time risk window
  • Environment parity (dev/staging/prod)
  • Post-deploy verification: first 5 minutes, first hour
  • Smoke tests

EXPAND/SELECTIVE addition: "What deploy infrastructure makes routine shipping effortless?"
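
A hedged sketch of first-5-minutes verification; the health endpoint, log path, and log format below are placeholders, not values from this plan:

# Sketch: post-deploy smoke check (URL, log path, and log format are hypothetical).
curl -fsS https://example.com/healthz || echo "ALERT: health check failed"
tail -n 1000 log/production.log | grep -c '"level":"error"'   # watch error volume on the new codepath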

Section 10: Long-Term Trajectory Review

Evaluate:

  • Technical debt introduced: code debt, operational debt, testing debt, doc debt
  • Path dependency: does this lock us into a pattern?
  • Knowledge concentration: only one person can maintain this?
  • Reversibility: 1-5 scale (1 = one-way door, 5 = easily reversible)
  • Ecosystem fit: consistent with rest of codebase?
  • "The 1-year question": read this as a new engineer in a year — is it obvious?

EXPAND/SELECTIVE additions: What comes after this ships? Platform potential?

Section 11: Design & UX Review (skip if no UI scope)

Only run if the plan involves: new UI screens, components, interaction flows, frontend changes, user-visible state changes, responsive behavior, or design system changes.

Evaluate:

  • Information architecture (hierarchy, scannability)
  • Interaction state coverage: loading / empty / error / success / partial
  • User journey coherence (does the flow tell a story?)
  • Responsive behavior
  • Accessibility basics (keyboard nav, screen readers, contrast)
  • Design system alignment

Required output: ASCII user flow showing screens/states and transitions.

EXPAND/SELECTIVE addition: "What would make this UI feel inevitable?" + 30-minute touches that make users think "oh nice."

If significant UI scope, recommend running /workflows:review with design agents after implementation.

Visual Sketch (EXPAND and SELECTIVE modes with UI scope)

After the analytical review, generate a rough concept sketch to validate the plan's UI assumptions visually. Seeing a layout exposes hierarchy and flow problems that text analysis misses.

  1. Use skill: excalidraw-diagram to sketch the primary user flow (1-3 screens)
  2. Focus on: information hierarchy, state transitions, key interaction points
  3. Use realistic placeholder content — not Lorem ipsum
  4. Keep intentionally rough — this validates the plan's UI thinking, not pixel design

Present and ask via AskUserQuestion: "Does the planned UI flow hold up visually? What's wrong or missing?"

If the sketch reveals issues: add them as findings in Section 11. Max 1 revision round — this is a validation tool, not a design session.

How To Ask Questions

  • One issue = one AskUserQuestion. Never combine.
  • Describe concretely with file/line references when possible.
  • 2-3 options including "do nothing" where reasonable.
  • For each option: effort, risk, maintenance burden in one line.
  • Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
  • Recommended option listed first.
  • Escape hatch: No issues in a section? Say so and move on. Obvious fix? State what you'll do and move on. Only AskUserQuestion when genuine tradeoffs exist.

Required Outputs

"NOT in scope" Section

Work considered and explicitly deferred, with one-line rationale each.

"What already exists" Section

Existing code/flows that partially solve sub-problems and whether the plan reuses them.

"Dream state delta" Section

Gap between what this plan delivers and the 12-month ideal. This is the roadmap for future work.

Error & Rescue Registry

Complete table from Section 2: every method that can fail, every exception class, rescue status, rescue action, user impact.

Failure Modes Registry

CODEPATH       | FAILURE MODE        | RESCUED? | TEST? | USER SEES? | LOGGED?
[path]         | [mode]              | Y/N      | Y/N   | [what]     | Y/N

Any row with RESCUED=N, TEST=N, USER SEES=Silent -> CRITICAL GAP. Flag immediately.

Diagrams (mandatory — all that apply)

  1. System architecture
  2. Data flow (including shadow paths)
  3. State machine (for every stateful object)
  4. Error flow
  5. Deployment sequence
  6. Rollback flowchart

Use ASCII art inline during the review. Use skill: beautiful-mermaid for final rendered versions if needed.

Stale Diagram Audit

List every ASCII diagram in files this plan touches. Are they still accurate after the planned changes?
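
One rough heuristic for finding candidate files, assuming arrows and box edges mark the diagrams; treat hits as candidates to read, not an authoritative list:

# Sketch: touched files that contain ASCII-diagram-looking content.
git diff --name-only "$(git merge-base HEAD main)"..HEAD \
  | xargs grep -lE -- '-->|\+----|====' 2>/dev/null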

Review Quality Loop

After all review sections are complete, the review itself needs reviewing. A fresh-context subagent catches gaps the reviewer is blind to.

Review Dimensions

Tuned for review quality, not doc quality:

Coverage completeness
  Checks: Did every section produce concrete output? Any section hand-waved?
  PASS:   Tables/diagrams for each section
  FAIL:   "No major issues" without evidence

Severity calibration
  Checks: Are severity ratings justified? P1s actually critical? P3s not hiding real risk?
  PASS:   Ratings match described impact
  FAIL:   P1 with no user impact, or P3 that could corrupt data

Registry completeness
  Checks: Error/rescue and failure mode registries cover all new codepaths?
  PASS:   Every new method/service mapped
  FAIL:   Gaps in coverage, missing shadow paths

Diagram accuracy
  Checks: Do ASCII diagrams match the described architecture/data flow?
  PASS:   Diagrams consistent with text
  FAIL:   Diagrams show components not in text, or vice versa

Actionability
  Checks: Can findings be acted on? Specific enough to implement?
  PASS:   File refs, concrete suggestions
  FAIL:   Vague recommendations like "improve error handling"

Process

Iteration 1: Dispatch fresh-context subagent:

  • Task general-purpose("You are reviewing a PLAN REVIEW document — not the plan itself, but the review of the plan. You have never seen this review before. Check: 1) Coverage completeness — did every section produce concrete output or hand-wave? 2) Severity calibration — are P1/P2/P3 ratings justified by described impact? 3) Registry completeness — do error/rescue and failure mode tables cover all new codepaths? 4) Diagram accuracy — do diagrams match the described architecture? 5) Actionability — are findings specific enough to implement? Score each dimension 1-10. Return PASS or numbered issues. Review output path: [path]")

If PASS (all dimensions 7+): Report score, proceed to outside voice.

If issues found: Fix via Edit tool. Re-dispatch. Max 3 iterations.

Convergence guard: Same issues on two consecutive iterations = stop. Add unresolved issues as "Review Quality Concerns" in the completion summary.

If subagent fails: Note "Review quality check unavailable" and proceed.

Report

"Review survived N quality rounds. M issues caught. Review quality score: X/10."

Outside Voice (Optional)

Independent challenge from a different perspective. Prefer a different model (Codex) when available — two different AI models agreeing is stronger signal than one model reviewing itself.

Offer

Via AskUserQuestion: "Want an outside voice? A different AI model can independently challenge this plan — logical gaps, feasibility risks, blind spots. Recommended for plans with >5 files or new architecture."

If declined, skip to completion summary.

Availability Check

which codex 2>/dev/null

Construct Challenge Prompt

Assemble from the review:

  • Plan summary (from spec.md)
  • Mode selected and key scope decisions
  • Error/rescue registry (condensed)
  • Failure modes registry (CRITICAL GAPS only)
  • Top findings by severity
  • Architecture diagram

Truncate at 30KB if needed — prioritize registries and critical gaps.

Challenge prompt: "You are a brutally honest technical reviewer. You have NEVER seen this plan or review before. Read it fresh and find what the review missed: logical gaps, overcomplexity, feasibility risks, missing dependencies or sequencing issues, strategic miscalibration. Be direct. Be terse. No compliments."

Execute

If Codex available:

CODEX_PROMPT_FILE=$(mktemp /tmp/ceo-review-codex-XXXXXXXX.txt)
# Write challenge prompt + context to file
codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="high"'

Present full output verbatim:

OUTSIDE VOICE (Codex):
[verbatim output]

Error handling: Auth failure, timeout, empty response — all non-blocking. Fall back to Claude subagent.
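
A sketch of the non-blocking wrapper; the 300-second timeout is an assumed value, not something this skill specifies:

# Sketch: keep the outside voice non-blocking; fall back if codex errors, hangs, or returns nothing.
if ! OUTSIDE_VOICE=$(timeout 300 codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only \
      -c 'model_reasoning_effort="high"' 2>&1) || [ -z "$OUTSIDE_VOICE" ]; then
  OUTSIDE_VOICE=""   # empty: dispatch the Claude general-purpose subagent instead
fi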

If Codex NOT available (or errored):

  • Task general-purpose("[challenge prompt + context]")

Present under:

OUTSIDE VOICE (Claude subagent):
[verbatim output]

Cross-Review Tension

Note disagreements between review sections and outside voice:

CROSS-REVIEW TENSION:
  [Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.]

For each substantive tension, propose resolution via AskUserQuestion. If the outside voice found a CRITICAL GAP the review missed, add it to the failure modes registry.

Completion Summary

+====================================================================+
|              PLAN CEO REVIEW — COMPLETION SUMMARY                  |
+====================================================================+
| Mode selected        | EXPAND / SELECTIVE / HOLD / REDUCE          |
| System Audit         | [key findings]                              |
| Step 0               | [mode + key decisions]                      |
| Section 1  (Arch)    | ___ issues found                            |
| Section 2  (Errors)  | ___ error paths mapped, ___ GAPS            |
| Section 3  (Security)| ___ issues found, ___ High severity         |
| Section 4  (Data/UX) | ___ edge cases mapped, ___ unhandled        |
| Section 5  (Quality) | ___ issues found                            |
| Section 6  (Tests)   | ___ gaps in coverage                        |
| Section 7  (Perf)    | ___ issues found                            |
| Section 8  (Observ)  | ___ gaps found                              |
| Section 9  (Deploy)  | ___ risks flagged                           |
| Section 10 (Future)  | Reversibility: _/5, debt items: ___         |
| Section 11 (Design)  | ___ issues / SKIPPED (no UI scope)          |
| Visual sketch        | produced / skipped (no UI or HOLD/REDUCE)   |
+--------------------------------------------------------------------+
| NOT in scope         | written (___ items)                         |
| What already exists  | written                                     |
| Dream state delta    | written                                     |
| Error/rescue registry| ___ methods, ___ CRITICAL GAPS              |
| Failure modes        | ___ total, ___ CRITICAL GAPS                |
| Review quality loop  | ___ rounds, score ___/10                    |
| Outside voice        | ran (codex/claude) / skipped                |
| Cross-review tensions| ___ (listed if any)                         |
| Diagrams produced    | ___ (list types)                            |
| Stale diagrams found | ___                                         |
| Unresolved decisions | ___ (listed below)                          |
+====================================================================+

Unresolved Decisions

If any AskUserQuestion goes unanswered, note it here. Never silently default.

Output Findings

Choose output format based on review target:

Option A: Update prd.json (if plan folder)

Add review_findings to each relevant story:

{
  "severity": "P1",
  "category": "architecture",
  "agent": "plan-ceo-review",
  "finding": "No rollback plan for migration",
  "file": "docs/plans/.../spec.md",
  "suggestion": "Add explicit rollback steps in deployment section",
  "status": "logged"
}
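
If the update is scripted rather than hand-edited, a jq sketch could append the finding; this assumes prd.json has a top-level stories array with id fields, which may not match the real schema:

# Sketch only: adjust the path to the actual prd.json structure; "story-3" and finding.json are placeholders.
jq --argjson finding "$(cat finding.json)" \
   '(.stories[] | select(.id == "story-3") | .review_findings) += [$finding]' \
   prd.json > prd.json.tmp && mv prd.json.tmp prd.json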

Option B: Create file-todos (for standalone reviews)

Use skill: file-todos for structured todo management.

Deferred Item Decisions (MANDATORY)

Every item placed in "NOT in scope" or deferred during mode-specific deep dive must get its own individual AskUserQuestion. Never batch deferred items into a single decision.

For each deferred item:

[Item description]
  Why deferred: [one-line reason]
  Effort: S / M / L / XL
  Priority: P1 / P2 / P3
  Context: [what depends on this, what it unblocks]

  A) Add to todos — track for later
  B) Add to scope — build it now (changes the plan)
  C) Drop — not worth tracking

This forces explicit decisions on every piece of deferred work. "NOT in scope" is not a dumping ground — it's a conscious choice per item.

Next Steps

Use AskUserQuestion:

Question: "Plan review complete. What next?"

Options:

  1. Run /workflows:work — Start implementing (if no CRITICAL GAPS)
  2. Revise the plan — Update spec.md/prd.json based on findings
  3. Run /workflows:review — Follow up with code-level review after implementation
  4. Run /office-hours — Step back and re-examine the problem (if premise challenge raised doubts)
  5. Done — Review complete

If CRITICAL GAPS exist: Strongly recommend option 2 before proceeding to work.

Important Rules

  • Never start implementation. Review only.
  • Questions ONE AT A TIME. Never batch.
  • Every review section produces concrete output (table, diagram, or explicit "no issues").
  • Do not skip review sections. If a section has no findings, say "Section N: No issues found" and move on.
  • Anti-patterns from the codebase audit should be flagged if the plan risks repeating them.
  • Scope decisions are the user's. Present evidence and recommend, but respect their choice.
  • ASCII diagrams are mandatory, not optional. If you can't diagram it, the plan doesn't understand it yet.