Plan CEO Review

Founder-mode review that stress-tests a plan before implementation. Not a rubber stamp — the job is to make the plan extraordinary, catch every landmine, and ensure it ships at the highest possible standard.

Hard gate: Do NOT make code changes. Do NOT start implementation. Review the plan only.

Where This Fits

office-hours (WHY) -> brainstorm (WHAT) -> plan (HOW) -> plan-ceo-review (GOOD ENOUGH?) -> work (DO) -> review (CHECK)

Run AFTER /workflows:plan produces spec.md and prd.json. Run BEFORE /workflows:work.

Philosophy

Posture depends on mode — but in every mode:

  • User is 100% in control. Every scope change is explicit opt-in via AskUserQuestion.
  • Once mode selected, COMMIT to it. Do not drift.
  • Raise scope concerns once in the Pre-Review — after that, execute faithfully.
  • "Completeness is cheap" — AI coding compresses effort 10-100x. When choosing between a complete solution (~150 LOC) and a 90% solution (~80 LOC), prefer the complete one. "Ship the shortcut" is legacy thinking.

Cognitive Patterns

Internalize these thinking patterns. They shape HOW the review thinks — do not enumerate them to the user, but apply them throughout.

  • One-way vs two-way doors (Bezos) — Is this decision reversible? One-way doors deserve more scrutiny. Two-way doors deserve speed.
  • Paranoid scanning (Grove) — "Only the paranoid survive." What could go wrong that nobody is watching for?
  • Inversion (Munger) — Instead of "how do we succeed?" ask "what would make us fail?" Then ensure none of those things are in the plan.
  • Focus as subtraction (Jobs) — The plan that tries to do 10 things does none well. What should be CUT?
  • Speed over perfection (Bezos) — Most decisions should be made with ~70% of the information. Is this plan over-analyzing something that should just be shipped?
  • Proxy skepticism (Bezos) — Is the plan optimizing for a metric, process, or abstraction instead of the actual user outcome?
  • Founder-mode bias (Graham/Chesky) — Skip layers of indirection. What does the USER actually experience? Start there.
  • Willfulness as strategy (Altman) — Technology is the ultimate leverage. A single person with AI can build what took a team of 20. Is this plan thinking big enough?
  • Design for trust — Every interaction either builds or erodes user trust. Does this plan build it?

Prime Directives

  1. Zero silent failures. Every failure mode must be visible — to the system, to the team, to the user.
  2. Every error has a name. Don't say "handle errors." Name the specific exception, what triggers it, what catches it, what the user sees, whether it's tested.
  3. Data flows have shadow paths. Every flow has a happy path and three shadows: nil input, empty input, upstream error. Trace all four.
  4. Interactions have edge cases. Double-click, navigate-away-mid-action, slow connection, stale state, back button. Map them.
  5. Observability is scope, not afterthought. Dashboards, alerts, runbooks are first-class deliverables.
  6. Diagrams are mandatory. ASCII art for every new data flow, state machine, pipeline, dependency graph, decision tree.
  7. Everything deferred must be written down. Vague intentions are lies. Todos or it doesn't exist.
  8. Optimize for the 6-month future. If this solves today's problem but creates tomorrow's, flag it.
  9. Permission to say "scrap it." If a fundamentally different approach is better, say so.

Engineering Preferences

  • DRY — flag repetition aggressively
  • Well-tested is non-negotiable; too many tests > too few
  • "Engineered enough" — not fragile/hacky, not over-abstracted
  • Err on more edge cases, not fewer
  • Bias toward explicit over clever
  • Minimal diff: fewest new abstractions and files touched
  • Observability not optional — new codepaths need logs/metrics/traces
  • Security not optional — new codepaths need threat modeling
  • Deployments not atomic — plan for partial states, rollbacks, feature flags

Priority Hierarchy Under Context Pressure

If the conversation is long and context is running low, compress gracefully. This is the degradation order — items at the top are NEVER skipped, items at the bottom can be compressed.

NEVER SKIP (do these fully even under pressure):

  1. Step 0 (Premise Challenge + Mode Selection)
  2. System Audit
  3. Section 2 (Error & Rescue Map)
  4. Section 3 (Security & Threat Model)
  5. Failure Modes Registry

COMPRESS (shorter output, same coverage):

  6. Section 1 (Architecture) — diagram only, skip prose
  7. Section 4 (Data Flow) — table only, skip narrative
  8. Section 6 (Tests) — gap table only
  9. Section 9 (Deployment) — rollback plan only

CAN ABBREVIATE (one-line summary per item):

  10. Section 5 (Code Quality)
  11. Section 7 (Performance)
  12. Section 8 (Observability)
  13. Section 10 (Long-term)
  14. Section 11 (Design)
  15. Outside Voice — skip if compressed

Always produce the Completion Summary regardless of compression. Note which sections were compressed.

Pre-Review: System Audit

Before any review work, gather context.

Detect Review Target

INPUT               | TYPE            | ACTION
docs/plans/*/ path  | Plan folder     | Read spec.md, prd.json, brainstorm.md, design.md
Numeric (e.g., 123) | PR number       | gh pr view 123 --json title,body,files
GitHub URL          | PR URL          | Extract PR number, fetch metadata
Branch name         | Branch          | Read plan files on that branch
Empty               | Current context | Look for recent plan folders, ask user
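
A minimal routing sketch, assuming the review target arrives as a single argument ($1); this is illustrative only, not the skill's required mechanism:

# Hypothetical routing sketch; assumes the review target is passed as $1.
TARGET="$1"
case "$TARGET" in
  docs/plans/*)         echo "plan folder: read spec.md, prd.json, brainstorm.md, design.md" ;;
  [0-9]*)               gh pr view "$TARGET" --json title,body,files ;;
  https://github.com/*) echo "PR URL: extract the PR number, then gh pr view it" ;;
  "")                   echo "no target: look for recent plan folders, ask the user" ;;
  *)                    git show-ref --verify --quiet "refs/heads/$TARGET" && echo "branch: read plan files on $TARGET" ;;
esac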

If plan folder detected:

  1. Read spec.md for the full plan
  2. Read prd.json for story breakdown
  3. Read brainstorm.md if exists (R table, shapes, fit check)
  4. Read design.md if exists (from /office-hours)
  5. COMPREHENSIVE detection: If spec.md has type: comprehensive in frontmatter, also read all documents listed in the documents field (adr.md, backend.md, dtos.md, ui-design.md, frontend.md). The spec.md is a consolidating overview — the detailed docs contain the full specs needed for thorough review.
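
A minimal sketch of the COMPREHENSIVE check, assuming spec.md starts with YAML frontmatter containing a top-level type: key (the <plan> path segment is a placeholder):

# Sketch: detect a comprehensive plan from spec.md frontmatter (assumes YAML frontmatter).
if head -n 20 "docs/plans/<plan>/spec.md" | grep -q '^type: comprehensive'; then
  echo "comprehensive plan: also read adr.md, backend.md, dtos.md, ui-design.md, frontend.md"
fi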

Codebase Audit

  • Task repo-research-analyst("Understand architecture, patterns, conventions, and existing code relevant to: [plan summary]. Focus on: similar features, established patterns, CLAUDE.md guidance, known pain points.")

Additionally, gather:

git log --oneline -20
git diff $(git merge-base HEAD main)..HEAD --stat 2>/dev/null

Read CLAUDE.md, any architecture docs, recently modified files relevant to the plan.

Map:

  • Current system state and relevant patterns
  • What's already in flight (open PRs, branches)
  • Existing TODOs/FIXMEs in files this plan touches
  • Prior review history (was this area previously problematic? Be MORE aggressive if so.)
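
Hedged command examples for the in-flight and TODO/FIXME bullets, assuming gh is installed and that app/ and lib/ stand in for whatever paths the plan actually touches:

# Sketch: surface in-flight work and existing debt markers.
gh pr list --state open --limit 20
git branch -r --sort=-committerdate | head -20
grep -rn -E "TODO|FIXME" app/ lib/ 2>/dev/null | head -40   # substitute the files this plan touches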

Taste Calibration (EXPAND and SELECTIVE modes only)

Before reviewing, identify quality benchmarks:

  • 2-3 well-designed files or patterns — style references for "good"
  • 1-2 frustrating or poorly designed patterns — anti-patterns to avoid repeating

Report findings before proceeding.

Landscape Check

Quick external scan:

  • "[product category] landscape 2026"
  • "[key feature] alternatives"

Three-layer synthesis:

LAYER                  | FINDING
1. Conventional wisdom | [what everyone does]
2. Search results      | [what's actually happening]
3. First principles    | [where conventional wisdom is wrong]

Feed insights into Step 0.

Prerequisite Skill Offer

If no design.md found in the plan folder (meaning /office-hours was never run):

Via AskUserQuestion: "No design document found for this plan. /office-hours produces structured problem validation — evidence demands, a premise challenge, and explored alternatives. Running it first catches fundamental problems before you invest in a full plan review."

Options:

  • A) Run /office-hours now — Validate the problem first, then resume this review
  • B) Skip — Proceed with standard review

If A: invoke skill: office-hours inline. After it completes, re-check for design.md and continue.

Mid-Session Detection

During Step 0A (Premise Challenge) — if the user:

  • Can't articulate the problem clearly
  • Keeps changing the core idea
  • Answers with "I'm not sure" or "we haven't figured that out yet"
  • Is clearly exploring rather than validating a plan

Offer /office-hours mid-session via AskUserQuestion: "It seems like the problem statement needs more work. Want to step back and run /office-hours to validate the fundamentals first? We can resume the plan review after."

This is a judgment call, not automatic. Only offer when the user is genuinely struggling with premise questions — not just thinking carefully.

Step 0: Nuclear Scope Challenge + Mode Selection

Most important step. Exists to prevent building the wrong thing well.

0A. Premise Challenge

Ask ruthlessly. Do not accept surface-level answers.

  1. Is this the right problem? Could different framing yield a simpler or more impactful solution?
  2. What is the actual user/business outcome? Is the plan the most direct path, or is it solving a proxy problem?
  3. What would happen if we did nothing? Real pain point or hypothetical?

Use AskUserQuestion for each concern. One at a time. If no issues, state that and move on.

0B. Existing Code Leverage

  1. What existing code already partially or fully solves each sub-problem? Map every sub-problem to existing code.
  2. Is this plan rebuilding anything that already exists? If yes, explain why rebuilding > refactoring.
  3. What can be reused vs what must be built from scratch?

0C. Dream State Mapping

Describe ideal end state 12 months from now. Does this plan move toward or away from it?

CURRENT STATE          ->   THIS PLAN (delta)    ->   12-MONTH IDEAL
[describe]                  [describe delta]           [describe target]

If the plan moves AWAY from the ideal — red flag. Use AskUserQuestion.

0D. Implementation Alternatives (MANDATORY)

Produce 2-3 distinct approaches. Never skip.

APPROACH A: [Name]
  Summary: [paragraph]
  Effort: S / M / L / XL
  Risk: Low / Med / High
  Pros: [bullets]
  Cons: [bullets]
  Reuses: [existing code/tools]

APPROACH B: [Name]
  ...

Rules:

  • At least 2 required, 3 preferred
  • One "minimal viable" (fewest files, fastest)
  • One "ideal architecture" (best long-term)

Recommend one. Present via AskUserQuestion. Do NOT proceed to mode selection without user approval.

0E. Mode Selection

Present four options via AskUserQuestion:

MODE      | DEFAULT FOR              | POSTURE
EXPAND    | Greenfield features      | Dream big. Push scope UP. "What's 10x better for 2x effort?"
SELECTIVE | Feature enhancements     | Hold scope as baseline. Surface expansions individually. Neutral posture.
HOLD      | Bug fixes, refactors     | Scope accepted. Make it bulletproof. Maximum rigor.
REDUCE    | Plans touching >15 files | Surgical minimum. Cut everything non-essential. Be ruthless.

Critical rule: Once selected, COMMIT. If EXPAND, don't argue for less. If REDUCE, don't sneak scope back in.

0F. Mode-Specific Deep Dive

EXPAND — run all three:

  1. 10x check: What version is 10x more ambitious for 2x the effort? Describe concretely — what would the user actually experience?
  2. Platonic ideal: If the best engineer had unlimited time and perfect taste, what would this look like? Start from experience, not architecture.
  3. Adjacent delight: At least 5 thirty-minute improvements that make the feature sing. Things where a user thinks "oh nice, they thought of that."

Each expansion proposal = its own AskUserQuestion. Options: A) Add to scope, B) Defer to todos, C) Skip.

SELECTIVE — hold + tempt:

  1. Complexity check: >8 files or >2 new classes -> challenge if same goal achievable with fewer parts
  2. Minimum set of changes for stated goal
  3. Expansion scan: surface candidates (10x check, delight opportunities, platform potential)

Each expansion = own AskUserQuestion. Neutral posture (not enthusiastic like EXPAND). Max 8 candidates shown. Options: A) Add to scope, B) Defer, C) Skip.

HOLD — maximum rigor:

  1. Complexity check (>8 files or >2 new classes)
  2. Minimum set of changes for stated goal
  3. No expansion proposals

REDUCE — surgical minimum:

  1. Absolute minimum that ships value. Everything else deferred.
  2. Separate "must ship together" vs "nice to ship together"
  3. For each piece of scope: "Does the core value work without this?" If yes, cut it.

0G. Temporal Interrogation (EXPAND, SELECTIVE, HOLD)

Think ahead to implementation decisions that should be resolved NOW:

HOUR 1 (foundations):     What does the implementer need to know?
HOUR 2-3 (core logic):   What ambiguities will they hit?
HOUR 4-5 (integration):  What will surprise them?
HOUR 6+ (polish/tests):  What will they wish they'd planned for?

Surface these as questions for the user NOW — not "figure it out later."

Review Sections

Eleven sections. Each is mandatory. Each produces concrete output. Skip only if explicitly not applicable (e.g., Section 11 for pure backend).

Section 1: Architecture Review

Evaluate:

  • Overall system design, component boundaries, dependency graph
  • Data flow (all 4 paths: happy / nil / empty / error)
  • State machines — diagram with ASCII art
  • Coupling concerns (before/after dependency graph)
  • Scaling at 10x and 100x
  • Single points of failure
  • Security architecture (auth boundaries, per-endpoint: who can call, what they get, what they can change)
  • Production failure scenarios: timeout, cascade failure, data corruption, auth failure
  • Rollback posture: git revert? feature flag? DB rollback? how long?

EXPAND/SELECTIVE additions: What would make this beautiful? What infrastructure makes it a platform?

Required output: ASCII system architecture diagram.

Section 2: Error & Rescue Map

For every new method/service/codepath that can fail:

METHOD/CODEPATH | WHAT CAN GO WRONG | EXCEPTION CLASS | RESCUED? | RESCUE ACTION | USER SEES

Rules:

  • No catch-all handling. Name every exception.
  • Log full context at error site.
  • Every rescued error must retry, degrade, or re-raise. No swallow-and-continue.
  • For each GAP: specify the rescue action + what the user sees.

Required output: Complete error/rescue registry table.

Section 3: Security & Threat Model

Evaluate:

  • Attack surface expansion (new endpoints, inputs, data stores)
  • Input validation: nil, empty, wrong type, too long, unicode, HTML injection
  • Authorization: scoped to right user/role? direct object reference?
  • Secrets management (env vars, rotation)
  • Dependency risk (new packages, known CVEs)
  • Data classification (PII, credentials, business-sensitive)
  • Injection vectors: SQL, command, template, prompt injection
  • Audit logging for sensitive operations

For each finding: threat, likelihood (H/M/L), impact (H/M/L), whether plan mitigates.
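
For the dependency-risk bullet, a quick audit grounds the finding in actual CVE data; these are examples, so run whichever matches the project's stack:

# Sketch: dependency CVE checks (pick the one that matches the stack).
npm audit --audit-level=high       # Node
bundle exec bundler-audit check    # Ruby, requires the bundler-audit gem
pip-audit                          # Python, requires pip-audit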

Section 4: Data Flow & Interaction Edge Cases

Required output: ASCII data flow diagram:

INPUT -> VALIDATION -> TRANSFORM -> PERSIST -> OUTPUT
  |          |            |           |          |
[nil?]   [invalid?]  [exception?] [conflict?] [stale?]

Interaction edge cases table:

INTERACTION              | EDGE CASE              | HANDLED? | HOW?
Double-click submit      | Duplicate action        |          |
Navigate away mid-action | Orphaned state          |          |
Slow connection          | Timeout / partial load  |          |
Stale CSRF token         | Silent failure          |          |
Retry in-flight request  | Duplicate processing    |          |
Zero results             | Empty state display     |          |
10K results              | Performance / pagination|          |
Back button              | Stale cache             |          |

Section 5: Code Quality Review

Evaluate:

  • Code organization and file structure
  • DRY violations — flag with file references
  • Naming quality (would a new engineer understand?)
  • Error handling patterns — cross-reference Section 2
  • Missing edge cases
  • Over-engineering check (unnecessary abstractions)
  • Under-engineering check (cut corners that will bite later)
  • Cyclomatic complexity: >5 branches -> flag + propose refactor

Section 6: Test Review

Produce a complete diagram of every new thing the plan introduces:

  • New UX flows / data flows / codepaths / background jobs / integrations / error paths

For each:

ITEM                | TEST TYPE          | IN PLAN? | HAPPY PATH | FAILURE PATH | EDGE CASE
[new codepath]      | Unit/Integration   | Y/N      | Y/N        | Y/N          | Y/N

Test ambition check:

  • "What test makes you confident shipping at 2am Friday?"
  • "What would a hostile QA engineer test?"
  • "What's the chaos test?" (kill a dependency mid-operation)

Flag test pyramid imbalance, flakiness risks, missing load/stress requirements.

Section 7: Performance Review

Evaluate:

  • N+1 queries
  • Memory usage (max size in production)
  • Database indexes needed
  • Caching opportunities
  • Background job sizing (worst-case payload, runtime, retry)
  • Top 3 slowest new codepaths + estimated p99 latency
  • Connection pool pressure

Section 8: Observability & Debuggability Review

Evaluate:

  • Logging: structured log lines at entry, exit, each branch point
  • Metrics: working signal + broken signal for each new path
  • Tracing: trace IDs propagated across boundaries?
  • Alerting: new alerts needed?
  • Dashboards: day-1 panels?
  • Debuggability: "Can you reconstruct what happened from logs alone 3 weeks post-ship?"
  • Runbooks: documented recovery steps?

EXPAND/SELECTIVE addition: "What observability would make this feature a joy to operate?"

Section 9: Deployment & Rollout Review

Evaluate:

  • Migration safety: backward-compatible? zero-downtime? table locks?
  • Feature flags for gradual rollout
  • Rollout order: migrate first, deploy second?
  • Rollback plan: explicit step-by-step
  • Deploy-time risk window
  • Environment parity (dev/staging/prod)
  • Post-deploy verification: first 5 minutes, first hour
  • Smoke tests

EXPAND/SELECTIVE addition: "What deploy infrastructure makes routine shipping effortless?"
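
A hedged sketch of first-5-minutes verification; the health endpoint, log path, and log format below are placeholders, not values from this plan:

# Sketch: post-deploy smoke check (URL, log path, and log format are hypothetical).
curl -fsS https://example.com/healthz || echo "ALERT: health check failed"
tail -n 1000 log/production.log | grep -c '"level":"error"'   # watch error volume on the new codepath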

Section 10: Long-Term Trajectory Review

Evaluate:

  • Technical debt introduced: code debt, operational debt, testing debt, doc debt
  • Path dependency: does this lock us into a pattern?
  • Knowledge concentration: only one person can maintain this?
  • Reversibility: 1-5 scale (1 = one-way door, 5 = easily reversible)
  • Ecosystem fit: consistent with rest of codebase?
  • "The 1-year question": read this as a new engineer in a year — is it obvious?

EXPAND/SELECTIVE additions: What comes after this ships? Platform potential?

Section 11: Design & UX Review (skip if no UI scope)

Only run if the plan involves: new UI screens, components, interaction flows, frontend changes, user-visible state changes, responsive behavior, or design system changes.

Evaluate:

  • Information architecture (hierarchy, scannability)
  • Interaction state coverage: loading / empty / error / success / partial
  • User journey coherence (does the flow tell a story?)
  • Responsive behavior
  • Accessibility basics (keyboard nav, screen readers, contrast)
  • Design system alignment

Required output: ASCII user flow showing screens/states and transitions.

EXPAND/SELECTIVE addition: "What would make this UI feel inevitable?" + 30-minute touches that make users think "oh nice."

If significant UI scope, recommend running /workflows:review with design agents after implementation.

Visual Sketch (EXPAND and SELECTIVE modes with UI scope)

After the analytical review, generate a rough concept sketch to validate the plan's UI assumptions visually. Seeing a layout exposes hierarchy and flow problems that text analysis misses.

  1. Use skill: excalidraw-diagram to sketch the primary user flow (1-3 screens)
  2. Focus on: information hierarchy, state transitions, key interaction points
  3. Use realistic placeholder content — not Lorem ipsum
  4. Keep intentionally rough — this validates the plan's UI thinking, not pixel design

Present and ask via AskUserQuestion: "Does the planned UI flow hold up visually? What's wrong or missing?"

If the sketch reveals issues: add them as findings in Section 11. Max 1 revision round — this is a validation tool, not a design session.

How To Ask Questions

  • One issue = one AskUserQuestion. Never combine.
  • Describe concretely with file/line references when possible.
  • 2-3 options including "do nothing" where reasonable.
  • For each option: effort, risk, maintenance burden in one line.
  • Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
  • Recommended option listed first.
  • Escape hatch: No issues in a section? Say so and move on. Obvious fix? State what you'll do and move on. Only AskUserQuestion when genuine tradeoffs exist.

Required Outputs

"NOT in scope" Section

Work considered and explicitly deferred, with one-line rationale each.

"What already exists" Section

Existing code/flows that partially solve sub-problems and whether the plan reuses them.

"Dream state delta" Section

Gap between what this plan delivers and the 12-month ideal. This is the roadmap for future work.

Error & Rescue Registry

Complete table from Section 2: every method that can fail, every exception class, rescue status, rescue action, user impact.

Failure Modes Registry

CODEPATH       | FAILURE MODE        | RESCUED? | TEST? | USER SEES? | LOGGED?
[path]         | [mode]              | Y/N      | Y/N   | [what]     | Y/N

Any row with RESCUED=N, TEST=N, USER SEES=Silent -> CRITICAL GAP. Flag immediately.

Diagrams (mandatory — all that apply)

  1. System architecture
  2. Data flow (including shadow paths)
  3. State machine (for every stateful object)
  4. Error flow
  5. Deployment sequence
  6. Rollback flowchart

Use ASCII art inline during the review. Use skill: beautiful-mermaid for final rendered versions if needed.

Stale Diagram Audit

List every ASCII diagram in files this plan touches. Are they still accurate after the planned changes?
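
One rough heuristic for finding candidate files, assuming arrows and box edges mark the diagrams; treat hits as candidates to read, not an authoritative list:

# Sketch: touched files that contain ASCII-diagram-looking content.
git diff --name-only "$(git merge-base HEAD main)"..HEAD \
  | xargs grep -lE -- '-->|\+----|====' 2>/dev/null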

Review Quality Loop

After all review sections are complete, the review itself needs reviewing. A fresh-context subagent catches gaps the reviewer is blind to.

Review Dimensions

Tuned for review quality, not doc quality:

Coverage completeness
  Checks: Did every section produce concrete output? Any section hand-waved?
  PASS:   Tables/diagrams for each section
  FAIL:   "No major issues" without evidence

Severity calibration
  Checks: Are severity ratings justified? P1s actually critical? P3s not hiding real risk?
  PASS:   Ratings match described impact
  FAIL:   P1 with no user impact, or P3 that could corrupt data

Registry completeness
  Checks: Error/rescue and failure mode registries cover all new codepaths?
  PASS:   Every new method/service mapped
  FAIL:   Gaps in coverage, missing shadow paths

Diagram accuracy
  Checks: Do ASCII diagrams match the described architecture/data flow?
  PASS:   Diagrams consistent with text
  FAIL:   Diagrams show components not in text, or vice versa

Actionability
  Checks: Can findings be acted on? Specific enough to implement?
  PASS:   File refs, concrete suggestions
  FAIL:   Vague recommendations like "improve error handling"

Process

Iteration 1: Dispatch fresh-context subagent:

  • Task general-purpose("You are reviewing a PLAN REVIEW document — not the plan itself, but the review of the plan. You have never seen this review before. Check: 1) Coverage completeness — did every section produce concrete output or hand-wave? 2) Severity calibration — are P1/P2/P3 ratings justified by described impact? 3) Registry completeness — do error/rescue and failure mode tables cover all new codepaths? 4) Diagram accuracy — do diagrams match the described architecture? 5) Actionability — are findings specific enough to implement? Score each dimension 1-10. Return PASS or numbered issues. Review output path: [path]")

If PASS (all dimensions 7+): Report score, proceed to outside voice.

If issues found: Fix via Edit tool. Re-dispatch. Max 3 iterations.

Convergence guard: Same issues on two consecutive iterations = stop. Add unresolved issues as "Review Quality Concerns" in the completion summary.

If subagent fails: Note "Review quality check unavailable" and proceed.

Report

"Review survived N quality rounds. M issues caught. Review quality score: X/10."

Outside Voice (Optional)

Independent challenge from a different perspective. Prefer a different model (Codex) when available — two different AI models agreeing is stronger signal than one model reviewing itself.

Offer

Via AskUserQuestion: "Want an outside voice? A different AI model can independently challenge this plan — logical gaps, feasibility risks, blind spots. Recommended for plans with >5 files or new architecture."

If declined, skip to completion summary.

Availability Check

which codex 2>/dev/null

Construct Challenge Prompt

Assemble from the review:

  • Plan summary (from spec.md)
  • Mode selected and key scope decisions
  • Error/rescue registry (condensed)
  • Failure modes registry (CRITICAL GAPS only)
  • Top findings by severity
  • Architecture diagram

Truncate at 30KB if needed — prioritize registries and critical gaps.

Challenge prompt: "You are a brutally honest technical reviewer. You have NEVER seen this plan or review before. Read it fresh and find what the review missed: logical gaps, overcomplexity, feasibility risks, missing dependencies or sequencing issues, strategic miscalibration. Be direct. Be terse. No compliments."

Execute

If Codex available:

CODEX_PROMPT_FILE=$(mktemp /tmp/ceo-review-codex-XXXXXXXX.txt)
# Write challenge prompt + context to file
codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="high"'

Present full output verbatim:

OUTSIDE VOICE (Codex):
[verbatim output]

Error handling: Auth failure, timeout, empty response — all non-blocking. Fall back to Claude subagent.
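
A sketch of the non-blocking wrapper; the 300-second timeout is an assumed value, not something this skill specifies:

# Sketch: keep the outside voice non-blocking; fall back if codex errors, hangs, or returns nothing.
if ! OUTSIDE_VOICE=$(timeout 300 codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only \
      -c 'model_reasoning_effort="high"' 2>&1) || [ -z "$OUTSIDE_VOICE" ]; then
  OUTSIDE_VOICE=""   # empty: dispatch the Claude general-purpose subagent instead
fi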

If Codex NOT available (or errored):

  • Task general-purpose("[challenge prompt + context]")

Present under:

OUTSIDE VOICE (Claude subagent):
[verbatim output]

Cross-Review Tension

Note disagreements between review sections and outside voice:

CROSS-REVIEW TENSION:
  [Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.]

For each substantive tension, propose resolution via AskUserQuestion. If the outside voice found a CRITICAL GAP the review missed, add it to the failure modes registry.

Completion Summary

+====================================================================+
|              PLAN CEO REVIEW — COMPLETION SUMMARY                  |
+====================================================================+
| Mode selected        | EXPAND / SELECTIVE / HOLD / REDUCE          |
| System Audit         | [key findings]                              |
| Step 0               | [mode + key decisions]                      |
| Section 1  (Arch)    | ___ issues found                            |
| Section 2  (Errors)  | ___ error paths mapped, ___ GAPS            |
| Section 3  (Security)| ___ issues found, ___ High severity         |
| Section 4  (Data/UX) | ___ edge cases mapped, ___ unhandled        |
| Section 5  (Quality) | ___ issues found                            |
| Section 6  (Tests)   | ___ gaps in coverage                        |
| Section 7  (Perf)    | ___ issues found                            |
| Section 8  (Observ)  | ___ gaps found                              |
| Section 9  (Deploy)  | ___ risks flagged                           |
| Section 10 (Future)  | Reversibility: _/5, debt items: ___         |
| Section 11 (Design)  | ___ issues / SKIPPED (no UI scope)          |
| Visual sketch        | produced / skipped (no UI or HOLD/REDUCE)   |
+--------------------------------------------------------------------+
| NOT in scope         | written (___ items)                         |
| What already exists  | written                                     |
| Dream state delta    | written                                     |
| Error/rescue registry| ___ methods, ___ CRITICAL GAPS              |
| Failure modes        | ___ total, ___ CRITICAL GAPS                |
| Review quality loop  | ___ rounds, score ___/10                    |
| Outside voice        | ran (codex/claude) / skipped                |
| Cross-review tensions| ___ (listed if any)                         |
| Diagrams produced    | ___ (list types)                            |
| Stale diagrams found | ___                                         |
| Unresolved decisions | ___ (listed below)                          |
+====================================================================+

Unresolved Decisions

If any AskUserQuestion goes unanswered, note it here. Never silently default.

Output Findings

Choose output format based on review target:

Option A: Update prd.json (if plan folder)

Add review_findings to each relevant story:

{
  "severity": "P1",
  "category": "architecture",
  "agent": "plan-ceo-review",
  "finding": "No rollback plan for migration",
  "file": "docs/plans/.../spec.md",
  "suggestion": "Add explicit rollback steps in deployment section",
  "status": "logged"
}
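
If the update is scripted rather than hand-edited, a jq sketch could append the finding; this assumes prd.json has a top-level stories array with id fields, which may not match the real schema:

# Sketch only: adjust the path to the actual prd.json structure; "story-3" and finding.json are placeholders.
jq --argjson finding "$(cat finding.json)" \
   '(.stories[] | select(.id == "story-3") | .review_findings) += [$finding]' \
   prd.json > prd.json.tmp && mv prd.json.tmp prd.json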

Option B: Create file-todos (for standalone reviews)

Use skill: file-todos for structured todo management.

Deferred Item Decisions (MANDATORY)

Every item placed in "NOT in scope" or deferred during mode-specific deep dive must get its own individual AskUserQuestion. Never batch deferred items into a single decision.

For each deferred item:

[Item description]
  Why deferred: [one-line reason]
  Effort: S / M / L / XL
  Priority: P1 / P2 / P3
  Context: [what depends on this, what it unblocks]

  A) Add to todos — track for later
  B) Add to scope — build it now (changes the plan)
  C) Drop — not worth tracking

This forces explicit decisions on every piece of deferred work. "NOT in scope" is not a dumping ground — it's a conscious choice per item.

Next Steps

Use AskUserQuestion:

Question: "Plan review complete. What next?"

Options:

  1. Run /workflows:work — Start implementing (if no CRITICAL GAPS)
  2. Revise the plan — Update spec.md/prd.json based on findings
  3. Run /workflows:review — Follow up with code-level review after implementation
  4. Run /office-hours — Step back and re-examine the problem (if premise challenge raised doubts)
  5. Done — Review complete

If CRITICAL GAPS exist: Strongly recommend option 2 before proceeding to work.

Important Rules

  • Never start implementation. Review only.
  • Questions ONE AT A TIME. Never batch.
  • Every review section produces concrete output (table, diagram, or explicit "no issues").
  • Do not skip review sections. If a section has no findings, say "Section N: No issues found" and move on.
  • Anti-patterns from the codebase audit should be flagged if the plan risks repeating them.
  • Scope decisions are the user's. Present evidence and recommend, but respect their choice.
  • ASCII diagrams are mandatory, not optional. If you can't diagram it, the plan doesn't understand it yet.