Ship
This skill has two interaction modes:
- Interactive (default): During spec authoring (Phase 1), you are a collaborative thought partner — the user is the product owner, and you work together to define what to build. Once the spec is finalized and the user hands off to implementation, you become an autonomous engineer who owns the remaining lifecycle.
- Headless (`--headless`, or a complete spec provided as input): The entire workflow runs end-to-end with zero user interaction. Every phase executes, every skill loads, every checklist runs. Decisions that would normally require `<input>` are made autonomously and documented in the completion report.
Once in autonomous execution (after Phase 1 handoff in interactive mode, or from the start in headless mode), you are the autonomous engineer who owns the entire lifecycle: from spec.json through review-ready branch. /implement and local reviewers are tools and inputs. You make every final decision.
Ship Loop re-entry
If your prompt starts with [SHIP-LOOP], you are mid-workflow — the stop hook re-injected you after context compaction or an exit attempt. Do NOT restart from Phase 0. The prompt includes:
- Header: current phase, completed phases, branch, spec path
- State files (auto-injected): `state.json`, `SPEC.md`, `spec.json`, and `progress.txt` (tail) — all between `=== STATE FILES ===` delimiters.
- Git state (auto-injected): filtered `git status`, `git diff --stat`, branch-scoped commit log, and branch tracking status — between `=== GIT STATE ===` delimiters. Noise is pre-filtered (lock files, build artifacts, `tmp/ship/`).
- SKILL.md in the system message for full phase reference
All auto-injected content is already in your prompt — do not re-read state files or re-run git commands (git status, git log, git diff).
Jump directly to the section for your current phase. Your first action is to continue from where you left off — the state files and git state give you everything you need.
Background process awareness
When you launch a background process (implement.sh, run-local-review.sh, nested claude -p), record it in state.json per the "Background process launched" row in the state update table.
Why this matters: The stop hook allows exit when backgroundProcesses contains alive entries. Without this, the hook blocks and re-injects the full state — but the agent has nothing new to do, so it ends the turn, the hook blocks again, and the cycle repeats until context is exhausted. Allowing exit lets Claude Code's built-in drain loop wait indefinitely for the background task to complete, delivering the <task-notification> as the next turn.
On re-entry after a background task completes: The stop hook fires on the turn after the <task-notification> is processed. By then the PID is dead (process completed). The hook cleans dead entries from backgroundProcesses and re-injects with full state. Check backgroundProcesses in the injected state.json — if entries remain, verify the work completed:
- `implement.sh`: Check `spec.json` for story progress (`passes: true/false`). Read `progress.txt`. Remove the entry. Continue the phase.
- `run-local-review.sh`: Check `review-status.json`. Remove the entry. Continue the phase.
- Nested `claude -p`: Check expected output artifacts (docs committed for Phase 4, `qa-progress.json` for Phase 6/7). Remove the entry. Continue the phase.
- Task died without completing (no output artifacts, no progress): Remove the entry. Assess whether to restart based on partial artifacts.
After resolving all background processes, clear backgroundProcesses to [] in state.json and proceed with the current phase.
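A dead-or-alive check on a recorded pid can be done with a signal-0 probe — a minimal sketch (the pid here is this shell's own, purely for illustration; the stop hook performs its own cleanup of dead entries):

```shell
# Probe whether a recorded background PID is still alive without signaling it.
# Uses this shell's own PID so the demo always has a live process to check.
pid=$$
if kill -0 "$pid" 2>/dev/null; then
  echo "alive"   # keep the entry; the work may still be in flight
else
  echo "dead"    # verify artifacts, then remove the entry from backgroundProcesses
fi
```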
Context is managed — never rush phases
The ship loop has an automatic state save and reboot mechanism. If your context runs low, the stop hook saves your full state (state.json, SPEC.md, spec.json, progress log) and re-injects you into the correct phase with everything you need to continue. This is by design, not a failure.
What this means for you: Context is not a resource you need to ration across phases. Never compress, rush, or skip a phase because you anticipate running out of context. Go as deep as needed on every single phase — load every required skill, run every checklist, delegate to subagents for investigation. If context runs out mid-phase, the system handles continuity automatically.
The failure mode this prevents: An agent that rushes Phases 4-9 (docs, review, QA planning, testing, review, completion) because "context was running low" ships incomplete work. A clean reboot that re-enters Phase 4 with full context produces better outcomes than a compressed pass through later phases on fumes.
Headless mode
Ship enters headless mode when:
- `$ARGUMENTS` includes `--headless`, OR
- The input is a provided spec (a path to an existing SPEC.md, inline SPEC.md content, or a spec.json)
Headless mode means no human is available for the duration of the workflow. Every phase runs, every skill loads, every checklist executes — but no phase pauses for user input.
Behavioral rules in headless mode:
- All phases are autonomous. There are no collaborative phases. Phase 1 spec authoring is skipped (the spec is already provided). Phase 1 validation still runs — gaps are filled with reasonable defaults, not by asking.
- Never output `<input>`. The `<input>` pause mechanism is disabled. Decisions that would normally require human judgment are made autonomously using evidence gathered via /research, /explore, and codebase investigation. Document every such decision in the completion report.
- Never ask questions. Do not ask for confirmation, clarification, or approval at any point. Phase 0 recovery auto-resumes if state exists. Phase transitions proceed without confirmation gates.
- Review gates are advisory. If a review gate does not converge after max auto-fix passes, document the remaining findings and proceed. Do not deadlock.
- Pass `--headless` to all child skills that support it (/qa, /debug, /qa-plan). For nested subprocesses (Phases 3, 5, 6), signal non-interactive mode per the /nest-claude convention.
- Every skill load is mandatory. Do not substitute direct action for a skill that the workflow says to load. Load every skill at every step — they contain quality gates and checklists that headless mode preserves.
- Inline spec content: If the input is inline SPEC.md content (not a file path), write it to `specs/<feature-name>/SPEC.md` before proceeding. Derive the feature name from the spec's title or problem statement.
The headless flag is recorded in state.json (see references/state-initialization.md) and survives compaction/re-entry. On re-entry, the agent reads headless: true from state.json and continues in headless mode.
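A re-entry check for the flag might look like this (the `headless` key and `tmp/ship/state.json` path are from this document; the seed line exists only to make the sketch runnable standalone):

```shell
# Read the persisted headless flag on re-entry; default to false if absent.
mkdir -p tmp/ship
[ -f tmp/ship/state.json ] || echo '{"headless": true}' > tmp/ship/state.json  # demo seed only
HEADLESS=$(jq -r '.headless // false' tmp/ship/state.json)
echo "headless=$HEADLESS"
```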
⚠️ Known bug: subprocess execution has historically skipped later phases.
When /ship itself runs as a nested claude -p subprocess (e.g., delegated via /nest-claude), Phases 5–9 have historically been skipped due to context compaction losing subprocess state tracking. This is a bug, not acceptable behavior. If you are the ship orchestrator — whether in a subprocess or top-level conversation — you MUST execute every phase. The existence of this bug does NOT grant permission to skip phases.
If you are NOT running as a subprocess (i.e., you are the top-level agent or running interactively), this limitation does not apply to you at all. Execute every phase.
If you are running as a subprocess: fight the tendency to skip. Load each skill, spawn each subprocess, run each phase. If context compaction interrupts you, the stop hook will re-inject you at the correct phase — trust it and continue from there.
Ship working directory
All execution state lives in a configurable working directory (gitignored). Resolution priority:
| Priority | Source | Default |
|---|---|---|
| 1 | Session anchor (keyed by session_id, written by ship-worktree.sh -- used by stop hook only) | -- |
| 2 | Env var CLAUDE_SHIP_DIR_OVERRIDE (explicit override for Docker/CI/isolated envs) | -- |
| 3 | Dynamic from git root | $(git rev-parse --show-toplevel)/tmp/ship |
Ship scripts derive the ship directory dynamically from git rev-parse --show-toplevel -- no manual env var maintenance is needed. Throughout this skill and its child skills (/implement, /cancel-ship), tmp/ship/ refers to the resolved ship directory. The shell scripts (ship-init-state.sh, implement.sh, ship-stop-hook.sh) resolve this dynamically -- each worktree gets its own tmp/ship/ directory automatically.
After entering a worktree, scripts automatically resolve to the worktree's tmp/ship/ via git rev-parse.
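The last two rungs of that priority ladder can be sketched in shell (the function name is illustrative; priority 1, the session anchor, is consulted only by the stop hook and is omitted here):

```shell
# Resolve the ship working directory: explicit override first, then git root.
resolve_ship_dir() {
  if [ -n "${CLAUDE_SHIP_DIR_OVERRIDE:-}" ]; then
    printf '%s\n' "$CLAUDE_SHIP_DIR_OVERRIDE"                  # priority 2: Docker/CI override
  else
    printf '%s/tmp/ship\n' "$(git rev-parse --show-toplevel)"  # priority 3: derived per worktree
  fi
}
```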
State files
All execution state lives in tmp/ship/ (gitignored). The only committed artifact is SPEC.md. Child skills (/spec, /implement) manage their own internal artifacts — see their SKILL.md files for details.
| File | What it holds | Created | Updated | Read by |
|---|---|---|---|---|
| tmp/ship/state.json | Workflow state — current phase, feature name, spec path, branch, capabilities, quality gates, amendments, background processes | Phase 1 (Ship) | Every phase transition (Ship); background process launch/completion | Stop hook (re-injection, process detection), Ship (re-entry) |
| tmp/ship/loop.md | Loop control — iteration counter, max iterations, completion promise, session_id (for isolation) | Phase 1 (Ship) | Each re-entry (stop hook increments iteration, stamps session_id) | Stop hook (block/allow exit) |
| tmp/ship/last-prompt.md | Last re-injection prompt — the full prompt the stop hook constructed on its most recent re-entry, for debugging | Stop hook | Each re-entry (overwritten) | Debugging only |
| tmp/ship/spec.json | User stories — acceptance criteria, priority, pass/fail status | Phase 2 (/decompose) | Each iteration (sets passes: true) | implement.sh, iterations, Ship |
| tmp/ship/progress.txt | Iteration log — what was done, learnings, blockers | Phase 3 start (implement.sh) | Each iteration (append) | Iterations, Ship |
| tmp/ship/review-output.md | Latest portable local review summary from the review gates (Phase 5, Phase 8) | Review gate | Each local review pass (overwrite) | Ship, user |
| tmp/ship/review-status.json | Parsed local review status — recommendation, risk, issue counts, and whether the gate is still blocking | Review gate | Each local review pass (overwrite) | Ship, local review scripts |
| tmp/ship/qa-progress.json | QA scenarios and results — status, notes, bootstrapResult | Phase 6 (/qa-plan) | Phase 7 (/qa) — scenario status, evidence, bootstrapResult. Phase 7 exit gate (Ship) — blocked → validated with resolvedBy: "parent" when orchestrator resolves scenarios /qa couldn't. | Ship (phase gate between 6→7, completion report) |
| tmp/ship/ship-summary.md | Ship Summary — spec deviations, QA gaps, deferred scope, surfaced opportunities | Phase 9 (Ship) | Phase 9 only | /pr |
| SPEC.md (committed) | Product + tech spec — requirements, design, decisions, non-goals | Phase 1 (/spec or user) | Phase 1 only | All phases, iterations |
When to update what
| Event | state.json | Other files |
|---|---|---|
| Phase 1 end | Run ship-init-state.sh — creates both state.json and loop.md (see Phase 1, Step 3) | — |
| Phase 2 start | — | /decompose creates tmp/ship/spec.json |
| Phase 3 start | — | /implement creates tmp/ship/implement-prompt.md, tmp/ship/progress.txt |
| Review gates (Phase 5, Phase 8) | Update local review status if state.json already exists | run-local-review.sh stages the portable review bundle into tmp/ship/pr-review-plugin/, overwrites tmp/ship/review-output.md, and parses it into tmp/ship/review-status.json |
| Any phase → next | Set currentPhase to next phase, append the canonical phase name to completedPhases, refresh lastUpdated. Canonical names: "Phase 2", "Phase 3", "Phase 4", "Phase 5", "Phase 6", "Phase 7", "Phase 8", "Phase 9". The stop hook validates that Phases 2–9 each appear in completedPhases before allowing completion — missing entries block exit. | — |
| User amendment (any phase) | Append to amendments[]: {"description": "...", "status": "pending"} | — |
| Deferred scope identified (any phase) | Append to deferredScope[]: {"description": "...", "phase": "Phase N", "source": "..."} | — |
| Opportunity surfaced (any phase) | Append to surfacedOpportunities[]: {"description": "...", "phase": "Phase N", "source": "..."} | — |
| Background process launched (Phase 3, 4, 5, 6, 7, 8) | Append to backgroundProcesses[] with fields: taskId, pid (null if unknown), type ("implement", "review", or "nested-claude"), phase, command, startedAt (ISO 8601), description. Use jq to append (read-modify-write via tmp file). When the process completes, remove its entry by filtering on taskId. | — |
| Iteration completes a story | — | tmp/ship/spec.json: set story passes: true. tmp/ship/progress.txt: append iteration log. |
| Phase 6 QA planning | — | /qa-plan creates tmp/ship/qa-progress.json with planned scenarios, gaps, and enrichment |
| Phase 7 QA execution | — | /qa updates tmp/ship/qa-progress.json — scenario status, bootstrapResult, evidence |
| Phase 7 exit gate (blocked resolution) | — | Ship updates tmp/ship/qa-progress.json — blocked → validated with resolvedBy: "parent" for scenarios the orchestrator resolves after /qa exits |
| Phase 9 → completed | Set currentPhase: "completed". Append "Phase 9" to completedPhases. The stop hook's three-part gate validates: (1) completion promise in output, (2) currentPhase === "completed", (3) Phases 2–9 all present in completedPhases. | Stop hook deletes loop.md |
| Stop hook re-entry | — | loop.md: iteration incremented. Prompt re-injected from state.json + SKILL.md. |
| /cancel-ship | Preserved for inspection | Delete loop.md |
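The backgroundProcesses read-modify-write described above might look like this with jq (the taskId and description values are illustrative; the seed line exists only to make the sketch runnable standalone):

```shell
STATE=tmp/ship/state.json
mkdir -p tmp/ship
[ -f "$STATE" ] || echo '{"backgroundProcesses": []}' > "$STATE"  # demo seed only

# Launch: append an entry (read-modify-write via tmp file).
jq --arg now "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
   '.backgroundProcesses += [{taskId: "impl-001", pid: null, type: "implement",
     phase: "Phase 3", command: "implement.sh", startedAt: $now,
     description: "story iteration loop"}]' \
   "$STATE" > "$STATE.tmp" && mv "$STATE.tmp" "$STATE"

# Completion: remove the entry by filtering on taskId.
jq '.backgroundProcesses |= map(select(.taskId != "impl-001"))' \
   "$STATE" > "$STATE.tmp" && mv "$STATE.tmp" "$STATE"
```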
Workflow
Phase transitions
Before moving from any phase to the next:
- Verify all open questions for the current phase are resolved.
- Confirm you have high confidence in the current phase's outputs.
- In headless mode: all phases are autonomous. Do not output `<input>` or ask for confirmation. Make best-judgment decisions using evidence, and document them for the completion report.
- In interactive collaborative phases (where the user is actively providing input): explicitly ask whether they are ready to move on. Do not proceed until they confirm.
- In interactive autonomous phases: use your judgment — but pause and consult the user when a decision requires human judgment you cannot make autonomously (architectural choices with significant trade-offs, product/customer-facing decisions, scope changes, ambiguous requirements where guessing wrong is costly).
  - Before pausing: thoroughly research the situation — gather all relevant context, explore options, and assess trade-offs. The user should receive a complete decision brief, not a vague question.
  - To pause: output `<input>Input required</input>` at the beginning of your message, followed by:
    - Situation: what happened and why you need a decision
    - Context gathered: what you researched, what you found, what you attempted
    - Options: concrete choices with trade-offs for each
    - Your recommendation: which option you'd pick and why (if you have one)
    - Prompt: "Would you like me to research any of these options more deeply before you decide?"
  - The stop hook detects `<input>` and lets you wait for the user's response. The loop stays active — when they respond and you finish acting on it, the loop resumes automatically.
  - Do NOT pause for: routine engineering decisions you can make with evidence, questions answerable by reading code or docs, anything you could resolve with /research or /explore. The bar: would a senior engineer on this team make this call alone, or escalate to a product owner?
- Update `tmp/ship/state.json` per the "When to update what" table above (does not exist before end of Phase 1).
  - Amendments: When the user requests a change not in the original spec — ad-hoc tasks, improvements, tweaks, or user-approved scope expansions from review feedback — append to `amendments` before acting: `{ "description": "<brief what>", "status": "pending" }`. Set `status` to `"done"` when completed. This log survives compaction and tells a resumed agent what post-spec work was requested.
- Update the task list: mark the completing phase's task as `completed` and the next phase's task as `in_progress`.
- Accumulate deferred scope and surfaced opportunities. At every phase transition, notice what came up during the phase that won't be addressed in this ship run and append it to `state.json`. Two categories:
  - `deferredScope[]` — work identified as needed but consciously not done (reviewer suggestions accepted in principle but deferred, adjacent bugs found, edge cases acknowledged but not handled, follow-on work surfaced by implementation). Append: `{ "description": "<what needs doing>", "phase": "Phase N", "source": "<what surfaced it>" }`.
    - Review ingestion: After each review gate (Phase 5 /review-cloud, Phase 4/Phase 8 /review-local), read the review output for declined findings with future relevance. For /review-local: read `deferredFindings[]` from `review-status.json`. For /review-cloud: read the completion output's declined findings list. Append each future-relevant item to `deferredScope[]` with source `"review-cloud"` or `"review-local"` and the current phase.
  - `surfacedOpportunities[]` — ideas, improvements, or architectural observations that emerged from being deep in the code but aren't obligations. Append: `{ "description": "<the observation>", "phase": "Phase N", "source": "<what surfaced it>" }`.

Persist these to `state.json` immediately — same pattern as `amendments[]`. Items persisted to state.json survive context compaction; items tracked only in conversation do not. Phase 9's Ship Summary reads from these arrays.
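The per-transition state.json update might be sketched as follows (field names per the "When to update what" table; the phase values and seed line are illustrative):

```shell
STATE=tmp/ship/state.json
mkdir -p tmp/ship
[ -f "$STATE" ] || echo '{"currentPhase": "Phase 2", "completedPhases": []}' > "$STATE"  # demo seed only

# Advance: set the next phase, record the completed one, refresh the timestamp.
jq --arg next "Phase 3" --arg done "Phase 2" \
   --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
   '.currentPhase = $next | .completedPhases += [$done] | .lastUpdated = $ts' \
   "$STATE" > "$STATE.tmp" && mv "$STATE.tmp" "$STATE"
```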
Create phase task list (first action on every fresh run)
Before starting Phase 0, create a task for every phase using TaskCreate. This makes the full workflow visible upfront and ensures no phase is skipped.
Create these tasks in order:
- Phase 0: Detect context and starting point — Recovery check, feature name, worktree, capability detection, scope calibration
- Phase 1: Spec authoring and handoff (/spec) — Scaffold spec, investigate, load /spec, validate
- Phase 1 exit: Activate autonomous execution (ship-init-state.sh) — Run the init script to create state.json + loop.md, verify both files exist. This activates the stop hook that keeps the agent working through all remaining phases.
- Phase 2: Decomposition (/decompose) — Load /decompose with SPEC.md path, produce spec.json
- Phase 3: Implementation (/implement) — Build understanding, load /implement with spec.json, post-implementation review
- Phase 4: Documentation (/docs) (nested subprocess) — Spawn nested Claude to write/update all affected documentation surfaces
- Phase 5: Review gate — pre-QA (/review-local) — Run local review convergence loop, evaluate findings, fix validated issues
- Phase 6: QA Planning (/qa-plan) (nested subprocess) — Spawn nested Claude with /qa-plan to produce qa-progress.json from spec.json + code + diff
- Phase 7: Testing / QA (/qa) (nested subprocess) — Spawn nested Claude with /qa to execute from qa-progress.json
- Phase 8: Review gate — post-QA (/review-local) — Run local review convergence loop on final code (including QA fixes)
- Phase 9: Completion — Run completion checklist, report to user, output completion promise
As each phase begins, mark its task in_progress. When the phase completes, mark it completed.
On Ship Loop re-entry ([SHIP-LOOP]): Check TaskList first. If tasks already exist, resume — mark completed phases as completed if not already, and continue from the current phase's task. If no tasks exist (session predates this step), create them and mark already-completed phases as completed based on state.json's completedPhases.
Phase 0: Detect context and starting point
Recovery from previous session
Before anything else, check if tmp/ship/state.json exists. If found:
In headless mode: Auto-resume. Load the state and skip to the recorded phase. Do not ask.
In interactive mode:
- Read it and present the recovered state to the user: feature name, current phase, completed phases, and any pending amendments.
- Ask: "A previous /ship session for [feature] was interrupted at [phase]. Resume from there, or start fresh?"
- If resuming: load the state (spec path, branch, worktree path, quality gates, capabilities, amendments) and skip to the recorded phase. Re-read the SPEC.md and any artifacts referenced in the state file. Check the amendments array for pending items — these are post-spec changes the user requested that may still need work. If tmp/ship/loop.md does not exist (loop was not active), re-activate it per Phase 1, Step 3.
- If starting fresh: delete the state file, delete tmp/ship/loop.md if it exists, and proceed normally.
Step 1: Establish feature name and starting point
Determine what the user wants to build and whether a spec already exists. A quick explore is fine here — a few Grep/Glob/Read calls to orient yourself (e.g., find the relevant directory, confirm a module exists). But do not run extended investigation, spawn Explore subagents, or load skills. Deep investigation happens in Phase 1 after the scaffold exists.
| Condition | Action |
|---|---|
| User provides a path to an existing SPEC.md (or inline spec content) | Load it. Derive the feature name from the spec. Activate headless mode — a provided spec means the workflow runs end-to-end without interaction (see "Headless mode" section). If the input is inline content, write it to specs/<feature-name>/SPEC.md first. |
| --headless flag is passed (with a feature description) | Activate headless mode. Derive feature name from the description. Scaffold a SPEC.md from the description in Phase 1, then proceed autonomously. |
| User provides a feature description (no SPEC.md, no --headless) | A quick explore of the relevant area is fine to orient yourself. Then derive a short feature name (e.g., revoke-invite, org-members-page, auth-flow). If the description is too vague to name, ask 1-2 targeted questions — just enough for a semantic name, not deep scoping. |
| Ambiguous | Ask: "Do you have an existing SPEC.md, or should we spec this from scratch?" |
Step 2: Create isolated working environment
Now that you have a feature name, establish an isolated working directory so all artifacts live in the feature workspace from the start.
Default behavior: /ship creates a fresh worktree from origin/main unless overridden. This ensures concurrent /ship instances never collide — each worktree gets its own tmp/ship/ state directory. Override with:
- `--local` — skip worktree creation, use the current checkout as-is
- `--branch <name>` — skip worktree creation, checkout a specific existing branch
Load: references/worktree-setup.md — contains the full decision table, setup procedure, and dependency installation.
Prefer the helper script over ad-hoc git worktree commands:
<path-to-skill>/scripts/ship-worktree.sh ensure --feature "<feature-name>"
The helper creates a fresh sibling worktree so each /ship request gets its own workspace. It only reuses the current checkout when you're already inside a worktree (not the primary checkout).
Spec handoff: If --spec <path> was provided, resolve the path to absolute before creating the worktree, then copy it into the worktree after cding in. See references/worktree-setup.md for the procedure.
Scripts resolve the ship directory dynamically from git rev-parse --show-toplevel -- no manual re-export is needed after entering a worktree.
Step 3: Detect execution context
Load: references/capability-detection.md — probe table for all capabilities (quality gates, browser, macOS, Docker, skills) with degradation paths.
Record results. In interactive mode: if any capability is unavailable, briefly state what's missing as a negotiation checkpoint — the user may be able to fix it before work proceeds. In headless mode: document unavailable capabilities and proceed — degradation paths are pre-planned in each child skill.
Step 4: Calibrate workflow to scope
Assess the task and determine the appropriate depth for each phase. Every phase is always executed — scope calibration adjusts rigor, not whether a phase runs.
| Task scope | Spec depth (Phase 1) | Implementation depth (Phase 3) | Docs depth (Phase 4) | Review depth (Phases 5, 8) | Testing depth (Phase 7) |
|---|---|---|---|---|---|
| Feature (new capability, multi-file, user-facing) | Full /spec → SPEC.md → spec.json | Full /implement iteration loop | Full docs pass — product + internal | Full local review convergence loop | Full /qa |
| Enhancement (extending existing feature, moderate scope) | SPEC.md with problem + acceptance criteria + test cases; /spec optional | /implement iteration loop | Update existing docs if affected | Full local review convergence loop | /qa (calibrated to scope) |
| Bug fix / config change / infra (small scope, targeted change) | SPEC.md with problem statement + what "fixed" looks like + acceptance criteria | /implement iteration loop (calibrated to scope) | Update docs only if behavior changed | Local review convergence loop | Targeted /qa if user-facing |
A SPEC.md is always produced — conversational findings alone do not survive context loss.
Note the scope level internally — it governs phase depth throughout. Do not present a detailed phase-by-phase plan or wait for approval here; proceed directly to Phase 1 and let the SPEC.md scaffold capture the initial scope. The user confirms scope through the spec handoff (Phase 1, Step 2), not through a separate plan approval step.
Phase 1: Spec authoring and handoff (/spec, collaborative in interactive mode)
In headless mode with a provided spec: Skip Step 1 entirely — the spec already exists. Jump to Step 2 (validate). After validation, proceed directly to Step 3 (activate state) without waiting for confirmation.
In headless mode with --headless flag but no provided spec: Scaffold the SPEC.md from the feature description (write it to specs/<feature-name>/SPEC.md), run the investigation steps below, then proceed to Step 2 without waiting for confirmation.
In interactive mode: The user is the product owner — your job is to help them think clearly about what to build, surface considerations they may have missed, and produce a rigorous spec together.
Step 1: Author the spec
Scaffold first, refine second. Ask at most 1-2 scoping questions if the user's description is genuinely too vague to scaffold (e.g., "improve the system" with no specifics). If the request is concrete enough to write a problem statement — even an incomplete one — skip questions and write the scaffold immediately. Do not run an extended scoping conversation before the scaffold exists.
Write it to specs/<feature-name>/SPEC.md (relative to repo root). This follows the /spec skill's default path convention — see /spec "Where to save the spec" for the full override priority (env var, AI repo config, user override). The scaffold captures:
- Problem statement (what you understand so far)
- Initial requirements and acceptance criteria (even if incomplete)
- Known constraints or technical direction
- Open questions (what still needs clarification)
The scaffold doesn't need to be complete — it needs to exist on disk so it survives compaction and anchors the refinement conversation. The deep dive (investigation, open questions, decisions, /spec) happens after the scaffold exists, not before.
After the scaffold exists — investigate. Now that the scaffold anchors the conversation, do the deep investigation that informs the spec:
- Trace the existing system. Load /explore skill to understand how the relevant area works today — patterns, shared abstractions, data flow, blast radius. For bug fixes, use the system tracing lens to follow execution from entry point to where the error occurs and identify the root cause (not just the symptom).
- Research third-party dependencies. If the feature involves third-party libraries, frameworks, packages, APIs, or external services, load /research skill to verify their capabilities, constraints, and correct usage before designing the solution. Do this every time — not just when the dependency feels unfamiliar. Even dependencies you've used before may have changed, have undocumented constraints, or behave differently in this context. Do not spec against assumed API shapes — verify them.
- Update the scaffold. Revise the SPEC.md with findings: root cause (for bugs), system constraints, API shapes, dependency capabilities, and refined acceptance criteria grounded in what you learned.
This investigation is not optional — it's what separates a spec grounded in reality from one built on assumptions. A spec that assumes an API works a certain way, or that a module has a certain interface, leads to implementation surprises that cost more to fix later.
Then refine. Load /spec skill to deepen and complete the spec through its interactive process. The scaffold and investigation findings give /spec a grounded starting point rather than a blank slate.
During the spec process, ensure these are captured with evidence (not aspirationally):
- All test cases and acceptance criteria. Criteria should describe observable behavior, not internal mechanisms (see /tdd for examples).
- Failure modes and edge cases
- Third-party dependency constraints and API shapes (verified via /research, not assumed)
If scope calibration indicated a lighter spec process (enhancement or bug fix): refine the scaffold directly instead of invoking /spec. The investigation step above still applies — lighter spec does not mean lighter investigation. The final SPEC.md must still capture: problem statement, root cause (for bug fixes), what "done" looks like (acceptance criteria), and what you will test.
If the user provided an existing SPEC.md (detected in Phase 0): skip to Step 2.
Step 2: Validate the spec
Read the SPEC.md. Verify it contains sufficient detail to implement:
- Problem statement and goals are clear
- Scope, requirements, and acceptance criteria are defined
- Test cases are enumerated (or derivable from acceptance criteria)
- Technical design exists (architecture, data model, API shape — at least directionally)
If any are missing: in interactive mode, fill the gaps by asking the user targeted questions or proposing reasonable defaults (clearly labeled as assumptions). In headless mode, fill gaps with reasonable defaults — label them as assumptions in the SPEC.md and proceed.
In interactive mode: Do not proceed until the user confirms the SPEC.md is ready for implementation. This confirmation is the handoff — from this point forward, you own execution autonomously.
In headless mode: Proceed immediately after validation. The provided spec is treated as the user's final word.
Step 3: Activate execution state
Load: references/state-initialization.md — contains the initialization script invocation and field reference.
Run <path-to-skill>/scripts/ship-init-state.sh with values from Phase 0 (capabilities, scope) and Phase 1 (feature name, spec path, branch). Pass --session-id with your session ID (available in the hook input JSON) to stamp ownership into loop.md — this prevents parallel ship sessions from claiming this loop. Do not manually write state.json or loop.md by hand — always use the script. Hand-written JSON/YAML is the #1 cause of stop hook failures. See the reference for the full argument list and defaults.
After the script runs, verify both files exist:
test -f tmp/ship/state.json && test -f tmp/ship/loop.md && echo "State initialized" || echo "ERROR: state files missing"
If either file is missing, check the script output for errors and re-run. Do not proceed to Phase 2 without both files.
The script activates the stop hook for autonomous execution. The loop runs until <complete>SHIP COMPLETE</complete> or 20 iterations. Cancel manually with /cancel-ship.
Phase 2: Decomposition (/decompose)
Load /decompose skill with the SPEC.md path. /decompose reads the spec, analyzes the codebase, and produces tmp/ship/spec.json — structured user stories with dependency ordering, verifiable acceptance criteria, and QA scenarios.
Verify tmp/ship/spec.json exists before proceeding to Phase 3.
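A minimal existence gate in the same style as the state-file check (the message wording is illustrative):

```shell
# Gate between Phase 2 and Phase 3: spec.json must exist and be non-empty.
test -s tmp/ship/spec.json && echo "spec.json ready" || echo "ERROR: spec.json missing or empty"
```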
Phase 3: Implementation (/implement)
Step 1: Build codebase understanding
Verify that you genuinely understand the feature — not just that the spec has the right sections. Test yourself: can you articulate what this feature does, why it matters, how it works technically, what the riskiest parts are, and what you would test first? If not, re-read the spec and investigate the codebase until you can. Load /explore skill on the target area (purpose: implementing) to understand the patterns, conventions, and shared abstractions you'll need to work with. Build your understanding from /explore findings and the SPEC.md — do not aimlessly browse implementation files; let /explore structure your exploration. If you need deeper understanding of a specific subsystem, delegate a targeted question to a subagent (e.g., "How does the auth middleware chain work in src/middleware/? What conventions does it follow?"). Your understanding should be architectural, not line-by-line. This understanding is what you will use to evaluate the implementation output and reviewer feedback later.
Step 2: Load /implement skill
Load /implement skill with the spec.json path (from Phase 2). Since spec.json already exists, /implement starts at Phase 2 (Prepare) — skipping its internal conversion. /implement owns prompt crafting and the iteration loop regardless of scope. Do not write implementation code directly — all implementation goes through /implement and its subprocess (implement.sh), even when the change feels simple enough to do inline.
Provide /implement with:
- Path to the SPEC.md — this is the highest-priority input. Do not omit it.
- The codebase context from Step 1 — the patterns, conventions, and shared abstractions you identified via /explore
- Quality gate command overrides from Phase 0 (which may differ from pnpm defaults)
- Browser availability from Phase 0 (if browser tools are unavailable, pass --no-browser so /implement adapts criteria)
- Docker execution from Phase 0 (if --implement-docker was passed, forward to /implement as --docker, including the compose file path if one was provided)
Wait for /implement to complete. If it reports that automated execution is unavailable and hands off to the user, wait for the user to signal completion. When they do, re-read the SPEC.md, spec.json, and progress.txt to re-ground yourself.
Background process tracking: When /implement launches implement.sh via Bash(run_in_background: true), immediately record the background process in state.json per the "Background process launched" row in the state update table. Use type "implement", description "Implementation iteration loop". When implement.sh completes, remove its entry.
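The bookkeeping can be sketched with jq — note that the backgroundProcesses field name and entry shape here are assumptions standing in for whatever the state update table actually specifies:

```shell
# Hypothetical state.json bookkeeping — the backgroundProcesses field and entry
# shape are illustrative, not the canonical schema from the state update table.
STATE=tmp/ship/state.json
mkdir -p tmp/ship && [ -f "$STATE" ] || echo '{}' > "$STATE"   # illustration only

# Record the launch:
jq '.backgroundProcesses += [{"type": "implement", "description": "Implementation iteration loop"}]' \
  "$STATE" > "$STATE.tmp" && mv "$STATE.tmp" "$STATE"

# Remove the entry when implement.sh completes:
jq '.backgroundProcesses |= map(select(.type != "implement"))' \
  "$STATE" > "$STATE.tmp" && mv "$STATE.tmp" "$STATE"
```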
Step 3: Post-implementation review
After implementation completes, verify that you are satisfied with the output before proceeding. You are responsible for this code — the implementation output is your starting point, not your endpoint. Do not review the output by reading every changed file yourself — delegate targeted verification to a subagent: "Does the implementation match the SPEC.md acceptance criteria? Are there gaps, dead code, or unresolved TODOs? Does every acceptance criterion have a corresponding test?" Act on the findings. Fix issues directly for small, obvious problems. For issues where the root cause isn't immediately clear, load /debug skill with --headless to diagnose — /debug will return structured findings (root cause, recommended fix, blast radius) without implementing the fix itself. Apply the fix based on its findings. For larger rework that requires re-implementing a story, re-load /implement skill with specific feedback.
If you made any code changes (whether direct fixes or by re-invoking /implement): re-run quality gates (test suite, typecheck, lint) and verify green before proceeding. /implement exits green, but post-implementation fixes happen outside its loop — you own verification of your own changes.
Phase 4: Documentation (/docs, nested subprocess)
Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to write or update documentation. The subprocess loads /docs and handles the full documentation lifecycle in isolation. Documentation is written early so that both review gates (Phase 5 and Phase 8) can assess doc quality and accuracy — the full reviewer roster, including pr-review-docs, runs with docs already present.
Provide the subprocess with:
- Path to the SPEC.md (primary source for what was built and why)
Background process tracking: When spawning the nested Claude subprocess via Bash(run_in_background: true), record it in state.json per the "Background process launched" row in the state update table. Use type "nested-claude", description "Documentation subprocess (/docs)". Remove on completion.
After the subprocess exits, verify that documentation changes are committed on the branch.
Docs maintenance rule
Documentation must stay current through all subsequent phases:
- After Phase 5 or Phase 8 (Review): If review feedback leads to code changes, evaluate whether those changes affect any docs written in this phase. Update docs before proceeding.
- After user-requested amendments: If the user requests changes after Phase 4, update affected docs alongside the code changes.
- Phase 9 (Completion) checkpoint: Verify docs still accurately reflect the final implementation.
Phase 5: Review gate — pre-QA (/review-local)
Run the local review convergence loop. This is the first of two review gates — it reviews the implementation and documentation before QA testing. Do not assume the target repo vendors the review plugin — stage the bundle into tmp/ship/ first, then execute the staged copy.
Run it from the repo root via this skill's helper script. The review dispatches 17 parallel reviewers and runs up to 5 fix passes — this routinely exceeds the Bash tool's 600-second timeout. Always run with run_in_background: true. After launching, record the background process in state.json per the "Background process launched" row (type "review", description "Pre-QA local review gate"). Remove on completion.
Bash(command: "<path-to-skill>/scripts/run-local-review.sh",
run_in_background: true,
description: "Local review gate")
If the branch targets something other than the auto-detected base, pass --target <branch> explicitly.
If Docker execution is active for this /ship run, execute the same helper in Docker mode so the review runs inside the repo sandbox rather than on the host:
Bash(command: "<path-to-skill>/scripts/run-local-review.sh --docker [compose-file]",
run_in_background: true,
description: "Local review gate (Docker)")
You will receive a <task-notification> when the review completes. While waiting, do lightweight work but do NOT make code changes. If you need to check progress mid-run, Read the output file path returned by the background Bash call. Expected duration: 10-30 minutes depending on diff size and number of fix passes.
The helper stages the portable review plugin into tmp/ship/pr-review-plugin/, then runs tmp/ship/pr-review-plugin/scripts/pr-review.sh either on the host or inside the Docker sandbox. This mirrors the /implement pattern: the container consumes staged artifacts from the bind-mounted repo, not the host plugin install.
The helper auto-detects the target branch by default (PR base branch if available, otherwise the repo default branch / origin/HEAD, then main). After the <task-notification> arrives, the script's stdout contains a structured return payload — parse it directly instead of reading files manually:
Exit envelope (=== LOCAL REVIEW EXIT ===): Always present. Contains exit_code, exit_reason, pass counts, fix commit SHAs, last recommendation, blocking status, duration, and file pointers for forensic artifacts. Read this first to determine the outcome.
Review status (=== REVIEW STATUS ===): The parsed review-status.json content — recommendation, risk, issue counts, blocking reasons. Present on all non-crash exits.
Iteration log (=== REVIEW ITERATION LOG ===): Full chronological history of review passes and fix responses (what was found, what the fixer addressed/declined/deferred). Only included on non-zero exits (blocking, fatal) — the orchestrator needs this context for remediation decisions. On exit 0 (converged), the iteration log stays on disk (file pointer in the envelope) to avoid bloating the parent's context.
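One way to pull a single delimited section out of the payload — the delimiter format comes from the sections above; the helper name and sample contents are illustrative:

```shell
# Extract the body of one "=== NAME ===" section from the review payload.
# Printing starts after the named header and stops at the next "=== " delimiter.
extract_section() {
  awk -v name="$1" '
    $0 == "=== " name " ===" { found = 1; next }  # section header: start printing
    found && /^=== /         { exit }             # next delimiter: stop
    found                    { print }
  ' "$2"
}

# Fabricated sample payload for illustration:
cat > /tmp/review-payload.txt <<'EOF'
=== LOCAL REVIEW EXIT ===
exit_code: 0
exit_reason: converged
=== REVIEW STATUS ===
recommendation: APPROVE
EOF
extract_section "LOCAL REVIEW EXIT" /tmp/review-payload.txt
```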
Exit reasons and what to do:
| exit_reason | Meaning | Action |
|---|---|---|
| converged | Pure APPROVE — gate is green | Spot-check the fixes (review the fix commits listed in the envelope), then proceed |
| fixer_no_changes | Fixer evaluated all findings and declined/deferred everything — no code was changed | The iteration log contains the fixer's rationale for each declined finding. In interactive mode: escalate to the user with the declined findings. In headless mode: document remaining findings and proceed — the fixer's evidence-based rationale is in the iteration log. |
| max_passes_exhausted | Still blocking after all fix passes | The iteration log shows what was tried. In interactive mode: do not proceed until resolved. In headless mode: document remaining findings and proceed. |
| allow_blocking | Blocking, but --allow-blocking was set | Proceed — the caller explicitly accepted a blocking result |
| fatal_error | Script crashed (staging, review dispatch, or parse failure) | Check stderr for the error message. If partial state exists (review status or iteration log in the envelope), use it for context. Retry if transient. |
- After convergence, spot-check the fixes — the auto-fix agent is good but not infallible.
Phase 6: QA Planning (/qa-plan, nested subprocess)
Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to produce the QA test plan. The subprocess loads /qa-plan and investigates spec.json + code + diff to produce an enriched tmp/ship/qa-progress.json.
Background process tracking: Record this subprocess in state.json per the "Background process launched" row (type "nested-claude", description "QA planning subprocess (/qa-plan)"). Remove on completion.
Provide the subprocess with:
- Path to the SPEC.md
- Path to spec.json (tmp/ship/spec.json)
- If ship is running in headless mode, pass --headless
After the subprocess exits, inspect qa-progress.json before proceeding:
- Read planMetadata — check for contradictions (scenarios[].enrichment.gapType === "contradiction") and critical implementation gaps (scenarios[].enrichment.gapType === "fixable_gap").
- If contradictions exist: In interactive mode, pause with <input> — contradictions mean the spec assumed something impossible. Present the contradictions and ask the user to resolve before proceeding. In headless mode, attempt to resolve with best judgment (pick the interpretation most consistent with the spec's problem statement). Document the contradiction and your chosen interpretation for the completion report. Do not pause.
- If critical gaps exist (primary user journey untestable — no routes, auth broken, main page 500s): In interactive mode, pause with <input> — present the gaps and ask whether to proceed or fix first. In headless mode, attempt to fix directly if possible. If unfixable, document and proceed — QA will confirm whether the gap is real.
- If only fixable gaps exist: Proceed — /qa will resolve these during execution (Step 5b of /qa).
- If clean: Proceed to Phase 7.
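The planMetadata checks can be done mechanically with jq — a sketch assuming scenarios is a top-level array; the sample file is fabricated for illustration (the real one is produced by /qa-plan):

```shell
# Fabricated sample — the real qa-progress.json is produced by /qa-plan.
cat > /tmp/qa-progress-sample.json <<'EOF'
{"scenarios": [
  {"name": "login flow", "enrichment": {"gapType": "fixable_gap"}},
  {"name": "settings save", "enrichment": {"gapType": "contradiction"}}
]}
EOF

# Count contradictions and gaps before deciding whether to pause or proceed.
jq '[.scenarios[] | select(.enrichment.gapType == "contradiction")] | length' /tmp/qa-progress-sample.json
jq '[.scenarios[] | select(.enrichment.gapType == "fixable_gap")] | length' /tmp/qa-progress-sample.json
```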
Phase 7: Testing / QA (/qa, nested subprocess)
Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to execute QA testing. The subprocess loads /qa and runs the full manual QA lifecycle from tmp/ship/qa-progress.json: environment bootstrap, gap resolution, test execution with available tools (browser, macOS, bash), result recording, and gap documentation.
Background process tracking: Record this subprocess in state.json per the "Background process launched" row (type "nested-claude", description "QA execution subprocess (/qa)"). Remove on completion.
Provide the subprocess with:
- Path to the SPEC.md
- If scope calibration indicated a lightweight scope (bug fix / config change), pass that context so /qa calibrates depth accordingly
- Pass --headless so /qa skips tool-availability negotiation checkpoints and operates autonomously
Phase 7 exit gate — verify before proceeding to Phase 8:
- /qa complete: subprocess has exited, qa-progress.json updated with results. Remaining gaps and unresolvable issues are documented — they do not block Phase 8.
- If /qa made any code changes: re-run quality gates (test suite, typecheck, lint) and verify green. /qa fixes bugs it finds — you own verification that those fixes don't break anything else.
- Resolve blocked scenarios (see below).
- You can explain the implementation to another engineer: what was tested, what edge cases exist, how they are handled.
Resolve blocked scenarios (when applicable):
If qa-progress.json contains scenarios with status: "blocked" that you can resolve (e.g., by writing tests the /qa subprocess couldn't, fixing an environment issue, or providing a missing dependency), resolve them:
- Write the test, verification code, or fix.
- Update the scenario in qa-progress.json:
  {
    "status": "validated",
    "resolvedBy": "parent",
    "resolvedAt": "<ISO 8601 timestamp>",
    "resolvedNote": "Covered by <test-file-path> via <approach>",
    "previousStatus": "blocked",
    "previousNotes": "<original blocked reason from /qa>"
  }
- Preserve previousStatus and previousNotes for the audit trail — downstream consumers (e.g., /pr) use these to distinguish parent-resolved scenarios from /qa-validated ones.
If a blocked scenario is genuinely unresolvable (requires external service, production credentials, hardware access), leave it as blocked — it flows to the PR as a human verification item.
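The blocked → validated update can be sketched with jq; the scenario id, the notes field, and the file contents here are hypothetical, while the resolution field names follow the shape above:

```shell
# Fabricated input — the scenario id "SC-3" and "notes" field are hypothetical.
cat > /tmp/qa-sample.json <<'EOF'
{"scenarios": [{"id": "SC-3", "status": "blocked", "notes": "no test harness for webhooks"}]}
EOF

# Fold the old status/notes into previousStatus/previousNotes, then overwrite.
jq '(.scenarios[] | select(.id == "SC-3")) |= (. + {
      "previousStatus": .status,
      "previousNotes": .notes,
      "status": "validated",
      "resolvedBy": "parent",
      "resolvedAt": "2025-01-01T00:00:00Z",
      "resolvedNote": "Covered by tests/webhooks.test.ts via replayed fixtures"
    })' /tmp/qa-sample.json
```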
Phase 8: Review gate — post-QA (/review-local)
Run the local review convergence loop a second time. This pass reviews the full final state — implementation, documentation, and any code changes from QA — with a fresh eye.
Run the same script as Phase 5 with run_in_background: true. Record the background process in state.json per the "Background process launched" row (type "review", description "Post-QA local review gate"). Remove on completion.
Bash(command: "<path-to-skill>/scripts/run-local-review.sh",
run_in_background: true,
description: "Post-QA review gate")
Each invocation is self-contained — the script cleans prior review state at the start. The same --docker options apply. Wait for the <task-notification>, then parse the structured return payload from stdout — see Phase 5 for the full exit reason table and response protocol.
In interactive mode: Do not proceed to Phase 9 until this review gate is green. In headless mode: same as Phase 5 — if the gate does not converge after max passes, document and proceed.
Phase 8 exit gate: QA staleness check
After the post-QA review gate implements auto-fixes, check whether those fixes invalidated any prior QA scenarios. Read tmp/ship/qa-progress.json and compare the validated scenarios against the commits made during Phase 8.
Identify Phase 8 commits using the qaCompletedAtCommit field in qa-progress.json (written by /qa as its final action). Run git log <qaCompletedAtCommit>..HEAD to get exactly the post-QA commits. For each commit, check what files changed.
Global invalidators:
- CSS/style file changes → mark all visual scenarios stale
- API route/handler changes → mark all integration scenarios stale
Path heuristics:
- File in src/pages/settings/ (or an equivalent path pattern) → mark scenarios containing "settings" in their name or route as stale
- File changes touching a component → mark scenarios that reference that component's page/route as stale
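The global invalidators and path heuristics can be sketched as a file classifier — the extension and path patterns below are illustrative, not exhaustive, and pattern order matters:

```shell
# Map a changed file to the QA scenario category it may invalidate.
classify_invalidation() {
  case "$1" in
    *.css|*.scss|*/styles/*)      echo "visual" ;;       # style change -> visual scenarios stale
    */api/*|*/routes/*|*handler*) echo "integration" ;;  # route/handler change -> integration scenarios stale
    */pages/settings/*)           echo "settings" ;;     # path heuristic -> scenarios mentioning "settings"
    *)                            echo "none" ;;
  esac
}

classify_invalidation "src/styles/settings.css"      # visual
classify_invalidation "src/api/webhooks/handler.ts"  # integration
```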
Mark stale scenarios by adding staleness metadata to the scenario in qa-progress.json:
{
"staleness": {
"stale": true,
"staleAfterCommit": "<commit-hash>",
"validatedAtCommit": "<original-validation-commit>",
"reason": "CSS changes in src/styles/settings.css may invalidate visual verification"
}
}
Action on stale scenarios:
- Stale visual, integration, or error-state scenarios → trigger a selective QA re-run (re-execute only the stale scenarios with those categories, not the full plan)
- Stale usability or edge-case scenarios → advisory only (document in qa-progress.json but do not re-run)
If no Phase 8 commits touched files relevant to any QA scenario, skip the staleness check entirely.
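Stamping the staleness block onto matching scenarios can be sketched with jq — the sample data and the category field are fabricated for illustration; the staleness fields follow the shape above:

```shell
# Fabricated sample — real scenarios live in tmp/ship/qa-progress.json.
cat > /tmp/qa-stale-sample.json <<'EOF'
{"scenarios": [
  {"name": "settings page renders", "category": "visual", "status": "validated"},
  {"name": "webhook retries", "category": "integration", "status": "validated"}
]}
EOF

# Global invalidator example: a CSS change marks all visual scenarios stale.
jq --arg commit "abc1234" '
  (.scenarios[] | select(.category == "visual")) |= (. + {
    "staleness": {
      "stale": true,
      "staleAfterCommit": $commit,
      "reason": "CSS changes may invalidate visual verification"
    }
  })' /tmp/qa-stale-sample.json
```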
Phase 9: Completion
Load: references/completion-checklist.md — full verification checklist (quality gates, docs, local review) and completion report template.
Run through the checklist. After reporting to the user, output the completion promise to end the ship loop:
<complete>SHIP COMPLETE</complete>
Ownership principles
These govern your behavior throughout:
- You are the engineer, not a messenger. /implement produces code; reviewers suggest changes; CI reports failures. You decide what to do about each.
- Outcomes over process. The workflow phases exist to organize your work, not to compel forward motion. Never move to the next step just because you finished the current one — move when you have genuine confidence in what you've built so far. If something feels uncertain, stop and investigate. Build your own understanding of the codebase, the product, the intent of the spec, and the implications of your decisions before acting on them.
- Delegate investigation; go deep on each phase. Default to spawning subagents for information-gathering work: codebase exploration, test failure diagnosis, CI log analysis, code review of implementation output, and pattern discovery. This is an efficiency strategy — not a rationing strategy. Delegation lets you focus on orchestration and decision-making while subagents handle bounded research tasks. Give each subagent a clear question, the relevant file paths or error messages, and the output format you need. Act on their findings — not raw code or logs. Do investigation directly only when it's trivial (one small file, one quick command). The threshold: if it would take more than 2-3 tool calls or produce more than ~100 lines of output, delegate it. If context runs low at any point, the ship loop's automatic save/reboot mechanism handles continuity — do not trade phase depth for speed.
  What to delegate vs. what to run top-level vs. what to nest — three execution models:
  - Top-level (Skill tool, shared context): Orchestration phases that manage state or make escalation decisions — /spec, review gates, completion. These need your orchestrator context (state files, spec path, phase awareness, ability to pause with <input>).
  - Nested Claude — clean child (/nest-claude subprocess pattern): Execution phases that benefit from fresh context and independence — /implement (already a subprocess via implement.sh), /qa-plan, /qa, /docs. Clean children load their own skills, read artifacts from disk (not from parent context), and aren't biased by prior phases. All communication happens via disk artifacts (spec.json, qa-progress.json, progress.txt). The orchestrator reads output artifacts after each subprocess exits.
  - Task subagent (ephemeral, no skill inheritance): Bounded investigation — codebase exploration, test failure diagnosis, CI log analysis, pattern discovery. Never delegate a pipeline phase to a Task subagent — it loses tools, skills, and context.
  Subagent mechanics: Subagents do not inherit your skills. For plain investigation, this doesn't matter — just provide a clear question and file paths. When a subagent needs an investigation skill (like /explore), use the general-purpose type (it has the Skill tool) and start the prompt with "Before doing anything, load /skill-name skill" — this reliably triggers the Skill tool. Follow it with context and the task: "Before doing anything, load /explore skill. Explore src/middleware/auth/ for pattern discovery (purpose: implementing). We're adding role-based access control — report existing auth conventions, shared abstractions, and middleware chain composition. Return a pattern brief."
- Evidence over intuition. Use /research to investigate codebases, APIs, and patterns before making decisions — not just when they feel unfamiliar. Inspect the codebase directly. Web search when needed. The standard is: could you explain your reasoning to a senior engineer and defend it with evidence? If not, you haven't investigated enough.
- Right-size your response. Research, spec work, and reviews may surface many approaches, concerns, and options. Your job is not to address every possibility — it is to evaluate which are real for this context and act on those. For each non-trivial decision, weigh:
- Necessity: Does this solve a validated problem, or a hypothetical one?
- Proportionality: Does the complexity of the solution match the complexity of the problem?
- Evidence: What concrete evidence supports this approach over alternatives?
- Reversibility: Can we change this later if we're wrong?
- Side effects: What else does this decision affect?
- Best practices: What do established patterns in this codebase and ecosystem suggest?
If evidence does not warrant the complexity, prefer the simpler approach — but "simpler" means fewer moving parts, not fewer requirements. A solution that skips validated requirements is not simpler; it is broken.
Over-indexing looks like: implementing every option surfaced by research, building configurability for hypothetical problems.
Under-indexing looks like: skipping investigation for unfamiliar code paths, declaring confidence without evidence.
- Flag, don't hide. If something seems off — a design smell, a testing gap, a reviewer suggestion that contradicts the spec — surface it explicitly. If the issue is significant, pause and consult the user.
- Prefer formal tests. Manual testing is for scenarios that genuinely resist automation. Every "I tested this manually" should prompt the question: "Could this be a test instead?"
Anti-patterns
- Deep investigation before setup. Spawning Explore subagents, loading skills, or running extended codebase exploration during Phase 0. A quick explore (a few Grep/Glob/Read calls) to orient yourself is fine, but the deep dive — /explore, /research, subagents — happens in Phase 1 after the scaffold exists. A user saying "add invite revocation" gives you the feature name (revoke-invite) immediately; you don't need to map the entire invite system first.
- Implementing before understanding. Jumping into code before building a mental model of the feature, the codebase area, or the spec's intent.
- Using a different package manager than the one the repo specifies.
- Force-pushing or destructive git operations without user confirmation.
- Leaving the worktree without cleaning up. Use ship-worktree.sh cleanup after merge or when tearing down an abandoned request.
- Bypassing /ship for "small" work. Scope calibration (Phase 0, Step 4) adjusts depth for every task size — bug fixes get a light SPEC.md and calibrated testing. The workflow always runs; rigor scales. Implementing directly outside /ship means no spec (requirements lost on compaction), no state persistence, no QA, no review gates. A 4-file security fix still needs a spec that captures what "fixed" looks like, tests that verify it, and a PR that documents it.
- Skipping /implement for "simple" changes. /implement always runs — it owns spec.json conversion, the implementation prompt, and the iteration loop. Even small changes benefit from the structured prompt and verification cycle. Direct implementation outside /implement loses the spec.json tracking, progress log, and quality gate loop.
- Hand-writing state files. Never manually write tmp/ship/state.json or tmp/ship/loop.md as raw JSON/YAML. Always use ship-init-state.sh. Hand-written files are the #1 cause of stop hook failures — malformed JSON, missing fields, wrong YAML frontmatter — and the resulting bug (the hook silently exits, the loop never activates) is invisible until context compaction, when it's too late.
- Outputting a false completion promise. Never output <complete>SHIP COMPLETE</complete> until ALL phases have genuinely completed and all Phase 8 verification checks pass. The ship loop is designed to continue until genuine completion — do not lie to exit.
- Rushing or skipping phases due to context concerns. Never compress, abbreviate, or skip Phases 3-8 because you feel context is running low. The ship loop's stop hook automatically saves state and reboots you into the correct phase with full context. A clean reboot that re-enters at the right phase produces better outcomes than a compressed pass through multiple phases on fumes. Every phase loads its skill, runs its checklist, and completes fully — context pressure is never a valid reason to skip or abbreviate. If you catch yourself thinking "context is running low, let me quickly cover the remaining phases" — stop. That thought is the anti-pattern.
- Rationalizing QA phase skips with project characteristics. Never skip Phases 6 or 7 because the project is "a backend SDK with no UI", "already has comprehensive tests", or "doesn't need manual QA." These are rationalizations, not valid skip conditions. /qa-plan and /qa test ALL project types — backend SDKs have API contracts, error handling, edge cases, and integration behavior that existing unit tests routinely miss. "Comprehensive test coverage" is exactly what /qa-plan's mock-detection and coverage reality check is designed to verify — if the coverage is real, /qa confirms it quickly; if it's mocked or shallow, /qa catches what you'd miss. The headless flag means "autonomous", not "abbreviated." If you catch yourself writing "QA is primarily about test coverage which we already have" — stop. That sentence is the anti-pattern. Load the skill. Spawn the subprocess. Let /qa-plan and /qa do their jobs.
- Assuming all phases ran when delegating to a subprocess. When /ship runs as a nested claude -p subprocess, Phases 5–9 have historically been skipped due to context compaction losing subprocess state tracking (see "Known bug" in the Headless mode section). If you delegate /ship to a subprocess, always verify completedPhases in state.json afterward and run missing phases (typically QA + second review) manually.
Appendix: Reference and script index
| Path | Use when | Impact if skipped |
|---|---|---|
| /decompose skill | Converting SPEC.md to structured spec.json with user stories, dependency ordering, and QA scenarios (Phase 2) | Unstructured spec, no dependency ordering, no QA scenarios |
| /implement skill | Crafting the implementation prompt and executing the iteration loop (Phase 3) | No implementation prompt, no automated execution |
| /qa-plan skill | QA test plan derivation from spec.json + code + diff (Phase 6) | QA scenarios not grounded in implementation, no bidirectional trace, no gap detection |
| /qa skill | QA verification with available tools (Phase 7) | User-facing bugs missed, visual issues, broken UX flows, undocumented gaps |
| /docs skill | Writing or updating documentation — product + internal surface areas (Phase 4) | Docs not written, wrong format, missed documentation surfaces, mismatch with project conventions |
| references/worktree-setup.md | Creating the worktree (Phase 0, Step 1) | Work bleeds into the main directory |
| references/capability-detection.md | Detecting execution context (Phase 0, Step 2) | Child skills receive wrong flags; phases skipped or run with wrong assumptions |
| references/state-initialization.md | Activating execution state (Phase 1, Step 3) | Stop hook cannot recover context; loop cannot activate |
| references/completion-checklist.md | Final verification (Phase 9) | Incomplete work ships as "done" |
| scripts/run-local-review.sh | Running the local review convergence loop (Phase 5, Phase 8), optionally with bounded repair passes | Obvious review issues slip through, or Ship stalls without a deterministic next step |
| scripts/build-local-review-fix-prompt.sh | Converting a blocking local review result into a bounded repair prompt for human or autonomous follow-up | Repair loop has no machine-generated handoff from review output to fix pass |
| scripts/ship-worktree.sh | Reusing or creating a request-scoped worktree, and cleaning it up after merge | Work bleeds into the main checkout; stale worktrees pile up; completed branches linger |
| scripts/ship-upload-pr-asset.js | Uploading existing screenshots or recordings to Bunny CDN (standalone use) | PR image flow depends on manual GitHub uploads even when a programmatic CDN path is available |
| /debug skill | Diagnosing the root cause of failures during implementation (Phase 3) or testing (Phase 7) — when the cause isn't obvious from the error | Shotgun debugging: fixing symptoms without understanding root cause; wasted iteration cycles |