Implement
Convert a SPEC.md into implementation-ready artifacts and execute the iteration loop. This skill operates in three phases:
- Convert — Transform SPEC.md into a structured spec.json
- Prepare — Validate the spec.json, craft the implementation prompt, save it to a file
- Execute — Run `scripts/implement.sh` to iterate through user stories via subprocess
Each phase can be entered independently. If you already have a spec.json, start at Phase 2. If artifacts are ready and you need execution, start at Phase 3. If you only need conversion, stop after Phase 1.
When composed by /ship, Ship invokes /implement for the full lifecycle and reviews the output afterward. When standalone, /implement runs end-to-end and reports results directly.
Ship working directory
All implementation artifacts are stored in tmp/ship/, derived dynamically from git rev-parse --show-toplevel. Override with CLAUDE_SHIP_DIR_OVERRIDE for Docker/CI/isolated envs. The implement.sh script resolves the directory automatically.
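A minimal sketch of the resolution, assuming the override replaces the full path (the `SHIP_DIR` variable name is illustrative):

```bash
# Override wins when set (Docker/CI); otherwise derive from the repo root
SHIP_DIR="${CLAUDE_SHIP_DIR_OVERRIDE:-$(git rev-parse --show-toplevel)/tmp/ship}"
mkdir -p "$SHIP_DIR"
```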
Inputs
| Input | Required | Default | Description |
|---|---|---|---|
| SPEC.md or spec.json path | Yes | — | Source artifact. When SPEC.md: used for Phase 1 conversion AND forwarded as iteration reference in Phase 2 (see Spec path forwarding). When spec.json: start at Phase 2 directly. |
| `--test-cmd` | No | `pnpm test --run` | Test runner command for quality gates |
| `--typecheck-cmd` | No | `pnpm typecheck` | Type checker command for quality gates |
| `--lint-cmd` | No | `pnpm lint` | Linter command for quality gates |
| `--no-browser` | No | Browser assumed available | Omit "Verify in browser" criteria from UI stories; substitute with Bash-verifiable criteria |
| `--docker [compose-file]` | No | — | Use Docker for Phase 3 execution. Optionally accepts a path to the compose file (e.g., `--docker .ai-dev/docker-compose.yml`). When passed without a path, discovers the compose file automatically. When omitted entirely, execution runs on the host. |
When composed by /ship, these overrides are passed based on Phase 0 context detection. When running standalone, defaults apply.
Spec path forwarding
When a SPEC.md is provided (directly or by the invoker), the path persists beyond Phase 1 conversion. In Phase 2, the skill embeds a file-path reference in the implementation prompt so that iteration agents read the full SPEC.md as the first action of every iteration. The spec content is NOT embedded in the prompt — the prompt contains only the file path.
This is mandatory when a spec path is available. The spec contains critical implementation context that spec.json's implementationContext cannot fully capture:
- Non-goals (what NOT to build) — completely absent from spec.json
- Current state code traces (file paths, line numbers, existing patterns) — compressed to prose in implementationContext
- Decision rationale — reduced to conclusions only
- Risks and failure modes — only partially captured in acceptance criteria
When only spec.json is provided (no SPEC.md available), implementationContext serves as the sole implementation context source. See Phase 1's implementationContext guidance for how to calibrate depth.
Create workflow tasks (first action)
Before starting any work, create a task for each phase using TaskCreate with addBlockedBy to enforce ordering. Derive descriptions and completion criteria from each phase's own workflow text.
- Implement: Detect starting point
- Implement: Phase 1 — convert SPEC.md to spec.json
- Implement: Phase 2 — validate and craft prompt
- Implement: Phase 3 — execute iteration loop
Mark each task in_progress when starting and completed when its phase's exit criteria are met. On re-entry, check TaskList first and resume from the first non-completed task.
Detect starting point
| Condition | Begin at |
|---|---|
| SPEC.md exists, no spec.json | Phase 1 (Convert) |
| spec.json exists, needs validation/prompt | Phase 2 (Prepare) |
| spec.json + `tmp/ship/implement-prompt.md` exist, ready to execute | Phase 3 (Execute) |
| Called with only a conversion request | Phase 1, then stop |
Phase 1: Convert (SPEC.md to spec.json)
Take a SPEC.md and convert it to tmp/ship/spec.json. Create tmp/ship/ if it doesn't exist (mkdir -p tmp/ship).
Output format
{
"project": "[Project Name]",
"branchName": "implement/[feature-name-kebab-case]",
"description": "[Feature description from SPEC.md title/intro]",
"implementationContext": "[Concise prose summary of architecture, constraints, key decisions, and current state from the SPEC.md — everything the implementer needs to know that doesn't fit in individual stories]",
"userStories": [
{
"id": "US-001",
"title": "[Story title]",
"description": "As a [user], I want [feature] so that [benefit]",
"acceptanceCriteria": [
"Criterion 1",
"Criterion 2",
"Typecheck passes"
],
"priority": 1,
"passes": false,
"notes": ""
}
]
}
Story size: the number one rule
Each story must be completable in ONE iteration (one context window).
Each iteration receives the same prompt with no memory of previous work — only files and git history persist. If a story is too big, the LLM runs out of context before finishing and produces broken code.
Right-sized stories:
- Add a database column and migration
- Add a UI component to an existing page
- Update a server action with new logic
- Add a filter dropdown to a list
Too big (split these):
- "Build the entire dashboard" — Split into: schema, queries, UI components, filters
- "Add authentication" — Split into: schema, middleware, login UI, session handling
- "Refactor the API" — Split into one story per endpoint or pattern
Rule of thumb: If you cannot describe the change in 2-3 sentences, it is too big.
Story ordering: dependencies first
Stories execute in priority order. Earlier stories must not depend on later ones.
Correct order:
- Schema/database changes (migrations)
- Server actions / backend logic
- UI components that use the backend
- Dashboard/summary views that aggregate data
Wrong order:
- UI component (depends on schema that does not exist yet)
- Schema change
Acceptance criteria: must be verifiable
Each criterion must be something the iteration agent can CHECK, not something vague.
Good criteria (verifiable):
- "Add
statuscolumn to tasks table with default 'pending'" - "Filter dropdown has options: All, Active, Completed"
- "Clicking delete shows confirmation dialog"
- "Typecheck passes"
- "Tests pass"
Bad criteria (vague):
- "Works correctly"
- "User can do X easily"
- "Good UX"
- "Handles edge cases"
Implementation-coupled criteria (fragile):
- "handleStatusChange calls db.update with the correct enum"
- "Component renders by calling useTaskList hook"
- "API handler invokes validateInput before processing"
Behavioral criteria (resilient):
- "Task with changed status is retrievable with the new status"
- "Task list displays only tasks matching the selected filter"
- "Invalid status value returns 400 with descriptive error message"
Implementation-coupled criteria produce tests that break on refactor even when behavior is unchanged. Behavioral criteria produce tests that survive internal restructuring. See /tdd.
Always include as final criterion:
"Typecheck passes"
For stories with testable logic, also include:
"Tests pass"
For stories that change UI — if browser automation is available (no --no-browser flag):
"Verify in browser using browser skill"
Frontend stories are NOT complete until visually verified. For visual verification, prefer agent-browser (agent-browser open <url>, agent-browser screenshot) — it's lighter and faster for simple navigate-and-check flows. Load the /browser skill (Playwright scripts) when acceptance criteria warrant deeper verification: console error monitoring (startConsoleCapture / getConsoleErrors), network request verification (startNetworkCapture / getFailedRequests), or accessibility audits (runAccessibilityAudit).
If browser is NOT available (--no-browser): Omit the browser criterion. Instead, add Bash-verifiable criteria that cover the UI behavior through API responses or rendered output (e.g., "API response includes the updated status badge markup", "Server-rendered HTML contains filter dropdown with options: All, Active, Completed").
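For instance, a Bash-verifiable substitute might probe the server-rendered output directly (the URL and markup here are hypothetical):

```bash
# Hypothetical check: rendered HTML contains the filter dropdown options
curl -s http://localhost:3000/tasks | grep -q 'In Progress' \
  && echo "PASS: filter dropdown rendered" \
  || echo "FAIL: dropdown missing"
```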
Conversion rules
- Each user story becomes one JSON entry
- IDs: Sequential (US-001, US-002, etc.)
- Priority: Based on dependency order, then document order
- All stories: `passes: false` and empty `notes`
- branchName: Derive from feature name, kebab-case, prefixed with `implement/`
- Always add "Typecheck passes" to every story's acceptance criteria
Extracting implementation context from the SPEC.md
The implementationContext field captures spec-level knowledge that applies across all stories — things the implementer needs every iteration but that don't belong in any single story's acceptance criteria.
Extract from these SPEC.md sections:
| SPEC.md section | What to extract | Why it matters |
|---|---|---|
| §9 Proposed solution — System design | Architecture overview, data model, API shape, auth/permissions model | Without this, the implementer guesses the architecture or contradicts the spec's design |
| §6 Non-functional requirements | Performance targets, security constraints, reliability requirements, operability needs | These constrain how every story is implemented, not what |
| §10 Decision log | Settled decisions (especially 1-way doors) with brief rationale | Prevents the implementer from revisiting or contradicting decisions made during the spec process |
| §8 Current state | How the system works today, key integration points, known gaps | The implementer needs to know what exists to integrate with it correctly |
What to write: A concise prose summary. Not a copy-paste of the spec sections — a distillation of what the implementer needs to hold in mind while working on every story.
Calibrate depth based on spec availability:
| Spec available during implementation? | implementationContext role | Recommended depth |
|---|---|---|
| Yes — spec path forwarded to Phase 2 (default when composed by /ship or when user provides both) | Quick orientation summary. The full SPEC.md provides deep context — iteration agents read it as step 1. | 3-5 sentences — architecture overview and key constraints only |
| No — spec.json is the sole artifact (user invokes /implement with spec.json only, or spec is unavailable) | Primary and sole implementation context source. Must stand on its own. | 5-10 sentences — include architecture, non-goals, current state integration points, key decisions with rationale, and critical constraints |
When in doubt about whether the spec will be available, write the longer form — it's never wrong to include more context, but the shorter form risks leaving the iteration agent under-informed.
Good example:
"The feature adds a
statuscolumn to the tasks table with an enum type. The API uses the existing RESTful pattern in/api/tasks/. Auth is handled by the existing tenant-scoped middleware — do not add new auth logic. The current task list fetches viagetTasksByProject()in the data-access layer; the new filter must use the same query pattern. Decision D3: we chose server-side filtering over client-side because the dataset can exceed 10k rows."
Bad example (too vague):
"Implement the task status feature following good practices."
Splitting large specs
If a SPEC.md has large features, split them:
Original:
"Add user notification system"
Split into:
- US-001: Add notifications table to database
- US-002: Create notification service for sending notifications
- US-003: Add notification bell icon to header
- US-004: Create notification dropdown panel
- US-005: Add mark-as-read functionality
- US-006: Add notification preferences page
Each is one focused change that can be completed and verified independently.
Converting failure paths into acceptance criteria
SPEC.md §5 (User journeys) includes failure/recovery paths and debug experience per persona. These are often the difference between a feature that works in demos and one that works in production.
Do not discard failure paths during conversion. For each failure scenario in the spec:
- Identify which story it belongs to — match the failure to the story that implements the relevant functionality.
- Convert it to a verifiable acceptance criterion on that story.
Example:
SPEC.md failure path:
Failure: User sets an invalid status value via API → System returns 400 with error message "Invalid status. Allowed values: pending, in_progress, done"
Becomes an acceptance criterion on the relevant story:
"API returns 400 with descriptive error when status value is not in [pending, in_progress, done]"
If a failure scenario spans multiple stories (e.g., "network error during save should show retry button"), attach the criterion to the story where the user-facing behavior lives (the UI story, not the backend story).
Applying non-functional requirements as cross-cutting criteria
SPEC.md §6 includes non-functional requirements: performance, reliability, security/privacy, operability, cost. These constrain how stories are implemented.
For each non-functional requirement in the spec:
- Determine if it's universally applicable (e.g., "all endpoints must validate tenant isolation") or story-specific (e.g., "list query must return in <200ms for 10k rows").
- Universal constraints: add as a criterion to every story they apply to.
- Story-specific constraints: add as a criterion to the relevant story only.
Examples:
| Non-functional requirement | Becomes criterion on |
|---|---|
| "All API endpoints must validate tenant isolation" | Every story that adds/modifies an API endpoint |
| "List query must paginate and return in <200ms" | The story that implements the list/filter |
| "Status changes must be audit-logged" | The story that implements the status toggle |
Do not create separate "non-functional" stories. These constraints should be woven into the stories that implement the relevant functionality.
Deriving qaScenarios (implementation context)
After converting user stories, derive a qaScenarios[] array from the SPEC.md's structured sections. These scenarios provide QA context for iteration agents during implementation — they help the implementer understand what will be verified. They do not constrain or scope /qa-plan — qa-plan derives its own scenarios directly from SPEC.md source material.
Each scenario follows the Given/When/Then format and traces back to one or more user stories.
Mapping table: SPEC.md section → scenario category → derivation rule
| SPEC.md section | Scenario category | Derivation rule |
|---|---|---|
| §6 Acceptance criteria | `ux-flow`, `error-state` | Each criterion → one happy-path scenario; each failure condition → one error variant |
| §5 Interaction state matrix | `visual`, `edge-case` | Each non-empty cell → one state verification scenario |
| §9 Data flow diagram — shadow paths | `edge-case`, `failure-mode` | Each shadow path → one edge-case scenario |
| §9 Failure modes table | `error-state`, `failure-mode` | Each row → one error-state scenario covering detection, recovery, and user impact |
| §9 Affected routes/pages | `visual`, `ux-flow` | Each row → one visual/navigation verification scenario |
| §5 User journeys (happy + failure paths) | `ux-flow`, `cross-system` | Each journey step → one e2e scenario; multi-service journeys → cross-system category |
| §13 Deployment/rollout considerations | `integration` | Each row → one deployment/integration verification scenario |
oracleType assignment
Each scenario gets an oracleType that defines how pass/fail is determined:
| oracleType | When to use | Example |
|---|---|---|
| `specified` | Deterministic pass/fail — the spec defines the exact expected outcome | "API returns 400 with error message" |
| `derived` | Compare to a baseline or reference — correctness is relative, not absolute | "Page renders identically to the design mockup" |
| `human` | Subjective judgment required — no automated oracle exists | "Error message is helpful and actionable" |
Example
Input SPEC.md (abbreviated):
# Task Status Feature
Add ability to mark tasks with different statuses.
## Requirements
- Toggle between pending/in-progress/done on task list
- Filter list by status
- Show status badge on each task
- Persist status in database
## Non-functional requirements
- Status changes must be tenant-scoped
## User journeys — Failure paths
- Invalid status value via API → return 400 with descriptive error
## Current state
- Tasks stored in tasks table, accessed via getTasksByProject()
- API uses RESTful patterns under /api/tasks/
- UI uses TaskCard component in components/tasks/
Output spec.json:
{
"project": "TaskApp",
"branchName": "implement/task-status",
"description": "Task Status Feature - Track task progress with status indicators",
"implementationContext": "Tasks are stored in a tasks table accessed via getTasksByProject() in the data-access layer. The API follows RESTful patterns under /api/tasks/. Auth uses existing tenant-scoped middleware. The status field should be an enum column with a database-level constraint. UI components use the existing TaskCard component in components/tasks/.",
"userStories": [
{
"id": "US-001",
"title": "Add status field to tasks table",
"description": "As a developer, I need to store task status in the database.",
"acceptanceCriteria": [
"Add status column: 'pending' | 'in_progress' | 'done' (default 'pending')",
"Generate and run migration successfully",
"Typecheck passes"
],
"priority": 1,
"passes": false,
"notes": ""
},
{
"id": "US-002",
"title": "Display status badge on task cards",
"description": "As a user, I want to see task status at a glance.",
"acceptanceCriteria": [
"Each task card shows colored status badge",
"Badge colors: gray=pending, blue=in_progress, green=done",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 2,
"passes": false,
"notes": ""
},
{
"id": "US-003",
"title": "Add status toggle to task list rows",
"description": "As a user, I want to change task status directly from the list.",
"acceptanceCriteria": [
"Each row has status dropdown or toggle",
"Changing status saves immediately",
"UI updates without page refresh",
"API returns 400 with descriptive error when status value is not in [pending, in_progress, done]",
"Status update is tenant-scoped (uses existing tenant middleware)",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 3,
"passes": false,
"notes": ""
},
{
"id": "US-004",
"title": "Filter tasks by status",
"description": "As a user, I want to filter the list to see only certain statuses.",
"acceptanceCriteria": [
"Filter dropdown: All | Pending | In Progress | Done",
"Filter persists in URL params",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 4,
"passes": false,
"notes": ""
}
],
"qaScenarios": [
{
"id": "QA-001",
"priority": "P0",
"category": "ux-flow",
"name": "User can toggle task status from list",
"given": "A task exists with status 'pending' on the task list",
"when": "User changes status to 'in_progress' via the dropdown",
"then": "Status updates immediately, UI reflects the change without page refresh",
"tracesTo": ["US-003"],
"derivedFrom": "spec",
"oracleType": "specified",
"route": "/tasks"
},
{
"id": "QA-002",
"priority": "P0",
"category": "error-state",
"name": "Invalid status value returns descriptive error",
"given": "A task exists in the system",
"when": "API receives an invalid status value via PUT /api/tasks/:id",
"then": "API returns 400 with message 'Invalid status. Allowed values: pending, in_progress, done'",
"tracesTo": ["US-003"],
"derivedFrom": "spec",
"oracleType": "specified"
},
{
"id": "QA-003",
"priority": "P1",
"category": "visual",
"name": "Status badge colors match spec for all states",
"given": "Tasks exist with each status value (pending, in_progress, done)",
"when": "User views the task list",
"then": "Badges show correct colors: gray=pending, blue=in_progress, green=done",
"tracesTo": ["US-002"],
"derivedFrom": "spec",
"oracleType": "derived",
"route": "/tasks"
}
]
}
Archiving previous runs
Before writing a new spec.json, check if there is an existing one from a different feature:
- Read the current `tmp/ship/spec.json` if it exists
- Check if `branchName` differs from the new feature's branch name
- If different AND `tmp/ship/progress.txt` has content beyond the header (archive procedure sketched below):
  - Create archive folder: `tmp/ship/archive/YYYY-MM-DD-feature-name/`
  - Copy current `tmp/ship/spec.json` and `tmp/ship/progress.txt` to archive
  - Reset `tmp/ship/progress.txt` with fresh header
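A sketch of the archive step, assuming jq is available and a minimal progress header (the header format is an assumption):

```bash
# Archive the previous run before overwriting spec.json
PREV=$(jq -r '.branchName' tmp/ship/spec.json)         # e.g. "implement/task-status"
ARCHIVE="tmp/ship/archive/$(date +%Y-%m-%d)-${PREV#implement/}"
mkdir -p "$ARCHIVE"
cp tmp/ship/spec.json tmp/ship/progress.txt "$ARCHIVE/"
printf '# Progress log\n' > tmp/ship/progress.txt      # fresh header (format assumed)
```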
Phase 1 checklist
Before writing spec.json, verify:
- Previous run archived (if `tmp/ship/spec.json` exists with a different `branchName`, archive it first)
- Each story is completable in one iteration (small enough)
- Stories are ordered by dependency (schema to backend to UI)
- Every story has "Typecheck passes" as criterion
- UI stories have "Verify in browser using browser skill" as criterion (if browser available) or Bash-verifiable substitutes (if `--no-browser`)
- Acceptance criteria are verifiable and not vague; functional criteria describe observable behavior, not internal mechanisms (see /tdd)
- No story depends on a later story
- `implementationContext` extracted from SPEC.md §8, §9, §10, §6 — concise prose, not a copy-paste
- Failure/recovery paths from SPEC.md §5 converted into acceptance criteria on relevant stories
- Non-functional requirements from SPEC.md §6 applied as cross-cutting criteria where applicable
- qaScenarios[] derived from SPEC.md structured sections — each scenario has id (QA-NNN), given/when/then, tracesTo, and oracleType
Phase 2: Prepare
Validate the spec.json, craft the implementation prompt, and save it to a file for execution.
Validate spec.json against SPEC.md
Compare each user story to its corresponding requirement in the SPEC.md:
- Stories are correctly scoped (each completable in one iteration)
- Stories are properly ordered (dependencies first)
- Acceptance criteria are specific and verifiable
- No requirements from the SPEC.md are missing from spec.json
- No stories exceed what the SPEC.md calls for
Fix discrepancies before starting execution — errors here compound through every iteration.
Validate spec.json schema
If bun is available, run the schema validator:
bun <path-to-skill>/scripts/validate-spec.ts tmp/ship/spec.json
This checks structural integrity: required fields, ID format (US-NNN), sequential priorities, duplicate detection, and "Typecheck passes" criterion presence. Zero external dependencies — runs anywhere bun is installed.
If bun is not available, manually verify the spec.json structure matches the schema in Phase 1.
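As a rough manual spot-check, jq can cover the basics (assumes jq is installed; not a substitute for the full validator):

```bash
# Required top-level fields present and at least one story defined
jq -e '.project and .branchName and (.userStories | length > 0)' tmp/ship/spec.json
# List stories missing the mandatory "Typecheck passes" criterion
jq -r '.userStories[] | select(.acceptanceCriteria | index("Typecheck passes") | not) | .id' \
  tmp/ship/spec.json
```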
Place spec.json
Put spec.json at tmp/ship/spec.json. Create tmp/ship/ if it doesn't exist (mkdir -p tmp/ship).
Verify branch safety
If on main or master, warn before proceeding — the implement skill should normally run on a feature branch. If no branching model exists (e.g., container environment with no PR workflow), proceed with caution and ensure commits are isolated.
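A minimal guard along these lines:

```bash
# Warn when about to execute on a primary branch
BRANCH=$(git rev-parse --abbrev-ref HEAD)
if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "master" ]; then
  echo "WARNING: on $BRANCH — /implement normally runs on a feature branch" >&2
fi
```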
Craft the implementation prompt
Load: templates/implement-prompt.template.md
The template contains two complete prompt variants with {{PLACEHOLDER}} syntax. Choose ONE variant and fill all placeholders.
Choose variant:
- Variant A — when a SPEC.md path is available (directly provided or from Phase 1)
- Variant B — when only spec.json is available (no SPEC.md)
Conditionality lives HERE (in Phase 2 construction), NOT in the iteration prompt. The iteration agent sees a single, unconditional workflow — never both variants, never conditional "if spec is available" logic.
Fill {{CODEBASE_CONTEXT}}: Include the specific patterns, shared vocabulary, and abstractions in the area being modified — more actionable than generic CLAUDE.md guidance. Examples: "The API follows RESTful patterns under /api/tasks/", "Auth uses tenant-scoped middleware in auth.ts", "Data access uses the repository pattern in data-access/". Also include repo conventions from CLAUDE.md (testing patterns, file locations, formatting) that the iteration agent needs.
Fill quality gate commands: Use the commands from Inputs (defaults: pnpm typecheck, pnpm lint, pnpm test --run) — override with --typecheck-cmd, --lint-cmd, --test-cmd if provided.
Fill {{SPEC_PATH}} (Variant A only): Use a path relative to the working directory (e.g., .claude/specs/my-feature/SPEC.md). Relative paths work across execution contexts (host, Docker, worktree). Do NOT use absolute paths — they break when the prompt is executed in a different environment. Do NOT embed spec content in the prompt — the iteration agent reads it via the Read tool each iteration.
Save codebase context as standalone file
Also save the filled {{CODEBASE_CONTEXT}} content to tmp/ship/codebase-context.md so it is available to downstream consumers (fix prompt, reviewer):
mkdir -p tmp/ship
# Write codebase context as standalone file (same content that gets inlined in implement-prompt.md)
Clean stale execution artifacts
Remove generated artifacts from prior runs before writing new ones. These files are fully replaced every run — the Write tool's "file not read yet" guard blocks overwrites of existing files, causing a stall and retry cycle.
rm -f tmp/ship/implement-prompt.md tmp/ship/codebase-context.md
Save the prompt
Save the crafted implementation prompt to tmp/ship/implement-prompt.md. This file is consumed by Phase 3 (scripts/implement.sh) for automated execution, or by the user for manual iteration (claude -p).
Copy implement.sh for Docker execution (Docker only)
When --docker was passed, copy scripts/implement.sh to tmp/ship/implement.sh so the container can access it via bind mount:
cp <path-to-skill>/scripts/implement.sh tmp/ship/implement.sh
chmod +x tmp/ship/implement.sh
Skip this step for host execution. Phase 3 on host invokes scripts/implement.sh directly from the skill directory — no copy needed. The copy exists only for Docker containers that cannot access the plugin cache path (see references/execution.md).
Phase 2 checklist
- spec.json validated against SPEC.md (if available)
- spec.json schema validated (via `scripts/validate-spec.ts` if bun available)
- Stories correctly scoped, ordered, and with verifiable criteria
- Implementation prompt crafted from template with correct variant (A or B)
- All `{{PLACEHOLDERS}}` filled (spec path, quality gates, codebase context)
- Completion signal present in saved prompt (included in template — verify not accidentally removed)
- Codebase context saved to `tmp/ship/codebase-context.md`
- Prompt saved to `tmp/ship/implement-prompt.md`
- `implement.sh` copied to `tmp/ship/implement.sh` and made executable (Docker only — skip for host execution)
Phase 3: Execute
Run the iteration loop via scripts/implement.sh. Each iteration spawns a fresh Claude Code subprocess — full capabilities, zero shared context between iterations.
Load: references/execution.md
Step 1: Ensure dependencies and build environment
Before starting iterations, ensure the working directory has a functioning development environment. Iteration agents need deps installed and the build working to run quality gates.
- Detect the package manager from the package.json `packageManager` field (e.g., `pnpm@10.10.0`). If a specific version is pinned, use `npx <pm>@<version> install` to avoid lockfile mismatches.
- Install dependencies if `node_modules/` is missing or stale (example for pnpm-pinned repos: `npx pnpm@<version> install`).
- Run conductor setup if `conductor.json` exists in the repo root.
- Verify the build is clean by running `<typecheck-cmd>` (e.g., `pnpm typecheck`). If typecheck fails, this is a pre-existing issue — log it but do not block; the iteration loop may fix it.

This step is idempotent — if deps are already installed and the build is clean, it completes instantly. Skip entirely if running inside Docker (`--docker`), where the container image provides the environment. A sketch of the detection-and-install logic follows.
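A minimal sketch, assuming jq is available for reading package.json (variable names are illustrative):

```bash
# Read the pinned package manager, e.g. "pnpm@10.10.0" (empty if unset)
PM=$(jq -r '.packageManager // empty' package.json)
if [ -n "$PM" ] && [ ! -d node_modules ]; then
  # Running the pinned version via npx avoids lockfile mismatches
  npx "$PM" install
fi
```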
Step 2: Probe for Claude CLI
Check if automated execution is possible:
env -u CLAUDECODE -u CLAUDE_CODE_ENTRYPOINT claude --version
If this fails, automated execution is not available — skip to the fallback below.
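Wrapped as a shell guard, the probe might look like:

```bash
# Probe outside the current Claude session's env vars
if env -u CLAUDECODE -u CLAUDE_CODE_ENTRYPOINT claude --version >/dev/null 2>&1; then
  echo "claude CLI available — automated execution possible"
else
  echo "claude CLI unavailable — use the manual fallback" >&2
fi
```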
Step 3: Choose execution context (host vs Docker)
If --docker was NOT passed: Execute on the host (default). Proceed to Step 4.
If --docker was passed: Use Docker for execution.
- Resolve the compose file. If a path was provided (e.g., `--docker .ai-dev/docker-compose.yml`), use it. Otherwise, discover it: search the repo for `**/docker-compose.yml` or `**/compose.yml` files whose content defines a `sandbox` service. Use the first match. If none found, error: "No compose file with a sandbox service found in this repo." (A discovery sketch follows this list.)
- Ensure the container is running: `docker compose -f <compose-file> ps --status running sandbox`. If not running, start it: `docker compose -f <compose-file> up -d`.
- Proceed — Step 5 uses the Docker invocation variant.
Step 4: Pre-execution quality gate baseline
Before starting the iteration loop, run the quality gates to establish a baseline:
<typecheck-cmd> # e.g., pnpm typecheck
<lint-cmd> # e.g., pnpm lint
<test-cmd> # e.g., pnpm test --run
If any gate fails, warn the operator: "Quality gates are failing before implementation starts. Pre-existing failures will cost iterations to diagnose. Consider fixing them first."
Log the baseline to tmp/ship/progress.txt regardless of result:
## Pre-execution baseline - [timestamp]
- Typecheck: PASS/FAIL
- Lint: PASS/FAIL
- Test: PASS/FAIL
Do not block execution — the operator may be running /implement specifically to fix failures. But the baseline log helps iteration agents distinguish pre-existing failures from regressions they introduced.
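A sketch of capturing the baseline, using the default gate commands (substitute any overrides; the `gate` helper is illustrative):

```bash
# Run a command silently and report PASS/FAIL
gate() { "$@" >/dev/null 2>&1 && echo PASS || echo FAIL; }
{
  echo "## Pre-execution baseline - $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "- Typecheck: $(gate pnpm typecheck)"
  echo "- Lint: $(gate pnpm lint)"
  echo "- Test: $(gate pnpm test --run)"
} >> tmp/ship/progress.txt
```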
Step 5: Invoke implement.sh
Run in background to avoid the Bash tool's 600-second timeout. Do not set --max-iterations — let the models run uncapped. The iteration loop has natural stop conditions: all stories pass (completion signal), stuck stories (move on after 3 failed runs), and context exhaustion (subprocess exits, next iteration starts fresh). Artificial caps cut off runs that are making progress.
Host execution (default):
Bash(command: "<path-to-skill>/scripts/implement.sh --force",
run_in_background: true,
description: "Implement execution run 1")
Docker execution (when --docker was passed — compose file resolved in Step 3):
Bash(command: "docker compose -f <compose-file> exec sandbox tmp/ship/implement.sh --force",
run_in_background: true,
description: "Implement Docker execution run 1")
Always pass --force — background execution has no TTY for interactive prompts.
The background Bash call returns a task ID and output file path. You will receive a <task-notification> automatically when implement.sh completes — do NOT poll with TaskOutput (deprecated). While waiting for the notification, do lightweight work (re-read spec, review task list) but do NOT make code changes that could conflict. If you need to check progress mid-run, Read the output file path or Read tmp/ship/progress.txt directly.
Implementation is slow — each iteration spawns a full Claude Code subprocess that works through a user story. Expected durations:
| Feature complexity | Expected duration |
|---|---|
| Small (1-3 stories) | 10-20 minutes total |
| Medium (4-8 stories) | 30-60 minutes total |
| Large (9+ stories) | 60-120 minutes total |
Step 6: Assess results
When implement.sh completes, read tmp/ship/spec.json and tmp/ship/progress.txt:
- All stories `passes: true` → execution succeeded (a jq check is sketched below). Proceed to Phase 3 checklist.
- Some stories incomplete → check `tmp/ship/progress.txt` for blockers. Apply stuck story handling (see `references/execution.md`), then re-invoke implement.sh for another run.
- implement.sh exited with error → read output for details. If transient (network, rate limit), retry once. If persistent, fall back to manual instructions.
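A quick way to list the stories still failing, assuming jq is available:

```bash
# Print "US-00N: title" for every story not yet passing
jq -r '.userStories[] | select(.passes == false) | "\(.id): \(.title)"' tmp/ship/spec.json
```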
Step 7: Stuck story handling (between runs)
If the same story fails across 2 consecutive implement.sh runs with the same blocker:
- Story too large → split into smaller stories in spec.json
- Criteria ambiguous → rewrite criteria to be more specific
- External dependency blocking → skip the story, set `notes` explaining the blocker
- Wrong implementation approach → add guidance to `tmp/ship/progress.txt` suggesting an alternative
- Code defect blocking the story (test fails, runtime error, type error the iteration agent can't resolve) → load the `/debug` skill to diagnose the root cause between runs. Apply the fix, then re-invoke implement.sh.
After 3 consecutive failed runs on the same story, stop and consult the user.
Re-invoke implement.sh after applying remediation. Maximum 3 total implement.sh runs before escalating to the user.
Fallback: Claude CLI not available
If the Claude CLI probe in Step 2 failed, automated execution is not possible. /implement still provides full value through Phases 1-2 — the artifacts are ready.
Tell the user:
- The implementation prompt has been saved to `tmp/ship/implement-prompt.md`
- For manual execution using the iteration loop, copy the script and run it:

```bash
cp <path-to-skill>/scripts/implement.sh tmp/ship/implement.sh && chmod +x tmp/ship/implement.sh
tmp/ship/implement.sh --force
```
Phase 3 checklist
- All stories in `tmp/ship/spec.json` have `passes: true` (or stuck stories documented with `notes`)
- `tmp/ship/progress.txt` reviewed — no unresolved blockers
- Quality gates pass: typecheck, lint, test
After Phase 3:
- If composed by /ship: Ship continues with post-implementation review and testing.
- If standalone: read every file created or modified. The implementation output is your starting point, not your endpoint. Run quality gates manually.