Implement
Convert a SPEC.md into implementation-ready artifacts and execute the iteration loop. This skill operates in three phases:
- Convert — Transform SPEC.md into a structured spec.json
- Prepare — Validate the spec.json, craft the implementation prompt, save it to a file
- Execute — Run `scripts/implement.sh` to iterate through user stories via subprocess
Each phase can be entered independently. If you already have a spec.json, start at Phase 2. If artifacts are ready and you need execution, start at Phase 3. If you only need conversion, stop after Phase 1.
When composed by /ship, Ship invokes /implement for the full lifecycle and reviews the output afterward. When standalone, /implement runs end-to-end and reports results directly.
Ship working directory
All implementation artifacts are stored in tmp/ship/, derived dynamically from git rev-parse --show-toplevel. Override with CLAUDE_SHIP_DIR_OVERRIDE for Docker/CI/isolated envs. The implement.sh script resolves the directory automatically.
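A minimal sketch of the resolution, assuming the override replaces the full path (the `SHIP_DIR` variable name is illustrative):

```bash
# Override wins when set (Docker/CI); otherwise derive from the repo root
SHIP_DIR="${CLAUDE_SHIP_DIR_OVERRIDE:-$(git rev-parse --show-toplevel)/tmp/ship}"
mkdir -p "$SHIP_DIR"
```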
Inputs
| Input | Required | Default | Description |
|---|---|---|---|
| SPEC.md or spec.json path | Yes | — | Source artifact. When SPEC.md: used for Phase 1 conversion AND forwarded as iteration reference in Phase 2 (see Spec path forwarding). When spec.json: start at Phase 2 directly. |
| `--test-cmd` | No | `pnpm test --run` | Test runner command for quality gates |
| `--typecheck-cmd` | No | `pnpm typecheck` | Type checker command for quality gates |
| `--lint-cmd` | No | `pnpm lint` | Linter command for quality gates |
| `--no-browser` | No | Browser assumed available | Omit "Verify in browser" criteria from UI stories; substitute with Bash-verifiable criteria |
| `--docker [compose-file]` | No | — | Use Docker for Phase 3 execution. Optionally accepts a path to the compose file (e.g., `--docker .ai-dev/docker-compose.yml`). When passed without a path, discovers the compose file automatically. When omitted entirely, execution runs on the host. |
When composed by /ship, these overrides are passed based on Phase 0 context detection. When running standalone, defaults apply.
Spec path forwarding
When a SPEC.md is provided (directly or by the invoker), the path persists beyond Phase 1 conversion. In Phase 2, the skill embeds a file-path reference in the implementation prompt so that iteration agents read the full SPEC.md as the first action of every iteration. The spec content is NOT embedded in the prompt — the prompt contains only the file path.
This is mandatory when a spec path is available. The spec contains critical implementation context that spec.json's implementationContext cannot fully capture:
- Non-goals (what NOT to build) — completely absent from spec.json
- Current state code traces (file paths, line numbers, existing patterns) — compressed to prose in implementationContext
- Decision rationale — reduced to conclusions only
- Risks and failure modes — only partially captured in acceptance criteria
When only spec.json is provided (no SPEC.md available), implementationContext serves as the sole implementation context source. See Phase 1's implementationContext guidance for how to calibrate depth.
Create workflow tasks (first action)
Before starting any work, create a task for each phase using TaskCreate with addBlockedBy to enforce ordering. Derive descriptions and completion criteria from each phase's own workflow text.
- Implement: Detect starting point
- Implement: Phase 1 — convert SPEC.md to spec.json
- Implement: Phase 2 — validate and craft prompt
- Implement: Phase 3 — execute iteration loop
Mark each task in_progress when starting and completed when its phase's exit criteria are met. On re-entry, check TaskList first and resume from the first non-completed task.
Detect starting point
| Condition | Begin at |
|---|---|
| SPEC.md exists, no spec.json | Phase 1 (Convert) |
| spec.json exists, needs validation/prompt | Phase 2 (Prepare) |
| spec.json + `tmp/ship/implement-prompt.md` exist, ready to execute | Phase 3 (Execute) |
| Called with only a conversion request | Phase 1, then stop |
Phase 1: Convert (SPEC.md to spec.json)
Take a SPEC.md and convert it to tmp/ship/spec.json. Create tmp/ship/ if it doesn't exist (mkdir -p tmp/ship).
Output format
{
"project": "[Project Name]",
"branchName": "implement/[feature-name-kebab-case]",
"description": "[Feature description from SPEC.md title/intro]",
"implementationContext": "[Concise prose summary of architecture, constraints, key decisions, and current state from the SPEC.md — everything the implementer needs to know that doesn't fit in individual stories]",
"userStories": [
{
"id": "US-001",
"title": "[Story title]",
"description": "As a [user], I want [feature] so that [benefit]",
"acceptanceCriteria": [
"Criterion 1",
"Criterion 2",
"Typecheck passes"
],
"priority": 1,
"passes": false,
"notes": ""
}
]
}
Story size: the number one rule
Each story must be completable in ONE iteration (one context window).
Each iteration receives the same prompt with no memory of previous work — only files and git history persist. If a story is too big, the LLM runs out of context before finishing and produces broken code.
Right-sized stories:
- Add a database column and migration
- Add a UI component to an existing page
- Update a server action with new logic
- Add a filter dropdown to a list
Too big (split these):
- "Build the entire dashboard" — Split into: schema, queries, UI components, filters
- "Add authentication" — Split into: schema, middleware, login UI, session handling
- "Refactor the API" — Split into one story per endpoint or pattern
Rule of thumb: If you cannot describe the change in 2-3 sentences, it is too big.
Story ordering: dependencies first
Stories execute in priority order. Earlier stories must not depend on later ones.
Correct order:
- Schema/database changes (migrations)
- Server actions / backend logic
- UI components that use the backend
- Dashboard/summary views that aggregate data
Wrong order:
- UI component (depends on schema that does not exist yet)
- Schema change
Acceptance criteria: must be verifiable
Each criterion must be something the iteration agent can CHECK, not something vague.
Good criteria (verifiable):
- "Add
statuscolumn to tasks table with default 'pending'" - "Filter dropdown has options: All, Active, Completed"
- "Clicking delete shows confirmation dialog"
- "Typecheck passes"
- "Tests pass"
Bad criteria (vague):
- "Works correctly"
- "User can do X easily"
- "Good UX"
- "Handles edge cases"
Implementation-coupled criteria (fragile):
- "handleStatusChange calls db.update with the correct enum"
- "Component renders by calling useTaskList hook"
- "API handler invokes validateInput before processing"
Behavioral criteria (resilient):
- "Task with changed status is retrievable with the new status"
- "Task list displays only tasks matching the selected filter"
- "Invalid status value returns 400 with descriptive error message"
Implementation-coupled criteria produce tests that break on refactor even when behavior is unchanged. Behavioral criteria produce tests that survive internal restructuring. See /tdd.
Always include as final criterion:
"Typecheck passes"
For stories with testable logic, also include:
"Tests pass"
For stories that change UI — if browser automation is available (no --no-browser flag):
"Verify in browser using browser skill"
Frontend stories are NOT complete until visually verified. For visual verification, prefer agent-browser (agent-browser open <url>, agent-browser screenshot) — it's lighter and faster for simple navigate-and-check flows. Load the /browser skill (Playwright scripts) when acceptance criteria warrant deeper verification: console error monitoring (startConsoleCapture / getConsoleErrors), network request verification (startNetworkCapture / getFailedRequests), or accessibility audits (runAccessibilityAudit).
If browser is NOT available (--no-browser): Omit the browser criterion. Instead, add Bash-verifiable criteria that cover the UI behavior through API responses or rendered output (e.g., "API response includes the updated status badge markup", "Server-rendered HTML contains filter dropdown with options: All, Active, Completed").
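For instance, a Bash-verifiable substitute might probe the server-rendered output directly (the URL and markup here are hypothetical):

```bash
# Hypothetical check: rendered HTML contains the filter dropdown options
curl -s http://localhost:3000/tasks | grep -q 'In Progress' \
  && echo "PASS: filter dropdown rendered" \
  || echo "FAIL: dropdown missing"
```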
Conversion rules
- Each user story becomes one JSON entry
- IDs: Sequential (US-001, US-002, etc.)
- Priority: Based on dependency order, then document order
- All stories: `passes: false` and empty `notes`
- branchName: Derive from feature name, kebab-case, prefixed with `implement/`
- Always add "Typecheck passes" to every story's acceptance criteria
Extracting implementation context from the SPEC.md
The implementationContext field captures spec-level knowledge that applies across all stories — things the implementer needs every iteration but that don't belong in any single story's acceptance criteria.
Extract from these SPEC.md sections:
| SPEC.md section | What to extract | Why it matters |
|---|---|---|
| §9 Proposed solution — System design | Architecture overview, data model, API shape, auth/permissions model | Without this, the implementer guesses the architecture or contradicts the spec's design |
| §6 Non-functional requirements | Performance targets, security constraints, reliability requirements, operability needs | These constrain how every story is implemented, not what |
| §10 Decision log | Settled decisions (especially 1-way doors) with brief rationale | Prevents the implementer from revisiting or contradicting decisions made during the spec process |
| §8 Current state | How the system works today, key integration points, known gaps | The implementer needs to know what exists to integrate with it correctly |
What to write: A concise prose summary. Not a copy-paste of the spec sections — a distillation of what the implementer needs to hold in mind while working on every story.
Calibrate depth based on spec availability:
| Spec available during implementation? | implementationContext role | Recommended depth |
|---|---|---|
| Yes — spec path forwarded to Phase 2 (default when composed by /ship or when user provides both) | Quick orientation summary. The full SPEC.md provides deep context — iteration agents read it as step 1. | 3-5 sentences — architecture overview and key constraints only |
| No — spec.json is the sole artifact (user invokes /implement with spec.json only, or spec is unavailable) | Primary and sole implementation context source. Must stand on its own. | 5-10 sentences — include architecture, non-goals, current state integration points, key decisions with rationale, and critical constraints |
When in doubt about whether the spec will be available, write the longer form — it's never wrong to include more context, but the shorter form risks leaving the iteration agent under-informed.
Good example:
"The feature adds a
statuscolumn to the tasks table with an enum type. The API uses the existing RESTful pattern in/api/tasks/. Auth is handled by the existing tenant-scoped middleware — do not add new auth logic. The current task list fetches viagetTasksByProject()in the data-access layer; the new filter must use the same query pattern. Decision D3: we chose server-side filtering over client-side because the dataset can exceed 10k rows."
Bad example (too vague):
"Implement the task status feature following good practices."
Splitting large specs
If a SPEC.md has large features, split them:
Original:
"Add user notification system"
Split into:
- US-001: Add notifications table to database
- US-002: Create notification service for sending notifications
- US-003: Add notification bell icon to header
- US-004: Create notification dropdown panel
- US-005: Add mark-as-read functionality
- US-006: Add notification preferences page
Each is one focused change that can be completed and verified independently.
Converting failure paths into acceptance criteria
SPEC.md §5 (User journeys) includes failure/recovery paths and debug experience per persona. These are often the difference between a feature that works in demos and one that works in production.
Do not discard failure paths during conversion. For each failure scenario in the spec:
- Identify which story it belongs to — match the failure to the story that implements the relevant functionality.
- Convert it to a verifiable acceptance criterion on that story.
Example:
SPEC.md failure path:
Failure: User sets an invalid status value via API → System returns 400 with error message "Invalid status. Allowed values: pending, in_progress, done"
Becomes an acceptance criterion on the relevant story:
"API returns 400 with descriptive error when status value is not in [pending, in_progress, done]"
If a failure scenario spans multiple stories (e.g., "network error during save should show retry button"), attach the criterion to the story where the user-facing behavior lives (the UI story, not the backend story).
Applying non-functional requirements as cross-cutting criteria
SPEC.md §6 includes non-functional requirements: performance, reliability, security/privacy, operability, cost. These constrain how stories are implemented.
For each non-functional requirement in the spec:
- Determine if it's universally applicable (e.g., "all endpoints must validate tenant isolation") or story-specific (e.g., "list query must return in <200ms for 10k rows").
- Universal constraints: add as a criterion to every story they apply to.
- Story-specific constraints: add as a criterion to the relevant story only.
Examples:
| Non-functional requirement | Becomes criterion on |
|---|---|
| "All API endpoints must validate tenant isolation" | Every story that adds/modifies an API endpoint |
| "List query must paginate and return in <200ms" | The story that implements the list/filter |
| "Status changes must be audit-logged" | The story that implements the status toggle |
Do not create separate "non-functional" stories. These constraints should be woven into the stories that implement the relevant functionality.
Deriving qaScenarios (implementation context)
After converting user stories, derive a qaScenarios[] array from the SPEC.md's structured sections. These scenarios provide QA context for iteration agents during implementation — they help the implementer understand what will be verified. They do not constrain or scope /qa-plan — qa-plan derives its own scenarios directly from SPEC.md source material.
Each scenario follows the Given/When/Then format and traces back to one or more user stories.
Mapping table: SPEC.md section → scenario category → derivation rule
| SPEC.md section | Scenario category | Derivation rule |
|---|---|---|
| §6 Acceptance criteria | `ux-flow`, `error-state` | Each criterion → one happy-path scenario; each failure condition → one error variant |
| §5 Interaction state matrix | `visual`, `edge-case` | Each non-empty cell → one state verification scenario |
| §9 Data flow diagram — shadow paths | `edge-case`, `failure-mode` | Each shadow path → one edge-case scenario |
| §9 Failure modes table | `error-state`, `failure-mode` | Each row → one error-state scenario covering detection, recovery, and user impact |
| §9 Affected routes/pages | `visual`, `ux-flow` | Each row → one visual/navigation verification scenario |
| §5 User journeys (happy + failure paths) | `ux-flow`, `cross-system` | Each journey step → one e2e scenario; multi-service journeys → cross-system category |
| §13 Deployment/rollout considerations | `integration` | Each row → one deployment/integration verification scenario |
oracleType assignment
Each scenario gets an oracleType that defines how pass/fail is determined:
| oracleType | When to use | Example |
|---|---|---|
| `specified` | Deterministic pass/fail — the spec defines the exact expected outcome | "API returns 400 with error message" |
| `derived` | Compare to a baseline or reference — correctness is relative, not absolute | "Page renders identically to the design mockup" |
| `human` | Subjective judgment required — no automated oracle exists | "Error message is helpful and actionable" |
Example
Input SPEC.md (abbreviated):
# Task Status Feature
Add ability to mark tasks with different statuses.
## Requirements
- Toggle between pending/in-progress/done on task list
- Filter list by status
- Show status badge on each task
- Persist status in database
## Non-functional requirements
- Status changes must be tenant-scoped
## User journeys — Failure paths
- Invalid status value via API → return 400 with descriptive error
## Current state
- Tasks stored in tasks table, accessed via getTasksByProject()
- API uses RESTful patterns under /api/tasks/
- UI uses TaskCard component in components/tasks/
Output spec.json:
{
"project": "TaskApp",
"branchName": "implement/task-status",
"description": "Task Status Feature - Track task progress with status indicators",
"implementationContext": "Tasks are stored in a tasks table accessed via getTasksByProject() in the data-access layer. The API follows RESTful patterns under /api/tasks/. Auth uses existing tenant-scoped middleware. The status field should be an enum column with a database-level constraint. UI components use the existing TaskCard component in components/tasks/.",
"userStories": [
{
"id": "US-001",
"title": "Add status field to tasks table",
"description": "As a developer, I need to store task status in the database.",
"acceptanceCriteria": [
"Add status column: 'pending' | 'in_progress' | 'done' (default 'pending')",
"Generate and run migration successfully",
"Typecheck passes"
],
"priority": 1,
"passes": false,
"notes": ""
},
{
"id": "US-002",
"title": "Display status badge on task cards",
"description": "As a user, I want to see task status at a glance.",
"acceptanceCriteria": [
"Each task card shows colored status badge",
"Badge colors: gray=pending, blue=in_progress, green=done",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 2,
"passes": false,
"notes": ""
},
{
"id": "US-003",
"title": "Add status toggle to task list rows",
"description": "As a user, I want to change task status directly from the list.",
"acceptanceCriteria": [
"Each row has status dropdown or toggle",
"Changing status saves immediately",
"UI updates without page refresh",
"API returns 400 with descriptive error when status value is not in [pending, in_progress, done]",
"Status update is tenant-scoped (uses existing tenant middleware)",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 3,
"passes": false,
"notes": ""
},
{
"id": "US-004",
"title": "Filter tasks by status",
"description": "As a user, I want to filter the list to see only certain statuses.",
"acceptanceCriteria": [
"Filter dropdown: All | Pending | In Progress | Done",
"Filter persists in URL params",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 4,
"passes": false,
"notes": ""
}
],
"qaScenarios": [
{
"id": "QA-001",
"priority": "P0",
"category": "ux-flow",
"name": "User can toggle task status from list",
"given": "A task exists with status 'pending' on the task list",
"when": "User changes status to 'in_progress' via the dropdown",
"then": "Status updates immediately, UI reflects the change without page refresh",
"tracesTo": ["US-003"],
"derivedFrom": "spec",
"oracleType": "specified",
"route": "/tasks"
},
{
"id": "QA-002",
"priority": "P0",
"category": "error-state",
"name": "Invalid status value returns descriptive error",
"given": "A task exists in the system",
"when": "API receives an invalid status value via PUT /api/tasks/:id",
"then": "API returns 400 with message 'Invalid status. Allowed values: pending, in_progress, done'",
"tracesTo": ["US-003"],
"derivedFrom": "spec",
"oracleType": "specified"
},
{
"id": "QA-003",
"priority": "P1",
"category": "visual",
"name": "Status badge colors match spec for all states",
"given": "Tasks exist with each status value (pending, in_progress, done)",
"when": "User views the task list",
"then": "Badges show correct colors: gray=pending, blue=in_progress, green=done",
"tracesTo": ["US-002"],
"derivedFrom": "spec",
"oracleType": "derived",
"route": "/tasks"
}
]
}
Archiving previous runs
Before writing a new spec.json, check if there is an existing one from a different feature:
- Read the current `tmp/ship/spec.json` if it exists
- Check if `branchName` differs from the new feature's branch name
- If different AND `tmp/ship/progress.txt` has content beyond the header (archive procedure sketched below):
  - Create archive folder: `tmp/ship/archive/YYYY-MM-DD-feature-name/`
  - Copy current `tmp/ship/spec.json` and `tmp/ship/progress.txt` to archive
  - Reset `tmp/ship/progress.txt` with fresh header
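A sketch of the archive step, assuming jq is available and a minimal progress header (the header format is an assumption):

```bash
# Archive the previous run before overwriting spec.json
PREV=$(jq -r '.branchName' tmp/ship/spec.json)         # e.g. "implement/task-status"
ARCHIVE="tmp/ship/archive/$(date +%Y-%m-%d)-${PREV#implement/}"
mkdir -p "$ARCHIVE"
cp tmp/ship/spec.json tmp/ship/progress.txt "$ARCHIVE/"
printf '# Progress log\n' > tmp/ship/progress.txt      # fresh header (format assumed)
```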
Phase 1 checklist
Before writing spec.json, verify:
- Previous run archived (if `tmp/ship/spec.json` exists with a different `branchName`, archive it first)
- Each story is completable in one iteration (small enough)
- Stories are ordered by dependency (schema to backend to UI)
- Every story has "Typecheck passes" as criterion
- UI stories have "Verify in browser using browser skill" as criterion (if browser available) or Bash-verifiable substitutes (if `--no-browser`)
- Acceptance criteria are verifiable and not vague; functional criteria describe observable behavior, not internal mechanisms (see /tdd)
- No story depends on a later story
- `implementationContext` extracted from SPEC.md §8, §9, §10, §6 — concise prose, not a copy-paste
- Failure/recovery paths from SPEC.md §5 converted into acceptance criteria on relevant stories
- Non-functional requirements from SPEC.md §6 applied as cross-cutting criteria where applicable
- qaScenarios[] derived from SPEC.md structured sections — each scenario has id (QA-NNN), given/when/then, tracesTo, and oracleType
Phase 2: Prepare
Validate the spec.json, craft the implementation prompt, and save it to a file for execution.
Validate spec.json against SPEC.md
Compare each user story to its corresponding requirement in the SPEC.md:
- Stories are correctly scoped (each completable in one iteration)
- Stories are properly ordered (dependencies first)
- Acceptance criteria are specific and verifiable
- No requirements from the SPEC.md are missing from spec.json
- No stories exceed what the SPEC.md calls for
Fix discrepancies before starting execution — errors here compound through every iteration.
Validate spec.json schema
If bun is available, run the schema validator:
bun <path-to-skill>/scripts/validate-spec.ts tmp/ship/spec.json
This checks structural integrity: required fields, ID format (US-NNN), sequential priorities, duplicate detection, and "Typecheck passes" criterion presence. Zero external dependencies — runs anywhere bun is installed.
If bun is not available, manually verify the spec.json structure matches the schema in Phase 1.
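As a rough manual spot-check, jq can cover the basics (assumes jq is installed; not a substitute for the full validator):

```bash
# Required top-level fields present and at least one story defined
jq -e '.project and .branchName and (.userStories | length > 0)' tmp/ship/spec.json
# List stories missing the mandatory "Typecheck passes" criterion
jq -r '.userStories[] | select(.acceptanceCriteria | index("Typecheck passes") | not) | .id' \
  tmp/ship/spec.json
```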
Place spec.json
Put spec.json at tmp/ship/spec.json. Create tmp/ship/ if it doesn't exist (mkdir -p tmp/ship).
Verify branch safety
If on main or master, warn before proceeding — the implement skill should normally run on a feature branch. If no branching model exists (e.g., container environment with no PR workflow), proceed with caution and ensure commits are isolated.
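A minimal guard along these lines:

```bash
# Warn when about to execute on a primary branch
BRANCH=$(git rev-parse --abbrev-ref HEAD)
if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "master" ]; then
  echo "WARNING: on $BRANCH — /implement normally runs on a feature branch" >&2
fi
```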
Craft the implementation prompt
Load: templates/implement-prompt.template.md
The template contains two complete prompt variants with {{PLACEHOLDER}} syntax. Choose ONE variant and fill all placeholders.
Choose variant:
- Variant A — when a SPEC.md path is available (directly provided or from Phase 1)
- Variant B — when only spec.json is available (no SPEC.md)
Conditionality lives HERE (in Phase 2 construction), NOT in the iteration prompt. The iteration agent sees a single, unconditional workflow — never both variants, never conditional "if spec is available" logic.
Fill {{CODEBASE_CONTEXT}}: Include the specific patterns, shared vocabulary, and abstractions in the area being modified — more actionable than generic CLAUDE.md guidance. Examples: "The API follows RESTful patterns under /api/tasks/", "Auth uses tenant-scoped middleware in auth.ts", "Data access uses the repository pattern in data-access/". Also include repo conventions from CLAUDE.md (testing patterns, file locations, formatting) that the iteration agent needs.
Fill quality gate commands: Use the commands from Inputs (defaults: pnpm typecheck, pnpm lint, pnpm test --run) — override with --typecheck-cmd, --lint-cmd, --test-cmd if provided.
Fill {{SPEC_PATH}} (Variant A only): Use a path relative to the working directory (e.g., .claude/specs/my-feature/SPEC.md). Relative paths work across execution contexts (host, Docker, worktree). Do NOT use absolute paths — they break when the prompt is executed in a different environment. Do NOT embed spec content in the prompt — the iteration agent reads it via the Read tool each iteration.
Save codebase context as standalone file
Also save the filled {{CODEBASE_CONTEXT}} content to tmp/ship/codebase-context.md so it is available to downstream consumers (fix prompt, reviewer):
mkdir -p tmp/ship
# Write codebase context as standalone file (same content that gets inlined in implement-prompt.md)
Clean stale execution artifacts
Remove generated artifacts from prior runs before writing new ones. These files are fully replaced every run — the Write tool's "file not read yet" guard blocks overwrites of existing files, causing a stall and retry cycle.
rm -f tmp/ship/implement-prompt.md tmp/ship/codebase-context.md
Save the prompt
Save the crafted implementation prompt to tmp/ship/implement-prompt.md. This file is consumed by Phase 3 (scripts/implement.sh) for automated execution, or by the user for manual iteration (claude -p).
Copy implement.sh for Docker execution (Docker only)
When --docker was passed, copy scripts/implement.sh to tmp/ship/implement.sh so the container can access it via bind mount:
cp <path-to-skill>/scripts/implement.sh tmp/ship/implement.sh
chmod +x tmp/ship/implement.sh
Skip this step for host execution. Phase 3 on host invokes scripts/implement.sh directly from the skill directory — no copy needed. The copy exists only for Docker containers that cannot access the plugin cache path (see references/execution.md).
Phase 2 checklist
- spec.json validated against SPEC.md (if available)
- spec.json schema validated (via `scripts/validate-spec.ts` if bun available)
- Stories correctly scoped, ordered, and with verifiable criteria
- Implementation prompt crafted from template with correct variant (A or B)
- All `{{PLACEHOLDERS}}` filled (spec path, quality gates, codebase context)
- Completion signal present in saved prompt (included in template — verify not accidentally removed)
- Codebase context saved to `tmp/ship/codebase-context.md`
- Prompt saved to `tmp/ship/implement-prompt.md`
- `implement.sh` copied to `tmp/ship/implement.sh` and made executable (Docker only — skip for host execution)
Phase 3: Execute
Run the iteration loop via scripts/implement.sh. Each iteration spawns a fresh Claude Code subprocess — full capabilities, zero shared context between iterations.
Load: references/execution.md
Step 1: Ensure dependencies and build environment
Before starting iterations, ensure the working directory has a functioning development environment. Iteration agents need deps installed and the build working to run quality gates.
- Detect the package manager from the package.json `packageManager` field (e.g., `pnpm@10.10.0`). If a specific version is pinned, use `npx <pm>@<version> install` to avoid lockfile mismatches.
- Install dependencies if `node_modules/` is missing or stale (example for pnpm-pinned repos: `npx pnpm@<version> install`).
- Run conductor setup if `conductor.json` exists in the repo root.
- Verify the build is clean by running `<typecheck-cmd>` (e.g., `pnpm typecheck`). If typecheck fails, this is a pre-existing issue — log it but do not block; the iteration loop may fix it.

This step is idempotent — if deps are already installed and the build is clean, it completes instantly. Skip entirely if running inside Docker (`--docker`), where the container image provides the environment. A sketch of the detection-and-install logic follows.
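A minimal sketch, assuming jq is available for reading package.json (variable names are illustrative):

```bash
# Read the pinned package manager, e.g. "pnpm@10.10.0" (empty if unset)
PM=$(jq -r '.packageManager // empty' package.json)
if [ -n "$PM" ] && [ ! -d node_modules ]; then
  # Running the pinned version via npx avoids lockfile mismatches
  npx "$PM" install
fi
```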
Step 2: Probe for Claude CLI
Check if automated execution is possible:
env -u CLAUDECODE -u CLAUDE_CODE_ENTRYPOINT claude --version
If this fails, automated execution is not available — skip to the fallback below.
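Wrapped as a shell guard, the probe might look like:

```bash
# Probe outside the current Claude session's env vars
if env -u CLAUDECODE -u CLAUDE_CODE_ENTRYPOINT claude --version >/dev/null 2>&1; then
  echo "claude CLI available — automated execution possible"
else
  echo "claude CLI unavailable — use the manual fallback" >&2
fi
```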
Step 3: Choose execution context (host vs Docker)
If --docker was NOT passed: Execute on the host (default). Proceed to Step 4.
If --docker was passed: Use Docker for execution.
- Resolve the compose file. If a path was provided (e.g., `--docker .ai-dev/docker-compose.yml`), use it. Otherwise, discover it: search the repo for `**/docker-compose.yml` or `**/compose.yml` files whose content defines a `sandbox` service. Use the first match. If none found, error: "No compose file with a sandbox service found in this repo." (A discovery sketch follows this list.)
- Ensure the container is running: `docker compose -f <compose-file> ps --status running sandbox`. If not running, start it: `docker compose -f <compose-file> up -d`.
- Proceed — Step 5 uses the Docker invocation variant.
Step 4: Pre-execution quality gate baseline
Before starting the iteration loop, run the quality gates to establish a baseline:
<typecheck-cmd> # e.g., pnpm typecheck
<lint-cmd> # e.g., pnpm lint
<test-cmd> # e.g., pnpm test --run
If any gate fails, warn the operator: "Quality gates are failing before implementation starts. Pre-existing failures will cost iterations to diagnose. Consider fixing them first."
Log the baseline to tmp/ship/progress.txt regardless of result:
## Pre-execution baseline - [timestamp]
- Typecheck: PASS/FAIL
- Lint: PASS/FAIL
- Test: PASS/FAIL
Do not block execution — the operator may be running /implement specifically to fix failures. But the baseline log helps iteration agents distinguish pre-existing failures from regressions they introduced.
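A sketch of capturing the baseline, using the default gate commands (substitute any overrides; the `gate` helper is illustrative):

```bash
# Run a command silently and report PASS/FAIL
gate() { "$@" >/dev/null 2>&1 && echo PASS || echo FAIL; }
{
  echo "## Pre-execution baseline - $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "- Typecheck: $(gate pnpm typecheck)"
  echo "- Lint: $(gate pnpm lint)"
  echo "- Test: $(gate pnpm test --run)"
} >> tmp/ship/progress.txt
```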
Step 5: Invoke implement.sh
Run in background to avoid the Bash tool's 600-second timeout. Do not set --max-iterations — let the models run uncapped. The iteration loop has natural stop conditions: all stories pass (completion signal), stuck stories (move on after 3 failed runs), and context exhaustion (subprocess exits, next iteration starts fresh). Artificial caps cut off runs that are making progress.
Host execution (default):
Bash(command: "<path-to-skill>/scripts/implement.sh --force",
run_in_background: true,
description: "Implement execution run 1")
Docker execution (when --docker was passed — compose file resolved in Step 3):
Bash(command: "docker compose -f <compose-file> exec sandbox tmp/ship/implement.sh --force",
run_in_background: true,
description: "Implement Docker execution run 1")
Always pass --force — background execution has no TTY for interactive prompts.
The background Bash call returns a task ID and output file path. You will receive a <task-notification> automatically when implement.sh completes — do NOT poll with TaskOutput (deprecated). While waiting for the notification, do lightweight work (re-read spec, review task list) but do NOT make code changes that could conflict. If you need to check progress mid-run, Read the output file path or Read tmp/ship/progress.txt directly.
Implementation is slow — each iteration spawns a full Claude Code subprocess that works through a user story. Expected durations:
| Feature complexity | Expected duration |
|---|---|
| Small (1-3 stories) | 10-20 minutes total |
| Medium (4-8 stories) | 30-60 minutes total |
| Large (9+ stories) | 60-120 minutes total |
Step 6: Assess results
When implement.sh completes, read tmp/ship/spec.json and tmp/ship/progress.txt:
- All stories `passes: true` → execution succeeded (a jq check is sketched below). Proceed to Phase 3 checklist.
- Some stories incomplete → check `tmp/ship/progress.txt` for blockers. Apply stuck story handling (see `references/execution.md`), then re-invoke implement.sh for another run.
- implement.sh exited with error → read output for details. If transient (network, rate limit), retry once. If persistent, fall back to manual instructions.
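A quick way to list the stories still failing, assuming jq is available:

```bash
# Print "US-00N: title" for every story not yet passing
jq -r '.userStories[] | select(.passes == false) | "\(.id): \(.title)"' tmp/ship/spec.json
```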
Step 7: Stuck story handling (between runs)
If the same story fails across 2 consecutive implement.sh runs with the same blocker:
- Story too large → split into smaller stories in spec.json
- Criteria ambiguous → rewrite criteria to be more specific
- External dependency blocking → skip the story, set `notes` explaining the blocker
- Wrong implementation approach → add guidance to `tmp/ship/progress.txt` suggesting an alternative
- Code defect blocking the story (test fails, runtime error, type error the iteration agent can't resolve) → load the `/debug` skill to diagnose the root cause between runs. Apply the fix, then re-invoke implement.sh.
After 3 consecutive failed runs on the same story, stop and consult the user.
Re-invoke implement.sh after applying remediation. Maximum 3 total implement.sh runs before escalating to the user.
Fallback: Claude CLI not available
If the Claude CLI probe in Step 2 failed, automated execution is not possible. /implement still provides full value through Phases 1-2 — the artifacts are ready.
Tell the user:
- The implementation prompt has been saved to `tmp/ship/implement-prompt.md`
- For manual execution using the iteration loop, copy the script and run it:

```bash
cp <path-to-skill>/scripts/implement.sh tmp/ship/implement.sh && chmod +x tmp/ship/implement.sh
tmp/ship/implement.sh --force
```
Phase 3 checklist
- All stories in `tmp/ship/spec.json` have `passes: true` (or stuck stories documented with `notes`)
- `tmp/ship/progress.txt` reviewed — no unresolved blockers
- Quality gates pass: typecheck, lint, test
After Phase 3:
- If composed by /ship: Ship continues with post-implementation review and testing.
- If standalone: read every file created or modified. The implementation output is your starting point, not your endpoint. Run quality gates manually.