agent-teams
Agent Teams Workflow
When to Use
Use this skill when coordinating multiple Claude Code agents to implement features in parallel using the Agent Teams feature. Covers:
- Feature doc format and stack-aware lifecycle
- Three-role separation: test-writer, builder, reviewer (order depends on stack)
- File ownership rules to prevent conflicts
- Hook-based quality gates (TaskCompleted, TeammateIdle, Stop)
- Fast verification for rapid feedback, full verification for completion gates
- Progress dashboard (feature-docs/STATUS.md) for zero-context recovery
- Stuck detection and time blindness mitigation
- Coordination protocol with kickoff prompts
- Bootstrap and retrofit prompts for new and existing projects
Defer to other skills for:
- git-workflow skill: Branch naming, commit message conventions, PR creation
- testing-playwright skill: Frontend E2E test patterns (Playwright-specific)
- testing-pytest skill: Python test patterns (pytest-specific)
- testing-rust skill: Rust test patterns (cargo test-specific)
This workflow is adapted from Anthropic's "Building a C compiler with a team of parallel Claudes" (Feb 2026). The key insight: the quality of the testing harness determines the quality of the output.
1. Settings Configuration
Add to .claude/settings.json:
{
"$schema": "https://json.schemastore.org/claude-code-settings.json",
"env": {
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
},
"teammateMode": "tmux"
}
| Setting | Values | What it does |
|---|---|---|
| CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS | "1" | Enables the agent teams feature |
| teammateMode | "auto", "tmux", "in-process" | Controls how teammates are displayed |
Display modes:
- auto (default): uses split panes if already in tmux, in-process otherwise
- tmux: forces split-pane mode; each teammate gets its own tmux pane
- in-process: all teammates share the main terminal; use Shift+Down to cycle between them
Override per-session: claude --teammate-mode in-process
2. Core Principles
Verification Oracle (Stack-Dependent)
The workflow uses different verification strategies depending on the stack:
Python/Rust — Tests as Oracle (TDD): The test-writer agent reads feature docs and writes failing tests. The builder agent implements code to make those tests pass. Nobody grades their own homework — the agent that writes tests never writes implementation, and the agent that implements never modifies tests.
Frontend — Interface as Oracle (Build-First): For frontend projects, the user-visible interface is the stable contract — not internal component APIs. The builder implements directly from the feature doc's acceptance criteria. The test-writer then writes Playwright E2E tests that verify the implementation matches the spec. Tests should PASS (not fail). Vibe-coded UIs change constantly — components get restructured, hooks get refactored, state management evolves. Unit tests against internal APIs break with every refactor. But the user-facing behavior (what they click, what they see) is stable. E2E tests verify that stable contract.
In both models, separation of concerns is preserved: the agent that builds never writes tests, and the agent that writes tests never modifies implementation.
Minimal Context Pollution
LLMs degrade as context fills with irrelevant information. Every hook and agent instruction is designed to produce minimal, structured output:
- Test results print summary lines only, not full stack traces
- Errors use a consistent format: ERROR [CATEGORY]: one-line description
- Verbose output goes to agent_logs/, never to stdout
- scripts/verify.sh logs full output to agent_logs/ and pipes through tail -10
- Agent test commands use quiet reporters (-q, no --reporter=verbose)
- Stop hook truncates output to 20 lines
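The error format and line-limit rules above can be sketched in a few lines. This is only an illustration, not the project's actual hook code: the helper names `format_error` and `tail` are hypothetical, and uppercasing the category is an assumption.

```python
def format_error(category: str, message: str) -> str:
    """Render an error as a single line: ERROR [CATEGORY]: description."""
    stripped = message.strip()
    # Keep only the first line so stack traces never reach stdout.
    first_line = stripped.splitlines()[0] if stripped else ""
    return f"ERROR [{category.upper()}]: {first_line}"


def tail(output: str, limit: int = 20) -> str:
    """Return at most the last `limit` lines of `output`."""
    if limit <= 0:
        return ""
    return "\n".join(output.splitlines()[-limit:])


# Hypothetical verbose output from a type checker.
raw = "Type 'string' is not assignable to type 'number'\n    at src/auth/session.ts:12"
print(format_error("type", raw))

# Fifty lines in, twenty lines out (the Stop hook limit named above).
print(len(tail("\n".join(f"line {i}" for i in range(50))).splitlines()))
```

The full, untruncated output would still be written to agent_logs/; only the trimmed form is meant to reach the agent's context.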
Fast Verification
The Stop hook runs scripts/fast-verify.sh (type check only) on every response
where files changed. This catches type errors quickly without running the full
suite. The full verify pipeline (scripts/verify.sh) runs only on TaskCompleted.
This mirrors Carlini's --fast mode: quick smoke checks during work,
comprehensive validation only at completion gates.
Time Blindness Mitigation
LLMs cannot self-regulate time. The TeammateIdle hook detects features stuck
in building/ for over 30 minutes (using file modification time) and warns the
user. This prevents agents from spinning indefinitely on hard problems.
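As a rough sketch of the stuck check described above, the following compares each feature doc's modification time against a 30-minute threshold. The function name `find_stuck_features` is hypothetical, and the real hook may be a shell script that works differently; the demo builds a throwaway directory so it can run anywhere.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

STUCK_AFTER_SECONDS = 30 * 60  # threshold from the text: 30 minutes


def find_stuck_features(building_dir: Path, now: Optional[float] = None) -> List[str]:
    """Return feature docs in building/ whose mtime is older than the threshold."""
    now = time.time() if now is None else now
    stuck = []
    for doc in sorted(building_dir.glob("*.md")):
        age_seconds = now - doc.stat().st_mtime
        if age_seconds > STUCK_AFTER_SECONDS:
            stuck.append(doc.name)
    return stuck


# Demo: one fresh doc, one doc last touched an hour ago.
with tempfile.TemporaryDirectory() as tmp:
    building = Path(tmp) / "feature-docs" / "building"
    building.mkdir(parents=True)
    fresh = building / "001-user-auth.md"
    stale = building / "002-cart-redesign.md"
    fresh.write_text("status: building\n")
    stale.write_text("status: building\n")
    an_hour_ago = time.time() - 3600
    os.utime(stale, (an_hour_ago, an_hour_ago))
    stuck = find_stuck_features(building)

print(stuck)  # ['002-cart-redesign.md']
```

Because the check relies on modification time, any edit to the feature doc resets the clock, even if the agent is making no real progress.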
Progress Dashboard
Agents start each session with zero context. feature-docs/STATUS.md is updated
by every agent after each stage transition. It shows what's in flight, what's
blocked, and what's done — enabling any agent to orient quickly.
File Ownership
Feature docs declare which files each feature affects. No agent touches files owned by another in-progress feature. This prevents the problem Carlini identified: agents hitting the same bug, fixing it, and overwriting each other's changes.
Ownership is convention-based (declared in feature doc frontmatter), not
technically enforced. Agents must check feature-docs/testing/ and
feature-docs/building/ for overlapping affected-files before starting work.
CI as Regression Gate
The TaskCompleted hook runs the full verify pipeline (scripts/verify.sh)
before any task can be marked done. An agent cannot ship code that breaks existing
tests. This is enforced at the hook level (deterministic) rather than in prompts
(probabilistic).
Human-in-the-Loop for Subjective Work
Tests verify functional correctness, but some decisions are subjective. For frontend projects, visual/style work requires human review loops with screenshots. The workflow splits into:
- Feature work: Fully autonomous. Human writes spec, agents handle the pipeline (frontend: build → E2E test → review, Python/Rust: test → build → review).
- Style work (frontend only): Human-in-the-loop. Agent makes changes, generates screenshots, pauses for human feedback. Approved screenshots become visual regression baselines.
3. Team Lifecycle
Step 1 — Create a Team
Create one team per feature or work unit. This creates a config at ~/.claude/teams/{team-name}/
and a task list at ~/.claude/tasks/{team-name}/.
TeamCreate { team_name: "feat-user-auth" }
Step 2 — Spawn Teammates
Use the Agent tool with team_name to add teammates. Each spawned teammate
appears in its own tmux pane automatically.
Agent {
team_name: "feat-user-auth",
name: "test-writer",
subagent_type: "test-writer",
prompt: "Pick up feature-docs/ready/003-user-auth.md",
mode: "auto"
}
| Parameter | Required | Purpose |
|---|---|---|
| team_name | Yes | Which team this teammate joins |
| name | Yes | Human-readable name for messaging and task assignment |
| subagent_type | Yes | Agent type: custom agents from .claude/agents/ or built-in types |
| prompt | Yes | The task description / instructions |
| mode | No | Permission mode ("auto" for autonomous, "plan" for approval) |
Step 3 — Coordinate with Tasks
Create structured work items that teammates can claim and track:
TaskCreate {
subject: "Write failing tests for auth module",
description: "Read feature doc acceptance criteria, write pytest tests..."
}
Assign and track:
TaskUpdate { taskId: "1", owner: "test-writer", status: "in_progress" }
TaskUpdate { taskId: "2", addBlockedBy: ["1"] }
Step 4 — Communicate
Send direct messages to teammates:
SendMessage {
type: "message",
recipient: "test-writer",
content: "Tests look good. Moving to builder phase.",
summary: "Tests approved"
}
Broadcast to all (use sparingly — costs scale with team size):
SendMessage {
type: "broadcast",
content: "Blocking issue found — stop all work.",
summary: "Critical blocker found"
}
Step 5 — Shut Down and Clean Up
Gracefully terminate each teammate, then delete the team:
SendMessage {
type: "shutdown_request",
recipient: "test-writer",
content: "All tasks complete, shutting down."
}
After all teammates have shut down:
TeamDelete {}
4. Ideation Phase (Pre-Ready)
Before a feature enters the agent pipeline, it goes through an ideation phase where
the human explores, researches, and shapes the idea. Source feature-docs/new-feature.md
to start (or resume) the guided workflow.
Ideation happens in feature-docs/ideation/ with one subfolder per feature:
feature-docs/ideation/
CLAUDE.md # Auto-discovered guide for all ideation folders
001-user-auth/
README.md # Status tracking + progress log
code-review.md # Analysis of existing code to change
api-research.md # How other projects solve this
design-notes.md # Data flow, component tree, schema
spike-results.md # Quick experiments
002-cart-redesign/
README.md
current-analysis.md
competitor-notes.md
Starting or Resuming Ideation
Source feature-docs/new-feature.md — it handles both cases:
- New feature: Asks what you want to build, creates the ideation folder, walks you through validation (code review, research, design), saves artifacts as you go
- Resume: Scans for folders with status: in-progress, reads all artifacts, summarises where you left off, continues from open questions
Status Tracking
Each ideation folder's README.md has YAML frontmatter:
---
feature: user-auth
status: in-progress # or: complete, shipped
created: 2025-01-15
---
The ## Progress section tracks dated entries across sessions:
### 2025-01-15 — Initial exploration
- **What we did**: Reviewed existing auth code, identified session management gap
- **Decisions made**: Use httpOnly cookies, not localStorage
- **Open questions**: Which OAuth provider to use later?
### 2025-01-16 — API design
- **What we did**: Designed login/logout endpoints, drafted store structure
- **Decisions made**: Separate auth store from user profile store
- **Open questions**: How to handle token refresh?
What Goes in an Ideation Folder
There are no format rules — use whatever helps you think:
- Code reviews — Analysis of existing code the feature will touch
- Research notes — API docs, how other projects solve this, trade-offs
- Design sketches — Data flow diagrams, component trees, schema changes
- Spike results — Quick experiments to validate an approach
- Conversation logs — Key decisions and reasoning from Claude sessions
Distilling into a Feature Doc
When the feature is clear enough to write testable acceptance criteria, say "create the feature" during your ideation session. The prompt will:
- Read all files in the ideation folder
- Synthesise the summary from across all artifacts
- Extract testable behaviours as GIVEN/WHEN/THEN acceptance criteria
- Identify affected files from code reviews and design notes
- Flag gaps (missing error cases, unresolved decisions, no affected files)
- Save the final doc to feature-docs/ready/<feature-name>.md
- Set ideation-ref in the feature doc frontmatter pointing back to the ideation folder
- Update the ideation README status to complete
The ideation folder stays as an archive. Agents work from the distilled feature doc
in ready/ and are never required to read ideation folders; the ideation-ref field
lets them optionally check the ideation folder for additional context.
When the feature later completes the full pipeline (reviewer approves, doc moves to
completed/), the coordinator updates the ideation README status from complete to
shipped and appends a final progress entry noting pipeline completion. This is handled
by the coordinator's "After reviewer approves" checklist in implement-feature.md.
Alternatively, if you already know what you want and want to skip ideation, source
feature-docs/new-feature.md and choose "skip to feature doc" when prompted — it
handles both paths (ideation and direct creation) from a single entry point.
5. Feature Doc Format
Feature docs live in feature-docs/ with subdirectories for each lifecycle stage.
Create this directory structure in your project:
feature-docs/
ideation/ # Human explores and shapes ideas here
ready/ # Distilled feature doc goes here
testing/ # Test-writer moves doc here
building/ # Builder moves doc here
review/ # Builder moves doc here when tests pass
completed/ # Reviewer moves doc here when done
Template
---
title: User Authentication
status: ready
priority: high
depends-on: 004-session-management
affected-files:
- src/auth/authenticate.ts
- src/auth/session.ts
- src/stores/auth-store.ts
- src/components/login-form.tsx
---
# User Authentication
## Summary
Add email/password login with session management. Users can log in, stay
authenticated across page reloads, and log out.
## Acceptance Criteria
1. GIVEN a valid email and password WHEN `authenticate(email, password)` is called
THEN it returns a `Session` with a non-null `token` and `expiresAt` > now
2. GIVEN an email with no matching user WHEN `authenticate(email, password)` is
called THEN it throws `AuthenticationError` with code `"INVALID_CREDENTIALS"`
3. GIVEN `authStore.getState().isAuthenticated` is `true` WHEN `logout()` is called
THEN `authStore.getState().session` is `null` and the session cookie is cleared
4. GIVEN a session cookie with a valid token WHEN `restoreSession()` is called
THEN `authStore.getState().isAuthenticated` is `true`
5. GIVEN a session cookie with an expired token WHEN `restoreSession()` is called
THEN `authStore.getState().session` is `null` and the cookie is cleared
## Edge Cases
- Empty email or password to `authenticate()` — throws `ValidationError` with
code `"EMPTY_FIELD"` before any network request
- Session cookie with malformed JSON — `restoreSession()` clears the cookie
silently without throwing
## Out of Scope
- OAuth/social login (separate feature) — do NOT add OAuth types to `Session`
- Do NOT touch `src/api/client.ts` interceptor (has a `TODO: add auth` comment;
leave as-is to avoid breaking existing API calls)
## Technical Notes
- Session token uses httpOnly cookie, not localStorage
- **Rejected**: localStorage with encryption wrapper — XSS-accessible, no real
protection. httpOnly cookies are invisible to JS entirely.
Acceptance Criteria Rules
Every acceptance criterion must be:
- Testable — can be verified by an automated test
- Specific — names exact functions, fields, error types, and return values
- Independent — does not depend on other criteria passing first
- Complete — covers the happy path, error cases, and edge cases
Vague criteria produce vague tests, and vague tests produce wrong implementations.
Feature Dependencies
Features can declare a dependency on one other feature using the depends-on
frontmatter field. The value is the filename stem of the dependency (e.g.,
005-user-auth).
One level per doc: Each feature declares only its immediate parent. Feature
006 says depends-on: 005-session-mgmt. Feature 005 says depends-on: 004-data-layer. The full chain (006 → 005 → 004) is resolved dynamically at
check time — no feature stores the entire chain.
Recursive resolution: The scripts/check-deps.sh script walks the chain
from the target feature all the way down. If ANY dependency in the chain is not
in completed/, the feature is BLOCKED and must not be picked up.
Blocking behavior:
- In TeammateIdle hooks: blocked features are skipped. The hook continues searching for unblocked work.
- In agent pickup (builder/test-writer): agents check dependencies before starting. If blocked, they report to the user and stop.
- In implement-feature.md coordinator flow: the pre-flight check warns the user and asks whether to wait or override.
Circular dependency detection: The script tracks visited features and exits with an error if a cycle is found (e.g., A → B → A).
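The chain walk and cycle check can be sketched as follows. This is not the contents of scripts/check-deps.sh; it is a minimal illustration that takes the depends-on values as a plain dictionary, and the names `unmet_dependencies`, `DependencyCycleError`, and the example feature 006-checkout are hypothetical.

```python
from typing import Dict, List, Optional, Set


class DependencyCycleError(Exception):
    """Raised when the depends-on chain loops back on itself."""


def unmet_dependencies(
    feature: str,
    depends_on: Dict[str, Optional[str]],
    completed: Set[str],
) -> List[str]:
    """Walk the chain one parent at a time; return ancestors not yet completed."""
    unmet = []
    visited = {feature}
    current = depends_on.get(feature)
    while current is not None:
        if current in visited:
            raise DependencyCycleError(f"cycle detected at {current}")
        visited.add(current)
        if current not in completed:
            unmet.append(current)
        current = depends_on.get(current)
    return unmet


# Each feature stores only its immediate parent (one level per doc).
depends_on = {
    "006-checkout": "005-session-mgmt",
    "005-session-mgmt": "004-data-layer",
    "004-data-layer": None,
}
completed = {"004-data-layer"}

blocked_by = unmet_dependencies("006-checkout", depends_on, completed)
print(blocked_by)  # ['005-session-mgmt'] -> 006 is BLOCKED
```

An empty result means every ancestor is in completed/ and the feature can be picked up; any non-empty result means it is blocked.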
When to use depends-on:
- Feature B cannot function without Feature A's code being merged (runtime dependency)
- Feature B's acceptance criteria reference outputs from Feature A
- Feature B modifies files that Feature A creates (sequential file ownership)
When NOT to use depends-on:
- Features that merely share a domain but are independently testable
- Priority ordering (use priority: high/medium/low instead)
- Features that could run in parallel with non-overlapping files
| Vague (agent has to guess) | Precise (agent can write a test) |
|---|---|
| THEN the login works | THEN authenticate() returns a Session with non-null token |
| THEN an error is shown | THEN it throws AuthenticationError with code "INVALID_CREDENTIALS" |
| THEN the data is saved | THEN authStore.getState().session contains the Session |
| THEN the field is removed | THEN the returned object does NOT include a legacyField key |
6. Agent Roles
Test Writer
Purpose: Produce tests that verify the feature doc's acceptance criteria.
Frontend (build-first):
- Reads: Feature doc from feature-docs/testing/ + builder's implementation
- Produces: Playwright E2E tests that PASS; no Vitest unit tests
- Tests verify the user-visible interface against acceptance criteria
- If a test fails, the builder has a bug (report it, don't work around it)
- Moves doc: testing/ → review/
Python/Rust (TDD):
- Reads: Feature doc from feature-docs/ready/
- Produces: Test files that FAIL (all tests must fail before handing off)
- Tests import from implementation paths even though files may not exist yet
- Moves doc: ready/ → testing/
Shared constraints:
- Never writes implementation code — only test files
- Each acceptance criterion produces at least one test
- Edge cases from the feature doc produce additional tests
- Commits tests with test(<scope>): add [failing] tests for <feature-name>
Builder
Purpose: Write implementation code for the feature.
Frontend (build-first):
- Reads: Feature doc from feature-docs/ready/
- Produces: Implementation code directly from acceptance criteria
- Creates the feature branch
- Moves doc: ready/ → building/ → testing/
Python/Rust (TDD):
- Reads: Feature doc from feature-docs/testing/, failing test files
- Produces: Implementation code that makes all tests pass
- Moves doc: testing/ → building/ → review/
Shared constraints:
- Never modifies test files — if tests are wrong, stop and report to the user
- Must run scripts/verify.sh after implementation
- Only touches files listed in the feature doc's affected-files
- Commits implementation with feat(<scope>): implement <feature-name>
Reviewer
Purpose: Catch what tests cannot — code quality, convention adherence, design system consistency, and qualitative issues.
Maps to: The existing code-reviewer universal agent, extended with
agent-teams awareness.
Checks:
- Code follows project conventions (CLAUDE.md rules)
- No duplicate logic introduced
- Error handling is complete
- Types are correct and specific (no any, no unwrap in production paths)
- Component library used correctly (shadcn for frontend, idiomatic patterns for backend)
- Feature doc acceptance criteria all have corresponding tests
- Tests actually validate the criteria (not just trivially passing)
Produces: Review report. If issues found, status stays at review.
If approved, reviewer moves doc to feature-docs/completed/.
Constraints:
- Strictly read-only — never edits implementation or test files
- Never uses Bash to modify files (sed -i, echo >, etc.)
- Reports issues to the coordinator; the coordinator routes fixes to the appropriate agent
- Independence is the reviewer's value — if the reviewer fixes code, it cannot objectively review it
Coordinator
Purpose: Orchestrate the pipeline — scan for work, run pre-flight checks, invoke agents, verify lifecycle compliance between stages, and manage the progress dashboard. The coordinator never writes implementation or test code.
Identity: The main Claude Code session that sources implement-feature.md.
Unlike the other roles, the coordinator is not a named agent with restricted
tools — it has full tool access by default. These constraints are self-imposed
through prompt instructions.
Reads: Feature docs (all directories), STATUS.md, verify output, agent reports
Produces: Team lifecycle management, feature doc lifecycle moves, STATUS.md updates
Allowed operations:
- Read, Grep, Glob, and read-only Bash on any file
- TeamCreate, Agent, SendMessage, TeamDelete for team lifecycle
- TaskCreate / TaskUpdate for tracking work items
- sed on feature doc frontmatter (status: field only)
- mv to move feature docs between lifecycle directories
- Write/Edit on feature-docs/STATUS.md only
Constraints:
- Never uses Write, Edit, or sed on files listed in affected-files
- Never uses Write, Edit, or sed on test files
- Never uses Write, Edit, or sed on any implementation/source file
- When code needs fixing, re-invokes the responsible agent with specific error details
- When tests are wrong, reports to the user or re-invokes the test-writer
7. Feature Doc Lifecycle
Frontend (Build-First)
Human explores idea → (feature-docs/ideation/<name>/)
└─ Code reviews, research, design notes, spikes
Human distills doc → status: ready (feature-docs/ready/)
Builder picks up → status: building (feature-docs/building/)
└─ Implements from acceptance criteria on feature branch
Builder finishes → status: testing (feature-docs/testing/)
└─ All verification passes, implementation complete
Test-writer picks up → status: testing (feature-docs/testing/)
└─ Writes passing Playwright E2E tests
Test-writer finishes → status: review (feature-docs/review/)
└─ E2E tests pass, verify clean
Reviewer validates → status: done (feature-docs/completed/)
└─ Approved by reviewer
Coordinator merges → PR created and merged to main
└─ Returns to main, ready for next feature
Python/Rust (TDD)
Human explores idea → (feature-docs/ideation/<name>/)
└─ Code reviews, research, design notes, spikes
Human distills doc → status: ready (feature-docs/ready/)
Test-writer picks up → status: testing (feature-docs/testing/)
└─ Failing tests committed on feature branch
Builder picks up → status: building (feature-docs/building/)
└─ Implements until all tests pass
Builder finishes → status: review (feature-docs/review/)
└─ All tests + verify pass
Reviewer validates → status: done (feature-docs/completed/)
└─ Approved by reviewer
Coordinator merges → PR created and merged to main
└─ Returns to main, ready for next feature
Status Transitions
Frontend (Build-First):
| From | To | Who | Action |
|---|---|---|---|
| ready | building | builder | Move doc, create branch, implement from spec |
| building | testing | builder | Move doc, verify passes, implementation done |
| testing | review | test-writer | Move doc, E2E tests written and passing |
| review | completed | reviewer | Move doc, approve quality |
| review | testing | reviewer | Move doc back, E2E test gaps found |
| review | building | reviewer | Move doc back, implementation issues found |
Python/Rust (TDD):
| From | To | Who | Action |
|---|---|---|---|
| ready | testing | test-writer | Move doc, write failing tests, commit |
| testing | building | builder | Move doc, begin implementation |
| building | testing | builder | BOUNCE: defective tests, create bounce file |
| building | review | builder | Move doc, all tests pass, verify clean |
| review | completed | reviewer | Move doc, approve quality |
| review | building | reviewer | Move doc back, issues found (re-work) |
The status field in the feature doc frontmatter and the directory location must stay in sync. Moving the file IS the status transition.
Branch Strategy
Each feature gets its own branch: feat/<feature-name> (following git-workflow
skill conventions).
- The first agent checks out main and pulls before creating the branch
- All agents commit on the same branch
- Reviewer reviews the branch
- After reviewer approval, the coordinator creates a PR (gh pr create) and merges it (gh pr merge --squash --delete-branch)
- The coordinator returns to main (git checkout main && git pull) before the next feature starts
- This ensures each new feature branches from the latest main, not from a previous unmerged feature
Naming Convention
Feature doc filenames use a 3-digit numeric prefix: NNN-feature-name.md
(e.g., 001-user-auth.md, 002-cart-redesign.md). The prefix is assigned at
creation time by running scripts/next-feature-number.sh, which scans all
lifecycle directories and ideation folders for existing prefixes and returns
the next available number. Ideation folders use the same prefix (e.g.,
ideation/001-user-auth/). The numeric prefix carries through the entire
lifecycle — the same file that starts as ready/001-user-auth.md becomes
testing/001-user-auth.md, then building/, review/, and completed/.
This prevents confusion between similarly-named features. 001-user-auth.md
can never be mistaken for 002-user-auth-v2.md.
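A minimal sketch of prefix allocation, assuming "next available" means one more than the highest existing prefix (the text does not say whether gaps are reused). The function `next_feature_number` is hypothetical and takes names as a list rather than scanning the directories itself, which scripts/next-feature-number.sh is described as doing.

```python
import re
from typing import Iterable

# Three digits followed by a hyphen at the start of a file or folder name.
PREFIX = re.compile(r"^(\d{3})-")


def next_feature_number(names: Iterable[str]) -> str:
    """Return the next 3-digit prefix after the highest one found in `names`."""
    highest = 0
    for name in names:
        match = PREFIX.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"{highest + 1:03d}"


# Names as they might appear across lifecycle directories and ideation folders.
existing = ["001-user-auth.md", "002-cart-redesign", "STATUS.md", "CLAUDE.md"]
print(next_feature_number(existing))  # 003
```

Names without a numeric prefix, such as STATUS.md, are ignored, so the dashboard and guide files do not affect numbering.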
8. Coordination Protocol
Automated Kickoff
Source feature-docs/implement-feature.md to scan ready/ for available
features, run pre-flight checks (section completeness, file ownership conflicts,
dependency chain), detect the stack, and kick off the first agent (builder for
frontend, test-writer for Python/Rust). The TeammateIdle hook handles
subsequent handoffs automatically.
Dependency awareness: Before kicking off any feature, the coordinator checks
its dependency chain via scripts/check-deps.sh. If the feature has unmet
dependencies, the coordinator warns the user and suggests waiting or proceeding
with an override. The TeammateIdle hook automatically skips blocked features
when scanning for pending work.
Sequential Pipeline — Frontend (Build-First)
# 1. Create team
TeamCreate { team_name: "feat-user-auth" }
# 2. Spawn builder
Agent {
team_name: "feat-user-auth",
name: "builder",
subagent_type: "builder",
prompt: "Pick up feature-docs/ready/001-user-auth.md",
mode: "auto"
}
# 3. Wait for builder to finish (TeammateIdle notification)
# 4. Shut down builder
SendMessage { type: "shutdown_request", recipient: "builder" }
# 5. Spawn test-writer for E2E tests
Agent {
team_name: "feat-user-auth",
name: "test-writer",
subagent_type: "test-writer",
prompt: "Pick up feature-docs/testing/001-user-auth.md",
mode: "auto"
}
# 6. Wait for test-writer to finish
# 7. Shut down test-writer, spawn reviewer
SendMessage { type: "shutdown_request", recipient: "test-writer" }
Agent {
team_name: "feat-user-auth",
name: "reviewer",
subagent_type: "code-reviewer",
prompt: "Review feature-docs/review/001-user-auth.md",
mode: "auto"
}
# 8. Wait for reviewer, then merge and clean up
# Create PR and merge to main
gh pr create --base main --head "feat/user-auth" --title "feat(auth): user authentication" --body "..."
gh pr merge --squash --delete-branch
# Return to main
git checkout main
git pull origin main
# Clean up the team
SendMessage { type: "shutdown_request", recipient: "reviewer" }
TeamDelete {}
Sequential Pipeline — Python/Rust (TDD)
# 1. Create team
TeamCreate { team_name: "feat-config" }
# 2. Spawn test-writer (writes failing tests first)
Agent {
team_name: "feat-config",
name: "test-writer",
subagent_type: "test-writer",
prompt: "Pick up feature-docs/ready/003-config.md",
mode: "auto"
}
# 3. Wait for test-writer to finish (TeammateIdle notification)
# 4. Shut down test-writer
SendMessage { type: "shutdown_request", recipient: "test-writer" }
# 5. Spawn builder
Agent {
team_name: "feat-config",
name: "builder",
subagent_type: "builder",
prompt: "Pick up feature-docs/testing/003-config.md",
mode: "auto"
}
# 6. Wait for builder to finish
# 7. Shut down builder, spawn reviewer
SendMessage { type: "shutdown_request", recipient: "builder" }
Agent {
team_name: "feat-config",
name: "reviewer",
subagent_type: "code-reviewer",
prompt: "Review feature-docs/review/003-config.md",
mode: "auto"
}
# 8. Wait for reviewer, then merge and clean up
# Create PR and merge to main
gh pr create --base main --head "feat/config" --title "feat(config): configuration system" --body "..."
gh pr merge --squash --delete-branch
# Return to main
git checkout main
git pull origin main
# Clean up the team
SendMessage { type: "shutdown_request", recipient: "reviewer" }
TeamDelete {}
Python/Rust: Test Bounce-Back (builder → test-writer → builder)
If the builder detects defective tests (wrong assertions, missing pytest.raises,
tests that contradict the feature doc), it moves the feature doc back to testing/,
creates a bounce file (<name>.bounce.md), and exits. The coordinator detects this
and re-invokes the test-writer in fix mode.
Detection: After the builder finishes (TeammateIdle or manual check), check
whether it bounced — the feature doc will be in testing/ (not review/):
ls feature-docs/testing/<filename>.bounce.md
If a bounce file exists:
- Check bounce count: Read the bounce-count from the feature doc frontmatter. If it is 3 or higher, escalate to the user. The problem is likely in the acceptance criteria, not test mechanics.
- Re-invoke the test-writer in fix mode:
Agent {
team_name: "feat-<feature-name>",
name: "test-writer",
subagent_type: "test-writer",
prompt: "Fix defective tests per feature-docs/testing/<filename>.bounce.md",
mode: "auto"
}
- Wait for the test-writer to complete, then re-invoke the builder:
SendMessage { type: "shutdown_request", recipient: "test-writer" }
Agent {
team_name: "feat-<feature-name>",
name: "builder",
subagent_type: "builder",
prompt: "Pick up feature-docs/testing/<filename>.md — tests have been fixed after bounce-back.",
mode: "auto"
}
Circuit breaker: When bounce-count reaches 3, escalate to the user. Do not re-invoke agents automatically — the issue likely requires revising the feature doc's acceptance criteria.
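The circuit-breaker decision could look roughly like this. The frontmatter parser is deliberately simplistic (flat key: value pairs only), and the function names and the action strings "escalate-to-user" and "reinvoke-test-writer" are hypothetical labels, not part of the workflow's API.

```python
from typing import Dict

MAX_BOUNCES = 3  # threshold from the text


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Skip list items and indented lines; only top-level scalars matter here.
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def next_action_after_bounce(doc_text: str) -> str:
    """Decide whether to re-invoke the test-writer or escalate to the user."""
    fields = read_frontmatter(doc_text)
    bounce_count = int(fields.get("bounce-count", "0") or "0")
    if bounce_count >= MAX_BOUNCES:
        return "escalate-to-user"
    return "reinvoke-test-writer"


doc = "---\ntitle: Config\nstatus: testing\nbounce-count: 3\n---\n# Config\n"
print(next_action_after_bounce(doc))  # escalate-to-user
```

A doc with no bounce-count field is treated as zero bounces in this sketch, which is also an assumption.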
Concurrency Rules
- Same-role parallelism is allowed. The coordinator may launch multiple Agent calls simultaneously, each working on a different piece or a different feature.
- Cross-role parallelism is forbidden. Builders and testers must never run at the same time. Complete ALL agents of one role before starting the next role.
- Clean shutdown between roles. Send shutdown_request to each teammate and verify all agents of the current role have fully stopped before spawning the next role. Teammates finish their current turn before exiting.
Parallel Workflow (Multiple Features)
For multiple features in parallel, ensure no affected-files overlap.
Use the stack-appropriate first agent (builder for frontend, test-writer for
Python/Rust). If features share files, run them sequentially to avoid conflicts.
Parallel Investigation
Spawn multiple teammates to explore in parallel:
TeamCreate { team_name: "investigate-perf" }
Agent {
team_name: "investigate-perf",
name: "db-investigator",
subagent_type: "general-purpose",
prompt: "Investigate database query performance in src/db/",
mode: "auto"
}
Agent {
team_name: "investigate-perf",
name: "api-investigator",
subagent_type: "general-purpose",
prompt: "Investigate API endpoint latency in src/api/",
mode: "auto"
}
TeammateIdle Hook
When a teammate finishes work and goes idle, the TeammateIdle hook scans
feature-docs/ for pending work and logs what it finds. The hook always
exits 0, allowing the agent session to terminate cleanly. The coordinator
is responsible for launching fresh agent sessions for the next role.
Frontend (build-first) scan priority:
1. feature-docs/testing/: Needs test-writer for E2E tests
2. feature-docs/ready/: Needs builder to implement
3. feature-docs/review/: Needs reviewer
Python/Rust (TDD) scan priority:
1. feature-docs/testing/: Failing tests exist, needs a builder
2. feature-docs/ready/: Feature doc waiting, needs a test-writer
3. feature-docs/review/: Implementation done, needs a reviewer
The hook logs pending work to stderr for the coordinator's awareness, but does not redirect the idle agent. This prevents finished agents from lingering and interfering with the next role's file changes.
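The two scan orders can be expressed as data, which makes the stack-dependent difference easy to see. This sketch is an assumption about structure, not the hook's real implementation: `SCAN_ORDER`, `pending_work`, the stack keys "frontend" and "tdd", and the feature 004-search.md are hypothetical, and docs are passed in as a dictionary instead of being read from disk.

```python
from typing import Dict, List, Tuple

# (directory, role needed next), listed in scan priority order per stack.
SCAN_ORDER: Dict[str, List[Tuple[str, str]]] = {
    "frontend": [("testing", "test-writer"), ("ready", "builder"), ("review", "reviewer")],
    "tdd": [("testing", "builder"), ("ready", "test-writer"), ("review", "reviewer")],
}


def pending_work(stack: str, docs_by_stage: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """List (stage, doc, role) entries in the stack's scan priority order."""
    found = []
    for stage, role in SCAN_ORDER[stack]:
        for doc in sorted(docs_by_stage.get(stage, [])):
            found.append((stage, doc, role))
    return found


docs = {"ready": ["004-search.md"], "testing": ["003-config.md"]}

# Same directories, different roles depending on the stack.
for stage, doc, role in pending_work("tdd", docs):
    print(f"{stage}/{doc} needs {role}")
```

The result is only reported; as described above, acting on it (spawning the next role) is left to the coordinator.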
TaskCompleted Hook
When any teammate tries to mark a task as done, the TaskCompleted hook runs
two checks:
1. Lifecycle compliance — Scans all feature docs in ready/, testing/,
building/, review/, and completed/. For each doc with a status: field,
verifies the value matches the directory name. If any feature doc is in the wrong
directory (e.g., still in ready/ when it should be in testing/), the task is
blocked. This prevents agents from skipping the doc-move step.
2. Full verify pipeline:
- Type checking (tsc / mypy / cargo check)
- Linting (eslint / ruff / clippy)
- Tests (vitest / pytest / cargo test)
If either check fails, the task cannot be marked done. The agent sees the error output and must fix the issue before trying again.
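The lifecycle compliance check amounts to comparing each doc's status: value with the directory it sits in. The sketch below assumes an exact string match and searches the whole file for the first status: line rather than parsing the frontmatter properly; `lifecycle_violations` and the LIFECYCLE error category are hypothetical names.

```python
import re
import tempfile
from pathlib import Path
from typing import List

STAGES = ("ready", "testing", "building", "review", "completed")
STATUS_LINE = re.compile(r"^status:\s*(\S+)", re.MULTILINE)


def lifecycle_violations(feature_docs: Path) -> List[str]:
    """Report docs whose status: value differs from their directory name."""
    violations = []
    for stage in STAGES:
        stage_dir = feature_docs / stage
        if not stage_dir.is_dir():
            continue
        for doc in sorted(stage_dir.glob("*.md")):
            match = STATUS_LINE.search(doc.read_text())
            # Docs without a status: field are skipped, as the text describes.
            if match and match.group(1) != stage:
                violations.append(
                    f"ERROR [LIFECYCLE]: {stage}/{doc.name} has status: {match.group(1)}"
                )
    return violations


# Demo: one doc left behind in ready/ after its status changed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "feature-docs"
    (root / "ready").mkdir(parents=True)
    (root / "testing").mkdir()
    (root / "ready" / "001-user-auth.md").write_text("---\nstatus: testing\n---\n")
    (root / "testing" / "002-cart-redesign.md").write_text("---\nstatus: testing\n---\n")
    violations = lifecycle_violations(root)

print(violations)
```

A non-empty list would block the task; the agent fixes it by moving the doc to the directory that matches its status.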
9. File Ownership Rules
Claiming Files
When an agent picks up a feature doc, the affected-files list in the
frontmatter declares which files that agent may modify. Before starting:
- Read all feature docs in feature-docs/testing/ and feature-docs/building/
- Collect their affected-files lists
- Check for overlap with the current feature's affected-files
- If overlap exists, report to the user and wait; do not proceed
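The claim check above could look roughly like this. It is a sketch under stated assumptions: `affected_files` and `find_overlaps` are hypothetical helpers, and the frontmatter is parsed line by line rather than with a YAML library.

```python
from pathlib import Path
from typing import Dict, List, Set


def affected_files(text: str) -> Set[str]:
    """Collect the '- path' items under affected-files: in the frontmatter."""
    files: Set[str] = set()
    in_frontmatter = False
    in_list = False
    for line in text.splitlines():
        if line.strip() == "---":
            if in_frontmatter:
                break  # closing delimiter: stop reading
            in_frontmatter = True
            continue
        if not in_frontmatter:
            continue
        if line.startswith("affected-files:"):
            in_list = True
        elif in_list and line.lstrip().startswith("- "):
            files.add(line.lstrip()[2:].strip())
        else:
            in_list = False  # any other key ends the list
    return files


def find_overlaps(current_doc: Path, feature_docs: Path) -> Dict[str, List[str]]:
    """Map each in-flight doc to the files it shares with current_doc."""
    mine = affected_files(current_doc.read_text(encoding="utf-8"))
    overlaps: Dict[str, List[str]] = {}
    for stage in ("testing", "building"):
        stage_dir = feature_docs / stage
        if not stage_dir.is_dir():
            continue
        for doc in sorted(stage_dir.glob("*.md")):
            if doc.resolve() == current_doc.resolve():
                continue
            shared = mine & affected_files(doc.read_text(encoding="utf-8"))
            if shared:
                overlaps[f"{stage}/{doc.name}"] = sorted(shared)
    return overlaps
```

An empty dictionary means the feature can be claimed; anything else is reported to the user before any file is touched.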
Resolving Conflicts
If two features must touch the same file:
- Run them sequentially (feature A completes fully before feature B starts)
- Or split the shared file into separate modules first
Test File Ownership
Test files are owned exclusively by the test-writer. The builder must never modify them.
Python/Rust: If a test is wrong, the builder creates a bounce file
(<name>.bounce.md) in feature-docs/testing/ describing the defects, moves the
feature doc back to testing/, and stops. The coordinator re-invokes the test-writer
in fix mode. The builder never modifies test files or writes production code to
accommodate a defective test.
Frontend: E2E test files are created by the test-writer after the builder finishes. The builder has no test files to modify.
10. Style Work (Frontend Only)
Style refinement cannot be fully automated because "looks right" is subjective.
Style Doc Format
Style docs follow the same template as feature docs but live in styles/
instead of feature-docs/:
---
title: Dashboard Cards Redesign
status: ready
affected-files:
- src/components/dashboard/stat-card.tsx
- src/components/dashboard/chart-card.tsx
---
# Dashboard Cards Redesign
## Visual Direction
- Cards should use subtle shadows instead of borders
- Stat numbers should use the display font at 2xl
- Charts should fill the card width with 16px padding
## Reference
- See designs in figma: [link]
- Similar to the pattern in src/components/existing-card.tsx
Iteration Loop
- Human writes a style doc with visual direction
- Style agent applies changes and generates screenshots to styles/reviews/<name>/iteration-N/
- Agent sets status to awaiting-review and stops
- Human reviews screenshots, writes feedback in the style doc
- Agent reads feedback, applies another iteration
- When human approves, screenshots become Playwright visual regression baselines
Approved screenshots are locked in as automated tests. Future agents cannot drift from the approved design without failing a visual regression test.
11. Hook Configuration
TaskCompleted
Blocks task completion until lifecycle compliance and the full verify pipeline pass.
{
"event": "TaskCompleted",
"command": "bash scripts/task-completed.sh"
}
The script runs two checks. First, it scans feature docs for status/directory
mismatches (e.g., a doc in ready/ with status: testing) and blocks if any are
found. Second, it runs scripts/verify.sh (full pipeline) and blocks (exit 2) on
any failure. Output is truncated to 30 lines to avoid context pollution. Verbose
logs are available in agent_logs/ for debugging.
Lifecycle-aware: For Python/Rust during the testing stage, only lifecycle
compliance is checked — the verify pipeline is skipped because tests are expected
to fail. For frontend, all stages run full verification (no stage has expected
failures in the build-first flow).
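The decision logic described above can be sketched as pure functions. Python is used for illustration only (the real hook is scripts/task-completed.sh); the function names and stack labels are hypothetical, and keeping the last 30 lines rather than the first 30 is an assumption the text does not settle.

```python
from typing import List, Tuple

MAX_LINES = 30  # output limit stated above
BLOCK = 2       # exit code that blocks task completion, per the text above


def truncate(output: str, max_lines: int = MAX_LINES) -> str:
    """Keep at most max_lines lines (assumption: the tail is most useful)."""
    lines = output.splitlines()
    return "\n".join(lines[-max_lines:])


def should_run_verify(stack: str, stage: str) -> bool:
    """Skip the verify pipeline only for Python/Rust in the testing stage."""
    return not (stack in ("python", "rust") and stage == "testing")


def task_completed(
    stack: str,
    stage: str,
    mismatches: List[str],
    verify_exit: int,
    verify_output: str,
) -> Tuple[int, str]:
    """Return (exit_code, message) for the hook."""
    # Check 1: lifecycle compliance always runs.
    if mismatches:
        return BLOCK, truncate("\n".join(mismatches))
    # Check 2: full verify pipeline, unless failures are expected.
    if should_run_verify(stack, stage) and verify_exit != 0:
        return BLOCK, truncate(verify_output)
    return 0, ""
```

Keeping the decision separate from the subprocess calls makes the lifecycle-aware skip easy to test without running a real toolchain.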
TeammateIdle
Logs pending work for the coordinator's awareness when a teammate goes idle.
{
"event": "TeammateIdle",
"command": "bash scripts/teammate-idle.sh"
}
The script first checks for stuck features (in building/ for over 30 minutes)
and warns if found. Then it scans feature-docs/ directories and logs any
pending work to stderr. Always exits 0 to let the agent session terminate
cleanly — the coordinator launches fresh sessions for the next role.
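A rough sketch of the stuck-feature check. How the real script measures time in building/ is not stated here, so this version assumes the doc's modification time marks when it entered the directory; `stuck_features` is a hypothetical name.

```python
import time
from pathlib import Path
from typing import List, Optional

STUCK_AFTER_SECONDS = 30 * 60  # 30-minute threshold from the text above


def stuck_features(feature_docs: Path, now: Optional[float] = None) -> List[str]:
    """Return docs in building/ older than the threshold.

    Assumption: the file's mtime approximates when it entered building/.
    """
    now = time.time() if now is None else now
    building = feature_docs / "building"
    if not building.is_dir():
        return []
    return [
        doc.name
        for doc in sorted(building.glob("*.md"))
        if now - doc.stat().st_mtime > STUCK_AFTER_SECONDS
    ]
```

The `now` parameter exists only so the threshold can be exercised without waiting; a hook would call it with the default.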
Stop (Fast Verify on Change)
Runs fast verification (type check only) after each Claude response when files have changed. Full verification is deferred to TaskCompleted to avoid spending agent time on the full suite during iterative development.
{
"event": "Stop",
"command": "bash scripts/stop-hook.sh"
}
The script checks git diff and git ls-files for modifications. If the working
tree is clean, it exits 0 (skips verify). If files have changed, it runs
scripts/fast-verify.sh (type check only) for quick feedback. If no fast-verify
script exists, it falls back to scripts/verify.sh. It reads stop_hook_active
from stdin to prevent recursive loops. Output is truncated to 20 lines.
Lifecycle-aware: For Python/Rust during the testing stage, verification is
skipped entirely because test-writer code references unimplemented APIs that will
always fail type checking. For frontend, verification runs at all stages.
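The branching described above reduces to a small decision function. This is an illustrative Python sketch, not the shell script itself; `stop_hook_action` and its parameters are hypothetical, and the caller is assumed to have already determined whether the working tree is dirty and whether the current stage skips verification.

```python
import json
from typing import Optional


def stop_hook_action(
    stdin_payload: str,
    tree_dirty: bool,
    has_fast_verify: bool,
    skip_for_stage: bool,
) -> Optional[str]:
    """Return the script to run, or None to exit 0 without verifying."""
    try:
        payload = json.loads(stdin_payload) if stdin_payload.strip() else {}
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    # A hook-triggered continuation must not trigger the hook again.
    if payload.get("stop_hook_active"):
        return None
    # Python/Rust testing stage, or nothing changed: skip verification.
    if skip_for_stage or not tree_dirty:
        return None
    # Prefer the fast script; fall back to the full pipeline.
    return "scripts/fast-verify.sh" if has_fast_verify else "scripts/verify.sh"
```

Checking stop_hook_active first means a failing verify can prompt one more response without the hook firing on that response too.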
Branch Protection
The guard-bash.sh PreToolUse hook blocks direct commits on main/master,
forcing agents to work on feature branches. This complements the branch-per-feature
strategy described in the coordination protocol.
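As an illustration of the guard's core rule, the sketch below decides whether a Bash command should be blocked. It is written in Python rather than shell, the function names are hypothetical, the payload fields (`tool_name`, `tool_input.command`) are assumed to follow the hook input format, and the regex is deliberately simple (it would miss forms such as `git -C dir commit`).

```python
import json
import re

PROTECTED_BRANCHES = {"main", "master"}
# Rough match for "git commit" as a command; not a full shell parser.
GIT_COMMIT = re.compile(r"\bgit\s+commit(\s|$)")


def blocks_commit(command: str, current_branch: str) -> bool:
    """True when the command commits while on a protected branch."""
    return (
        current_branch in PROTECTED_BRANCHES
        and GIT_COMMIT.search(command) is not None
    )


def guard_exit_code(stdin_payload: str, current_branch: str) -> int:
    """Return 2 to block the tool call, 0 to allow it."""
    payload = json.loads(stdin_payload)
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    return 2 if blocks_commit(command, current_branch) else 0
```

The current branch would come from something like `git rev-parse --abbrev-ref HEAD`, which is passed in here so the rule can be tested in isolation.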
12. Interaction Controls
tmux Mode
- Click into any teammate's pane to interact directly
- Each pane shows the teammate's full terminal session
- Standard tmux controls for pane management
in-process Mode
- Shift+Down: cycle through active teammates
- Enter: view a teammate's full session
- Escape: interrupt current turn
- Ctrl+T: toggle task list view
- Type to send messages to the currently visible teammate
13. Bootstrap Prompt (New Project)
Use this prompt to set up the agent teams workflow in a new project:
Set up the agent teams workflow for this project:
1. Create the feature-docs/ directory structure:
feature-docs/ideation/, feature-docs/ready/, feature-docs/testing/,
feature-docs/building/, feature-docs/review/, feature-docs/completed/
2. Create an agent_logs/ directory for verbose output
Add agent_logs/ to .gitignore
3. Verify that scripts/verify.sh and scripts/fast-verify.sh both exist:
- verify.sh: full pipeline (type check + lint + tests) with output to agent_logs/
- fast-verify.sh: type check only for quick feedback
4. Verify that .claude/settings.json includes TaskCompleted,
TeammateIdle, and Stop hooks
5. Create a sample feature doc in feature-docs/ready/ based on the
Feature Doc Format section in feature-docs/CLAUDE.md
6. Create an empty feature-docs/STATUS.md for the progress dashboard
7. Run the full verify pipeline once to confirm everything works
Report what you created and any issues found.
14. Retrofit Prompt (Existing Project)
Use this prompt to add the workflow to a project that already has code and tests:
Retrofit the agent teams workflow into this existing project:
1. Discovery — report the following:
- Package manager and framework
- Test runner and test directory structure
- Component library and state management
- Directory structure and naming conventions
- Existing .claude/ configuration
2. Create the feature-docs/ directory structure alongside existing code
3. Verify scripts/verify.sh works with the existing toolchain:
- Type checking command
- Lint command
- Test command
4. Check .claude/settings.json for existing hooks and add
TaskCompleted and TeammateIdle hooks without replacing
existing configuration
5. Identify migration needs:
- Test files not in a separate directory (need restructuring?)
- Missing test coverage for critical paths
- Files without clear ownership boundaries
Write a discovery report to agent_logs/discovery-report.md and
list any recommended changes (without acting on them).
15. Token Cost Expectations
Each teammate in an agent team uses roughly 5x the tokens of a single session, so a team of 3 (test-writer, builder, reviewer) working on a single feature uses approximately 15x a normal session's tokens. This is justified when:
- The feature has clear, testable acceptance criteria
- Files can be cleanly owned by one feature at a time
- Quality gates (hooks) prevent wasted rework
- The alternative is sequential context degradation in a single long session
For simple features (one file, clear spec), use a single Claude Code session. Reserve agent teams for features touching multiple files across stores, components, services, and tests.
16. Limitations
- One team per session — a lead can only manage one team at a time
- No nested teams — teammates cannot spawn their own teams
- No session resumption for in-process teammates
- Higher token costs than single sessions (each teammate has its own context)
- Split panes require tmux or iTerm2 with it2 CLI
- Shutdown can be slow: teammates finish their current turn before exiting
Anti-Patterns
| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| Builder modifies test files | Grading your own homework — tests lose independence as the oracle | Builder must never touch files created by test-writer |
| Builder works around defective tests | Production code is contorted to satisfy wrong assertions, e.g., returning error strings instead of raising exceptions because the test lacks pytest.raises | Builder runs Test Quality Audit before implementation; if tests are defective, STOP and create a bounce file; never write code to accommodate a bad test |
| Builder writes code to satisfy weak assertions | A test asserts truthiness (is not None) instead of specific values; builder writes a minimal stub that returns a placeholder | Builder's bright-line rule: if idiomatic code written without seeing the tests would not satisfy the assertion, the test is defective, so bounce it back |
| Skipping the test-writer step | No independent verification — builder's code is unchecked against the spec | Frontend: test-writer writes E2E tests after build. Python/Rust: test-writer writes failing tests before build |
| No file ownership declaration | Two agents edit the same file; merge conflicts and lost work | Feature docs must list affected-files; check for overlaps |
| Running parallel features on same branch | Merge conflicts, unclear ownership, broken bisect history | One branch per feature; merge to main sequentially |
| Passing full test output to agents | Context pollution fills the window with stack traces | Pass summary only: X passed, Y failed, first failure message |
| Feature doc without testable criteria | Test-writer cannot produce meaningful tests; builder has no target | Every acceptance criterion must use GIVEN/WHEN/THEN format |
| Skipping the reviewer step | Qualitative issues (conventions, duplication, design) go undetected | Reviewer validates what tests cannot catch |
| Using agent teams for trivial changes | 15x token cost for a one-line fix is wasteful | Single session for changes touching fewer than 3 files |
| Running full test suite on every save | Agent wastes time waiting for slow tests during iteration | Use fast-verify.sh (type check only) on Stop; full suite on TaskCompleted |
| Tests that check truthiness not values | Wrong implementation passes: toBeTruthy() accepts any truthy value | Assert specific return values, error types, and state changes |
| No progress dashboard | Agents start with zero context and waste time re-discovering state | Update feature-docs/STATUS.md after every stage transition |
| Ignoring stuck features | Agent spins for hours on a hard problem without human awareness | TeammateIdle warns after 30 minutes in building/; check agent_logs/ |
| Skipping feature doc lifecycle steps | Next agent never finds the feature doc; pipeline stalls indefinitely | task-completed.sh enforces status/directory sync; Completion Gate checklist in agent definitions |
| Coordinator edits implementation or test files | Violates role separation — coordinator and agent edit the same files, causing conflicts and undermining the test-as-oracle principle | Coordinator re-invokes the responsible agent with specific error details; never uses Write/Edit/sed on code |
| Coordinator fixes follow-up issues directly | Bypasses TDD — no failing test, no builder, no review; defeats the entire workflow even for "small" fixes | Route follow-ups through the full pipeline: test-writer → builder → reviewer; create a new feature doc or amend the existing one |
| Unbounded review → building loop | Builder and reviewer cycle indefinitely, burning tokens on issues the builder cannot resolve alone | Auto-loop up to 3 cycles; after 3, escalate to the user with remaining issues |
| Launching next agent before current one finishes | Both agents edit the same feature's files simultaneously, causing conflicts and lost work | Per-feature sequential: wait for each agent to complete before launching the next; cross-feature parallelism is fine with non-overlapping affected-files |
| Agent stays active after completing its stage | Idle agent reacts to next role's file changes, causing conflicts (e.g., builder "fixes" test-writer's new tests) | Exit Protocol in agent definitions: output report then STOP; TeammateIdle exits 0 to let agents die; coordinator launches fresh sessions |
| Reviewer fixes code directly | Defeats independence — reviewer can't objectively review code it wrote; bypasses TDD pipeline | Reviewer reports issues only; coordinator routes to test-writer (for test gaps) or builder (for implementation issues) |
| Ideation README never updated after pipeline | Feature appears incomplete in ideation folder; scanning for shipped features requires reading completed/ instead of ideation metadata | Coordinator updates ideation README to shipped in "After reviewer approves" step |
| Feature docs without numeric prefix | Similarly-named features (user-auth.md vs user-auth-v2.md) cause agents to read the wrong doc from completed/ or other directories | Always use scripts/next-feature-number.sh to get a unique NNN- prefix at creation time |
| Running verify on test-writer output (Python/Rust) | Type errors on unresolved imports fire on every response; test failures block task completion | Hooks detect testing stage and stack via lifecycle-stage.sh; skip verification for Python/Rust TDD but not frontend build-first |
| Writing Vitest unit tests in frontend workflow | Unit tests break on every component refactor; internal APIs are unstable in vibe-coded UIs | Frontend test-writer writes Playwright E2E only; user-visible behavior is the stable contract |
| Picking up a feature with unmet dependencies | Implementation builds on code that doesn't exist yet; tests reference missing APIs; entire feature may need rework | Run scripts/check-deps.sh before pickup; agents and hooks check automatically |
| Deep dependency chains declared in a single doc | Stale chain data if intermediate features change; maintenance burden grows with chain length | Each doc declares only its immediate parent (depends-on: NNN-name); the script resolves the full chain dynamically from completed/ |
| Circular dependencies between features | Pipeline deadlock — neither feature can proceed because each waits for the other | check-deps.sh detects cycles and exits with error; redesign features to break the cycle |
| Spawning agents without TeamCreate | No team lifecycle, no SendMessage, no shared task tracking — agents run in isolation | Create a team first with TeamCreate, spawn agents with Agent tool, coordinate with SendMessage |
| Forgetting TeamDelete after pipeline | Orphaned team config persists in ~/.claude/teams/; stale task lists accumulate | Always shutdown_request all teammates then TeamDelete after the pipeline completes |
| Starting next feature while on a feature branch | New feature branches from previous feature instead of main; creates dependency stacking where features can't be merged independently | Pre-flight check in implement-feature.md verifies git rev-parse --abbrev-ref HEAD returns main; refuse to start until on main |
| Skipping merge step after reviewer approval | Feature branch sits unmerged; next feature branches from stale state; causes cascading dependency chain across features | Coordinator creates PR with gh pr create, merges with gh pr merge --squash --delete-branch, then returns to main |
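
The dependency rows in the table (immediate-parent depends-on links, dynamic chain resolution, cycle detection) can be illustrated with a short sketch. This is not the logic of check-deps.sh, which is a shell script whose internals are not shown here; `resolve_chain`, `unmet_dependencies`, and `DependencyCycle` are hypothetical names, and the `parent_of` mapping is assumed to have been read from each doc's depends-on field.

```python
from typing import Dict, List, Optional, Set


class DependencyCycle(Exception):
    """Raised when depends-on links loop back on themselves."""


def resolve_chain(feature: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow depends-on links from feature to the root; raise on a cycle."""
    chain: List[str] = []
    seen = {feature}
    current = parent_of.get(feature)
    while current is not None:
        if current in seen:
            path = " -> ".join([feature] + chain + [current])
            raise DependencyCycle(path)
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    return chain


def unmet_dependencies(
    feature: str,
    parent_of: Dict[str, Optional[str]],
    completed: Set[str],
) -> List[str]:
    """Return ancestors of feature that are not yet in completed/."""
    return [dep for dep in resolve_chain(feature, parent_of) if dep not in completed]
```

Because each doc stores only its immediate parent, the chain is recomputed on every call and cannot go stale when an intermediate feature changes.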