nightshift
Nightshift
Autonomous sequential development loop for AFK sessions. Works through your specs and bugs backlog one task at a time, fully completing each before moving on. Prioritizes acceptance testing — every feature must be verified from the user's perspective before it ships.
When to use Nightshift vs Swarm
| Nightshift | Swarm | |
|---|---|---|
| Goal | Ship spec'd features, fix bugs | Harden codebase quality |
| Agents | 1 sequential, no worktrees | 4 parallel in worktrees |
| Testing focus | Acceptance-first (E2E) | Coverage + mutation + CRAP |
| Review | Persona-based critique | Metric-driven quality agents |
| Best for | "Build these features while I'm away" | "Make the codebase healthier" |
Use Nightshift when you have specs ready and want features shipped. Use Swarm when the codebase needs quality hardening across the board. They compose well: run Nightshift to ship features, then Swarm to harden.
Prerequisites
Project documentation structure
Nightshift follows your existing docs/ convention. It discovers specs using the project's CLAUDE.md knowledge base table, then looks in standard locations:
docs/ ← or .plans/ (both supported)
product-specs/
user-login.md ← agent will pick this up
push-notifications.md ← agent will pick this up
draft-checkout.md ← agent ignores draft-* files
acceptance/
index.md ← TOC of .feature files
user-login.feature ← Gherkin acceptance criteria
exec-plans/ ← active execution plans
adrs/ ← architectural decision records
lessons.md ← cross-run learnings
BUGS.md ← optional bug backlog
Spec discovery order:
- Read CLAUDE.md knowledge base table for doc locations
- Look in
docs/product-specs/(ordocs/specs/,.plans/) - Look for standalone
SPEC.mdorspec/SPEC.md - Fall back to any
*.mdindocs/that looks like a spec
Name files draft-* while you're still writing them. The agent only works
on non-draft specs. Remove the draft- prefix when the spec is ready.
Bugs (optional)
Bugs are always worked before feature specs. The agent checks for bugs in this priority order:
docs/BUGS.md— a checklist of known bugs- Linear issues — if the project uses Linear (detected from CLAUDE.md)
specs/BUGS.mdor.plans/BUGS.md— alternative locations
Format for BUGS.md:
# Bugs
- [ ] Login button unresponsive on iPad landscape — see screenshot in docs/assets/
- [ ] Push notification sound plays twice on iOS 18
- [x] Already fixed bug (agent skips these)
Gherkin acceptance specs (optional but recommended)
If the project has docs/acceptance/*.feature files, the agent uses them
as the source of truth for acceptance test behavior. These Gherkin specs
pair with the E2E tests (Playwright/XCUITest/Detox) that nightshift writes.
When a spec references acceptance criteria, the agent checks for a matching
.feature file and uses it to drive test writing.
Acceptance test infrastructure
Nightshift requires a working E2E test setup. The agent writes acceptance tests FIRST (before implementation), runs them to confirm red, implements, then confirms green. Without a working test harness, the loop will block.
| Platform | Framework | Detection |
|---|---|---|
| Web | Playwright | test:e2e or e2e in package.json, or playwright.config.* |
| iOS (Swift) | XCUITest | *.xcodeproj or *.xcworkspace with UI test targets |
| React Native | Detox | detox in package.json deps, .detoxrc.js or detox.config.js |
The agent auto-detects which framework to use based on the project.
Workflow
Phase 1: PREP (interactive, ~2 minutes)
├── Discover spec locations from CLAUDE.md + docs/ structure
├── Detect platform + test framework
├── Confirm feedback commands (test, typecheck, lint, e2e)
├── Optional: user provides duration estimate
└── User confirms → agent takes over
Phase 2: LOOP (AFK, autonomous)
├── 0. Clean working tree (stash or commit uncommitted work)
├── 1. Run full test suite — fix any failures before starting new work
├── 2. Pick next task: BUGS.md first, then oldest non-draft spec from docs/
├── 3. Load spec + relevant project docs
├── 4. Write acceptance tests FIRST (Playwright/XCUITest/Detox)
├── 5. Run acceptance tests → confirm RED
├── 6. Write unit/integration tests for key logic
├── 7. Implement (TDD red-green-refactor for units, then acceptance green)
├── 8. Run all feedback commands (test, typecheck, lint)
├── 9. Run review personas (sub-agents critique the diff)
├── 10. Fix issues from review, re-run feedback commands
├── 11. Run full test suite (regression check)
├── 12. Commit with detailed message for human review
├── 13. Capture unrelated observations → NOTICED.md
├── 14. Loop to step 2 for next task
└── 15. Write morning briefing when done or time's up
Phase 3: HANDOFF (waiting for human)
└── Morning briefing in .nightshift/MORNING.md
Duration Awareness
If the user provides a duration ("I'll be gone 2 hours", "overnight", "back Monday"), the agent uses it to:
- Scope work: Don't start a large spec if there's only 30 minutes left
- Reserve wrap-up time: Stop picking new tasks ~15 minutes before the estimated return to write the morning briefing and ensure clean state
- Prioritize: With limited time, pick the highest-impact task
Duration is optional. Without it, the agent works until the backlog is empty.
Directory Structure
.nightshift/
MORNING.md ← Morning briefing for human review
NOTICED.md ← Unrelated issues the agent observed
CHANGELOG.md ← Cumulative changelog entries
lessons.md ← Persists across runs (agent-written)
runs/
2026-03-15T22-00/
progress.md ← Task-by-task log for this run
logs/
nightshift.log ← Full agent output
Acceptance Testing Strategy
The agent writes acceptance tests BEFORE implementation. These tests exercise the feature from the user's perspective — not unit-level mocks.
Web (Playwright)
// specs says: "user can reset password via email link"
test('password reset flow', async ({ page }) => {
await page.goto('/login');
await page.click('text=Forgot password?');
await page.fill('[name=email]', 'user@example.com');
await page.click('text=Send reset link');
await expect(page.locator('.success-message')).toContainText('Check your email');
});
iOS (XCUITest)
// specs says: "user can add item to cart from product detail"
func testAddToCartFromProductDetail() throws {
let app = XCUIApplication()
app.launch()
app.cells["product-hiking-boots"].tap()
app.buttons["Add to Cart"].tap()
XCTAssertTrue(app.badges["cart-badge"].exists)
XCTAssertEqual(app.badges["cart-badge"].label, "1")
}
React Native (Detox)
// specs says: "child can tap Play and hear the character speak"
it('should start a voice session when Play is tapped', async () => {
await element(by.id('play-button')).tap();
await expect(element(by.id('session-screen'))).toBeVisible();
await expect(element(by.id('character-avatar'))).toBeVisible();
await waitFor(element(by.id('debug-state-overlay')))
.toHaveText(expect.stringContaining('state:greeting'))
.withTimeout(5000);
});
The agent reads the spec to understand what the user should experience, then writes tests that verify exactly that. Implementation follows the tests.
Review Personas
After implementation, the agent spawns sub-agents as reviewers. Each persona critiques the diff from their own perspective and owns specific documentation.
| Persona | Focus | Owns |
|---|---|---|
| User Advocate | "Does this actually work from a user's perspective?" | Specs, acceptance tests |
| Architect | "Does this fit the system? Any coupling concerns?" | Architecture docs, AGENTS.md |
| Domain Expert | "Is the domain logic correct? Edge cases?" | Domain-specific docs |
| Code Quality | "Is this clean, simple, well-tested?" | CLAUDE.md conventions |
| Platform Expert | "Any platform-specific gotchas?" | Platform docs, gotchas |
Each reviewer returns: APPROVE, REQUEST_CHANGES (with specifics), or FLAG (noticed something worth mentioning but not blocking). The agent iterates until all reviewers approve.
See references/review-personas.md for full persona prompts.
Morning Briefing
.nightshift/MORNING.md is written for YOU, the human. It's designed to be
read in 2 minutes over coffee:
# Morning Briefing — 2026-03-15
## Summary
Completed 3 tasks. 1 bug fix, 2 features. All tests green.
## What was done
1. **BUG: Push notification plays twice** — Fixed race condition in
notification handler. Added Detox test. Commit: a1b2c3d
2. **SPEC: User login flow** — Email/password login with validation.
Playwright tests cover happy path + error states. Commit: e4f5g6h
3. **SPEC: Password reset** — Email-based reset flow. Playwright tests
cover full flow. Commit: i7j8k9l
## What needs your attention
- Password reset email template is placeholder — needs real copy
- Login error messages may need UX review (currently generic)
## What I noticed (unrelated)
- The checkout page has a broken image on mobile (see NOTICED.md)
- `utils/format.ts` has a function with CRAP score 45
## Test results
- Unit: 142 passing
- E2E: 28 passing (3 new)
- Typecheck: clean
- Lint: clean
Launching
Interactive setup (recommended)
Tell Claude Code: "Let's set up nightshift" or "/nightshift"
The skill will:
- Read CLAUDE.md to discover docs structure and spec locations
- Detect platform and test framework
- Confirm feedback commands
- Ask for duration estimate (optional)
- Launch the loop
Direct script execution
chmod +x ~/src/skills/nightshift/scripts/nightshift.sh
env -u CLAUDECODE ~/src/skills/nightshift/scripts/nightshift.sh \
--project /path/to/your-repo \
[--duration "4 hours"] \
[--iterations 15] \
[--agent claude]
References
| Topic | Source | Load When |
|---|---|---|
| Review personas | references/review-personas.md | Step 9 — reviewing implementation |
| Acceptance patterns | references/acceptance-testing.md | Step 4 — writing acceptance tests |
| Morning briefing template | references/morning-briefing.md | Step 15 — writing handoff |