qa-run
Arguments: $FEATURE (or "all" for entire suite)
Read CLAUDE.md before doing anything else. Ensure the dev server is running before proceeding.
━━━ BROWSER AGENT (Playwright MCP — runs first) ━━━
Before writing ANY test file, explore the live application using Playwright MCP. This is non-negotiable for web applications. You MUST see the real app first.
STEP A — NAVIGATE EVERY SCREEN: For each screen related to $FEATURE:
1. browser_navigate to the screen URL
2. browser_snapshot — capture the accessibility tree (this gives you the REAL selectors, roles, and accessible names; never guess selectors, always take them from the live app)
3. browser_take_screenshot — save to qa/browser-tests/$FEATURE/
4. Compare what you see against docs/SCREENS.md and wireframes/
5. Log any discrepancies immediately
STEP B — TEST EVERY INTERACTION: On each screen:
1. browser_click every button — verify the correct result
2. browser_type into every input — verify it accepts input
3. browser_select_option on every dropdown
4. browser_press_key Tab through the page — verify focus order
5. browser_press_key Enter on focused buttons — verify activation
6. For forms: submit with valid data, empty data, and invalid data
STEP C — TEST THE HAPPY PATH LIVE: Read the P0 acceptance criteria from docs/PRD.md. Execute each Given/When/Then by actually doing it in the browser:
- browser_navigate to the starting point
- browser_type / browser_click / browser_select_option to perform actions
- browser_verify_text_visible / browser_verify_element_visible for assertions
- browser_take_screenshot at each step
STEP D — RESPONSIVE CHECK: For the 3 most important screens:
- browser_resize width=1440 height=900 → browser_take_screenshot (desktop)
- browser_resize width=768 height=1024 → browser_take_screenshot (tablet)
- browser_resize width=375 height=812 → browser_take_screenshot (mobile)
Verify: no overflow, no cut-off content, touch targets ≥ 44px on mobile
STEP E — HEALTH CHECK:
- browser_console_messages — flag any JavaScript errors or warnings
- browser_network_requests — flag any failed requests (4xx/5xx)
STEP F — GENERATE INITIAL TEST FILES: Use browser_generate_playwright_test to create .spec.ts files from your session. Save them to tests/e2e/$FEATURE-browser.spec.ts. These files become the foundation that the Engineer Agent refines below.
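The generated file is only a starting point. Its shape will look roughly like this (the route, roles, names, and copy are placeholders standing in for values discovered via browser_snapshot, not real ones):

```typescript
// tests/e2e/$FEATURE-browser.spec.ts: illustrative shape of a generated starting point.
import { test, expect } from '@playwright/test';

test('main screen exposes its primary controls', async ({ page }) => {
  await page.goto('/feature');                                        // placeholder route
  await expect(page.getByRole('heading', { name: 'Feature' })).toBeVisible();
  await expect(page.getByRole('button', { name: 'Save' })).toBeVisible();
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Saved')).toBeVisible();                // placeholder copy
});
```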
Output: qa/browser-tests/$FEATURE/exploration.md (Summary of what was found: working elements, broken elements, missing elements, selectors discovered, accessibility tree findings)
━━━ ANALYST AGENT ━━━
Read the source code for $FEATURE. Read qa/plans/ for any existing test coverage on this feature. Read qa/browser-tests/$FEATURE/exploration.md for live browser findings.
Map every testable surface:
- Every user-facing interaction (clicks, inputs, form submissions, keyboard nav)
- Every API call this feature makes and its possible response shapes
- Every UI state: loading, empty, error, success, partial data
- Every data-testid attribute or accessible role present in the DOM
- Every validation rule (client-side and server-side)
- Every route or navigation this feature triggers
Output: qa/plans/$FEATURE.md
━━━ PLANNER AGENT ━━━
Read qa/plans/$FEATURE.md. Assign a priority and write a Given/When/Then scenario for each item, using these levels (an example entry is sketched at the end of this step):
P0 — "If this breaks, the product is unusable" (auth flows, data saving, core feature paths) P1 — "If this breaks, a significant feature is degraded" (secondary flows, important edge cases) P2 — "Edge case — good to have covered" (unusual inputs, rare states, nice-to-have validation)
Also include for each screen:
- Empty state scenario (user has no data yet)
- Error state scenario (network fails, server returns 500)
- Mobile viewport scenario (at least for P0 items)
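An illustrative entry (the scenario itself is a placeholder, not drawn from a real PRD):
P0 — Save transaction: Given an authenticated user on the new-transaction screen, When they enter a valid amount and press Save, Then the transaction appears in the list without a page reload.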
Output: qa/plans/$FEATURE-prioritized.md
━━━ ENGINEER AGENT ━━━
CRITICAL: The dev server must be running. Use Playwright MCP to navigate the actual, running application before writing any test.
For each scenario in qa/plans/$FEATURE-prioritized.md:
- Navigate to the relevant route using Playwright MCP
- Confirm the element you intend to target is visible and accessible
- Note the exact accessible role, label, or testId
- Then write the Playwright test
Write all tests to: tests/e2e/$FEATURE.spec.ts
Playwright rules — these are absolute, no exceptions:
ALLOWED: getByRole('button', { name: 'Save' })
ALLOWED: getByLabel('Email address')
ALLOWED: getByText('No transactions yet')
ALLOWED: getByTestId('transaction-list')
FORBIDDEN: page.$('.save-btn')
FORBIDDEN: page.$('#submit')
FORBIDDEN: page.$x('//button[@class="primary"]')
FORBIDDEN: page.waitForTimeout(3000) <- use expect().toBeVisible() instead
Every test must (a conforming sketch follows this list):
- Have a descriptive name explaining what it verifies
- Assert a specific, meaningful outcome (not just "doesn't crash")
- Use proper async/await throughout
- Clean up any data it creates (use beforeEach/afterEach hooks)
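A minimal sketch of a conforming test (the route, labels, testId, success copy, and cleanup endpoint are placeholders; the real values come from the Playwright MCP inspection above):

```typescript
import { test, expect } from '@playwright/test';

test.describe('$FEATURE: save flow', () => {
  test.afterEach(async ({ request }) => {
    // Placeholder teardown: replace with the project's real test-data cleanup.
    await request.delete('/api/test-data/transactions');
  });

  test('saving a valid transaction shows it in the transaction list', async ({ page }) => {
    await page.goto('/transactions/new');                             // placeholder route
    await page.getByLabel('Amount').fill('42.00');
    await page.getByRole('button', { name: 'Save' }).click();
    // Assert a specific, meaningful outcome, not just "it didn't crash".
    await expect(page.getByText('Transaction saved')).toBeVisible();
    await expect(page.getByTestId('transaction-list')).toContainText('42.00');
  });
});
```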
━━━ SENTINEL AGENT ━━━
Read tests/e2e/$FEATURE.spec.ts line by line (a rough automated pre-check is sketched at the end of this step; it does not replace this read).
BLOCK (stop QA loop, return to Engineer) if any of these exist:
- Any raw CSS or XPath selector (a selector string containing "." or "#" or "//")
- Any action missing an await keyword
- Any test block with zero assertions (expect() calls)
- Any page.waitForTimeout() greater than 2000ms
- Any test that only navigates and clicks with no assertion
WARN (flag but do not block) for:
- Test names that don't clearly describe the scenario
- Missing afterEach cleanup for data-creating tests
- Tests that could affect each other's state
Output: qa/audits/$FEATURE-audit.md
If blockers found: list exact line numbers. Return to Engineer. If no blockers: proceed.
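Part of this audit can be pre-screened mechanically. A rough, regex-based sketch (the spec path is an assumption; it will miss cases and raise false positives, so it supplements the line-by-line read rather than replacing it):

```typescript
// sentinel-precheck.ts: crude static screen for the most common blockers.
import { readFileSync } from 'node:fs';

const specPath = 'tests/e2e/feature.spec.ts';           // assumed path
const source = readFileSync(specPath, 'utf8');
const blockers: string[] = [];

source.split('\n').forEach((line, i) => {
  // Raw CSS/XPath selectors passed to page.$ / page.$x / page.locator.
  if (/page\.\$x?\(|page\.locator\(['"`][.#\/]/.test(line)) {
    blockers.push(`line ${i + 1}: raw CSS/XPath selector`);
  }
  // Hard waits longer than 2000ms.
  const wait = line.match(/waitForTimeout\((\d+)\)/);
  if (wait && Number(wait[1]) > 2000) {
    blockers.push(`line ${i + 1}: waitForTimeout > 2000ms`);
  }
  // Very rough missing-await check: a page action starting the line with no await before it.
  if (/^\s*page\.(goto|click|fill|press)\(/.test(line)) {
    blockers.push(`line ${i + 1}: possible missing await`);
  }
});

// Zero-assertion test blocks really need AST analysis; this is only a whole-file proxy.
if (!/expect\(/.test(source)) blockers.push('no expect() calls found in file');

console.log(blockers.length ? blockers.join('\n') : 'No blockers detected');
```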
━━━ EXECUTION ━━━
Run: npx playwright test tests/e2e/$FEATURE.spec.ts --reporter=json
Save the full output to: qa/runs/$FEATURE-latest.json
━━━ HEALER AGENT (runs only if failures exist) ━━━
For each failed test:
- Read the full error message and attached screenshot
- Navigate to the failing page using Playwright MCP to inspect current state
- Make a determination:
BROKEN TEST (the test is wrong):
-> The page structure changed, a selector no longer exists, or the expected text changed (not a regression, just drift)
-> Fix: update the selector or assertion to match current reality
-> Re-run the specific test
-> If fixed: continue
CONFIRMED BUG (the application is wrong):
-> The feature is not behaving as the PRD acceptance criteria describe
-> Do NOT fix the test to hide the bug
-> Create qa/bugs/$FEATURE-[timestamp].md with:
  - Which test failed
  - What the expected behaviour is (from the PRD)
  - What the actual behaviour is
  - Screenshot path
  - Steps to reproduce
-> STOP the QA loop
-> Report: "Bug confirmed in $FEATURE. QA loop stopped. Run /build $FEATURE with this bug report to fix."
Maximum 3 fix attempts per test before treating it as a confirmed bug.
━━━ EXPANDER AGENT (runs only if all tests pass) ━━━
Review qa/plans/$FEATURE-prioritized.md and tests/e2e/$FEATURE.spec.ts.
Find gaps — scenarios not yet covered (two of the cases below are sketched as tests after this list). Look specifically for:
- What happens when the user submits an empty form?
- What happens at maximum input length (e.g. 10,000 character input)?
- What happens if the user navigates away mid-flow and returns?
- What happens if the user hits browser back/forward?
- What happens on a very slow connection? (use Playwright network throttling)
- What happens if the user is not authenticated and tries this feature?
- What happens with special characters or emoji in text inputs?
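Two of these gaps sketched as tests (selectors, routes, and copy are placeholders; the slow-connection case approximates throttling by delaying intercepted API responses, which is one option among several):

```typescript
import { test, expect } from '@playwright/test';

test('emoji and special characters in the notes field are saved intact', async ({ page }) => {
  await page.goto('/transactions/new');                               // placeholder route
  await page.getByLabel('Notes').fill('café ☕ 100% & <script>alert(1)</script>');
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByTestId('transaction-list')).toContainText('café ☕');
});

test('a slow API shows a loading state instead of a blank screen', async ({ page }) => {
  // Delay every API response by 3 seconds to simulate a slow connection.
  await page.route('**/api/**', async (route) => {
    await new Promise((resolve) => setTimeout(resolve, 3000));
    await route.continue();
  });
  await page.goto('/transactions');                                   // placeholder route
  await expect(page.getByText('Loading')).toBeVisible();              // placeholder copy
});
```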
Add 3-5 new tests to tests/e2e/$FEATURE.spec.ts. Append the new scenarios to qa/plans/$FEATURE-prioritized.md.
Run the full suite again: npx playwright test tests/e2e/$FEATURE.spec.ts
━━━ SNAPSHOT AGENT ━━━
For every page involved in $FEATURE, capture screenshots at three viewports:
- Desktop: 1440 x 900
- Tablet: 768 x 1024
- Mobile: 375 x 812
Save to:
- qa/visual-baselines/$FEATURE/[screen]-desktop.png
- qa/visual-baselines/$FEATURE/[screen]-tablet.png
- qa/visual-baselines/$FEATURE/[screen]-mobile.png
FIRST RUN BEHAVIOUR: These screenshots ARE the baseline. Save them. Document in qa/visual-baselines/$FEATURE/README.md:
- Date baseline was created
- What build/commit this represents
- Any known intentional visual quirks
SUBSEQUENT RUN BEHAVIOUR:
Run: npx playwright test --project=visual
Compare each screenshot against its baseline. If the pixel difference is > 2%, flag it as a visual regression.
Save diff images to: qa/visual-reports/$FEATURE-[date]-diff.png
A visual regression is treated the same as a test failure.
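One way such a comparison can look in the visual project (the "visual" project name, screen route, and snapshot name must match playwright.config.ts and are assumptions here; maxDiffPixelRatio: 0.02 mirrors the 2% threshold):

```typescript
import { test, expect } from '@playwright/test';

const viewports = [
  { name: 'desktop', width: 1440, height: 900 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'mobile', width: 375, height: 812 },
];

for (const vp of viewports) {
  test(`main screen matches baseline at ${vp.name}`, async ({ page }) => {
    await page.setViewportSize({ width: vp.width, height: vp.height });
    await page.goto('/feature');                                      // placeholder route
    await expect(page).toHaveScreenshot(`main-${vp.name}.png`, {
      maxDiffPixelRatio: 0.02,                                        // flag differences above 2%
      fullPage: true,
    });
  });
}
```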
TO INTENTIONALLY UPDATE THE BASELINE:
Run: npx playwright test --project=visual --update-snapshots
Commit the new baseline files. Document what changed and why in qa/visual-baselines/$FEATURE/README.md.
━━━ QUALITY GATE ━━━
Calculate the score (a worked sketch of the arithmetic follows the breakdown):
P0 tests: all must pass. Any P0 failure = score 0 = STOP HERE.
P0 passing: 40 points
P1 passing: (passing / total) x 30 points
P2 passing: (passing / total) x 15 points
Visual match: all snapshots match baseline = 15 points; any visual regression = 0 points for this category
TOTAL POSSIBLE: 100 points
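The scoring rule expressed as a small function (a sketch of the arithmetic only; extracting pass counts from qa/runs/$FEATURE-latest.json is left out, and treating an empty tier as fully passing is an assumption):

```typescript
interface TierResult {
  passed: number;
  total: number;
}

function qualityScore(p0: TierResult, p1: TierResult, p2: TierResult, visualRegression: boolean): number {
  // Any P0 failure zeroes the score and stops the gate.
  if (p0.passed < p0.total) return 0;
  const p1Points = p1.total ? (p1.passed / p1.total) * 30 : 30; // assumption: empty tier = full points
  const p2Points = p2.total ? (p2.passed / p2.total) * 15 : 15;
  const visualPoints = visualRegression ? 0 : 15;
  return Math.round(40 + p1Points + p2Points + visualPoints);
}

// Example: all P0 pass, 9/10 P1, 4/5 P2, no visual regressions
// -> 40 + 27 + 12 + 15 = 94 -> PASS (score >= 85)
```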
Score < 85: FAIL
-> Write the full report to qa/QUALITY_LOG.md
-> Output to the user: which tests failed, which snapshots regressed, and the likely causes
-> "Run /build $FEATURE with this report to address failures."
Score >= 85: PASS
-> Append to qa/QUALITY_LOG.md: date, feature, score, test count
-> "QA passed for $FEATURE. Score: [X]/100. Proceed to CI."