qa-run
Arguments: $FEATURE (or "all" for entire suite)
Read CLAUDE.md before doing anything else. Ensure the dev server is running before proceeding.
━━━ BROWSER AGENT (Playwright MCP — runs first) ━━━
Before writing ANY test file, explore the live application using Playwright MCP. This is non-negotiable for web applications. You MUST see the real app first.
STEP A — NAVIGATE EVERY SCREEN: For each screen related to $FEATURE:
1. browser_navigate to the screen URL
2. browser_snapshot — capture the accessibility tree (this gives you the REAL selectors, roles, and accessible names; never guess selectors, always take them from the live app)
3. browser_take_screenshot — save to qa/browser-tests/$FEATURE/
4. Compare what you see against docs/SCREENS.md and wireframes/
5. Log any discrepancies immediately
STEP B — TEST EVERY INTERACTION: On each screen:
1. browser_click every button — verify the correct result
2. browser_type into every input — verify it accepts input
3. browser_select_option on every dropdown
4. browser_press_key Tab through the page — verify focus order
5. browser_press_key Enter on focused buttons — verify activation
6. For forms: submit with valid data, empty data, and invalid data
STEP C — TEST THE HAPPY PATH LIVE: Read the P0 acceptance criteria from docs/PRD.md. Execute each Given/When/Then by actually doing it in the browser:
- browser_navigate to the starting point
- browser_type / browser_click / browser_select_option to perform actions
- browser_verify_text_visible / browser_verify_element_visible for assertions
- browser_take_screenshot at each step
STEP D — RESPONSIVE CHECK: For the 3 most important screens:
- browser_resize width=1440 height=900 → browser_take_screenshot (desktop)
- browser_resize width=768 height=1024 → browser_take_screenshot (tablet)
- browser_resize width=375 height=812 → browser_take_screenshot (mobile)
Verify: no overflow, no cut-off content, touch targets ≥ 44px on mobile
STEP E — HEALTH CHECK:
- browser_console_messages — flag any JavaScript errors or warnings
- browser_network_requests — flag any failed requests (4xx/5xx)
STEP F — GENERATE INITIAL TEST FILES: Use browser_generate_playwright_test to create .spec.ts files from your session. Save them to tests/e2e/$FEATURE-browser.spec.ts. These files become the foundation that the Engineer Agent refines below.
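The generated file is only a starting point. Its shape will look roughly like this (the route, roles, names, and copy are placeholders standing in for values discovered via browser_snapshot, not real ones):

```typescript
// tests/e2e/$FEATURE-browser.spec.ts: illustrative shape of a generated starting point.
import { test, expect } from '@playwright/test';

test('main screen exposes its primary controls', async ({ page }) => {
  await page.goto('/feature');                                        // placeholder route
  await expect(page.getByRole('heading', { name: 'Feature' })).toBeVisible();
  await expect(page.getByRole('button', { name: 'Save' })).toBeVisible();
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Saved')).toBeVisible();                // placeholder copy
});
```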
Output: qa/browser-tests/$FEATURE/exploration.md (Summary of what was found: working elements, broken elements, missing elements, selectors discovered, accessibility tree findings)
━━━ ANALYST AGENT ━━━
Read the source code for $FEATURE. Read qa/plans/ for any existing test coverage on this feature. Read qa/browser-tests/$FEATURE/exploration.md for live browser findings.
Map every testable surface:
- Every user-facing interaction (clicks, inputs, form submissions, keyboard nav)
- Every API call this feature makes and its possible response shapes
- Every UI state: loading, empty, error, success, partial data
- Every data-testid attribute or accessible role present in the DOM
- Every validation rule (client-side and server-side)
- Every route or navigation this feature triggers
Output: qa/plans/$FEATURE.md
━━━ PLANNER AGENT ━━━
Read qa/plans/$FEATURE.md. Assign a priority and write a Given/When/Then scenario for each item, using these levels (an example entry is sketched at the end of this step):
P0 — "If this breaks, the product is unusable" (auth flows, data saving, core feature paths) P1 — "If this breaks, a significant feature is degraded" (secondary flows, important edge cases) P2 — "Edge case — good to have covered" (unusual inputs, rare states, nice-to-have validation)
Also include for each screen:
- Empty state scenario (user has no data yet)
- Error state scenario (network fails, server returns 500)
- Mobile viewport scenario (at least for P0 items)
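An illustrative entry (the scenario itself is a placeholder, not drawn from a real PRD):
P0 — Save transaction: Given an authenticated user on the new-transaction screen, When they enter a valid amount and press Save, Then the transaction appears in the list without a page reload.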
Output: qa/plans/$FEATURE-prioritized.md
━━━ ENGINEER AGENT ━━━
CRITICAL: The dev server must be running. Use Playwright MCP to navigate the actual, running application before writing any test.
For each scenario in qa/plans/$FEATURE-prioritized.md:
- Navigate to the relevant route using Playwright MCP
- Confirm the element you intend to target is visible and accessible
- Note the exact accessible role, label, or testId
- Then write the Playwright test
Write all tests to: tests/e2e/$FEATURE.spec.ts
Playwright rules — these are absolute, no exceptions:
ALLOWED: getByRole('button', { name: 'Save' })
ALLOWED: getByLabel('Email address')
ALLOWED: getByText('No transactions yet')
ALLOWED: getByTestId('transaction-list')
FORBIDDEN: page.$('.save-btn')
FORBIDDEN: page.$('#submit')
FORBIDDEN: page.$x('//button[@class="primary"]')
FORBIDDEN: page.waitForTimeout(3000) <- use expect().toBeVisible() instead
Every test must (a conforming sketch follows this list):
- Have a descriptive name explaining what it verifies
- Assert a specific, meaningful outcome (not just "doesn't crash")
- Use proper async/await throughout
- Clean up any data it creates (use beforeEach/afterEach hooks)
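A minimal sketch of a conforming test (the route, labels, testId, success copy, and cleanup endpoint are placeholders; the real values come from the Playwright MCP inspection above):

```typescript
import { test, expect } from '@playwright/test';

test.describe('$FEATURE: save flow', () => {
  test.afterEach(async ({ request }) => {
    // Placeholder teardown: replace with the project's real test-data cleanup.
    await request.delete('/api/test-data/transactions');
  });

  test('saving a valid transaction shows it in the transaction list', async ({ page }) => {
    await page.goto('/transactions/new');                             // placeholder route
    await page.getByLabel('Amount').fill('42.00');
    await page.getByRole('button', { name: 'Save' }).click();
    // Assert a specific, meaningful outcome, not just "it didn't crash".
    await expect(page.getByText('Transaction saved')).toBeVisible();
    await expect(page.getByTestId('transaction-list')).toContainText('42.00');
  });
});
```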
━━━ SENTINEL AGENT ━━━
Read tests/e2e/$FEATURE.spec.ts line by line (a rough automated pre-check is sketched at the end of this step; it does not replace this read).
BLOCK (stop QA loop, return to Engineer) if any of these exist:
- Any raw CSS or XPath selector (a selector string containing "." or "#" or "//")
- Any action missing an await keyword
- Any test block with zero assertions (expect() calls)
- Any page.waitForTimeout() greater than 2000ms
- Any test that only navigates and clicks with no assertion
WARN (flag but do not block) for:
- Test names that don't clearly describe the scenario
- Missing afterEach cleanup for data-creating tests
- Tests that could affect each other's state
Output: qa/audits/$FEATURE-audit.md
If blockers found: list exact line numbers. Return to Engineer. If no blockers: proceed.
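Part of this audit can be pre-screened mechanically. A rough, regex-based sketch (the spec path is an assumption; it will miss cases and raise false positives, so it supplements the line-by-line read rather than replacing it):

```typescript
// sentinel-precheck.ts: crude static screen for the most common blockers.
import { readFileSync } from 'node:fs';

const specPath = 'tests/e2e/feature.spec.ts';           // assumed path
const source = readFileSync(specPath, 'utf8');
const blockers: string[] = [];

source.split('\n').forEach((line, i) => {
  // Raw CSS/XPath selectors passed to page.$ / page.$x / page.locator.
  if (/page\.\$x?\(|page\.locator\(['"`][.#\/]/.test(line)) {
    blockers.push(`line ${i + 1}: raw CSS/XPath selector`);
  }
  // Hard waits longer than 2000ms.
  const wait = line.match(/waitForTimeout\((\d+)\)/);
  if (wait && Number(wait[1]) > 2000) {
    blockers.push(`line ${i + 1}: waitForTimeout > 2000ms`);
  }
  // Very rough missing-await check: a page action starting the line with no await before it.
  if (/^\s*page\.(goto|click|fill|press)\(/.test(line)) {
    blockers.push(`line ${i + 1}: possible missing await`);
  }
});

// Zero-assertion test blocks really need AST analysis; this is only a whole-file proxy.
if (!/expect\(/.test(source)) blockers.push('no expect() calls found in file');

console.log(blockers.length ? blockers.join('\n') : 'No blockers detected');
```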
━━━ EXECUTION ━━━
Run: npx playwright test tests/e2e/$FEATURE.spec.ts --reporter=json
Save the full output to: qa/runs/$FEATURE-latest.json
━━━ HEALER AGENT (runs only if failures exist) ━━━
For each failed test:
- Read the full error message and attached screenshot
- Navigate to the failing page using Playwright MCP to inspect current state
- Make a determination:
BROKEN TEST (the test is wrong):
-> The page structure changed, a selector no longer exists, or the expected text changed (not a regression, just drift)
-> Fix: update the selector or assertion to match current reality
-> Re-run the specific test
-> If fixed: continue
CONFIRMED BUG (the application is wrong):
-> The feature is not behaving as the PRD acceptance criteria describe
-> Do NOT fix the test to hide the bug
-> Create qa/bugs/$FEATURE-[timestamp].md with:
  - Which test failed
  - What the expected behaviour is (from the PRD)
  - What the actual behaviour is
  - Screenshot path
  - Steps to reproduce
-> STOP the QA loop
-> Report: "Bug confirmed in $FEATURE. QA loop stopped. Run /build $FEATURE with this bug report to fix."
Maximum 3 fix attempts per test before treating it as a confirmed bug.
━━━ EXPANDER AGENT (runs only if all tests pass) ━━━
Review qa/plans/$FEATURE-prioritized.md and tests/e2e/$FEATURE.spec.ts.
Find gaps — scenarios not yet covered (two of the cases below are sketched as tests after this list). Look specifically for:
- What happens when the user submits an empty form?
- What happens at maximum input length (e.g. 10,000 character input)?
- What happens if the user navigates away mid-flow and returns?
- What happens if the user hits browser back/forward?
- What happens on a very slow connection? (use Playwright network throttling)
- What happens if the user is not authenticated and tries this feature?
- What happens with special characters or emoji in text inputs?
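Two of these gaps sketched as tests (selectors, routes, and copy are placeholders; the slow-connection case approximates throttling by delaying intercepted API responses, which is one option among several):

```typescript
import { test, expect } from '@playwright/test';

test('emoji and special characters in the notes field are saved intact', async ({ page }) => {
  await page.goto('/transactions/new');                               // placeholder route
  await page.getByLabel('Notes').fill('café ☕ 100% & <script>alert(1)</script>');
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByTestId('transaction-list')).toContainText('café ☕');
});

test('a slow API shows a loading state instead of a blank screen', async ({ page }) => {
  // Delay every API response by 3 seconds to simulate a slow connection.
  await page.route('**/api/**', async (route) => {
    await new Promise((resolve) => setTimeout(resolve, 3000));
    await route.continue();
  });
  await page.goto('/transactions');                                   // placeholder route
  await expect(page.getByText('Loading')).toBeVisible();              // placeholder copy
});
```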
Add 3-5 new tests to tests/e2e/$FEATURE.spec.ts. Append the new scenarios to qa/plans/$FEATURE-prioritized.md.
Run the full suite again: npx playwright test tests/e2e/$FEATURE.spec.ts
━━━ SNAPSHOT AGENT ━━━
For every page involved in $FEATURE, capture screenshots at three viewports:
- Desktop: 1440 x 900
- Tablet: 768 x 1024
- Mobile: 375 x 812
Save to:
- qa/visual-baselines/$FEATURE/[screen]-desktop.png
- qa/visual-baselines/$FEATURE/[screen]-tablet.png
- qa/visual-baselines/$FEATURE/[screen]-mobile.png
FIRST RUN BEHAVIOUR: These screenshots ARE the baseline. Save them. Document in qa/visual-baselines/$FEATURE/README.md:
- Date baseline was created
- What build/commit this represents
- Any known intentional visual quirks
SUBSEQUENT RUN BEHAVIOUR:
Run: npx playwright test --project=visual
Compare each screenshot against its baseline. If the pixel difference is > 2%, flag it as a visual regression.
Save diff images to: qa/visual-reports/$FEATURE-[date]-diff.png
A visual regression is treated the same as a test failure.
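One way such a comparison can look in the visual project (the "visual" project name, screen route, and snapshot name must match playwright.config.ts and are assumptions here; maxDiffPixelRatio: 0.02 mirrors the 2% threshold):

```typescript
import { test, expect } from '@playwright/test';

const viewports = [
  { name: 'desktop', width: 1440, height: 900 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'mobile', width: 375, height: 812 },
];

for (const vp of viewports) {
  test(`main screen matches baseline at ${vp.name}`, async ({ page }) => {
    await page.setViewportSize({ width: vp.width, height: vp.height });
    await page.goto('/feature');                                      // placeholder route
    await expect(page).toHaveScreenshot(`main-${vp.name}.png`, {
      maxDiffPixelRatio: 0.02,                                        // flag differences above 2%
      fullPage: true,
    });
  });
}
```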
TO INTENTIONALLY UPDATE THE BASELINE:
Run: npx playwright test --project=visual --update-snapshots
Commit the new baseline files. Document what changed and why in qa/visual-baselines/$FEATURE/README.md.
━━━ QUALITY GATE ━━━
Calculate the score (a worked sketch of the arithmetic follows the breakdown):
P0 tests: all must pass. Any P0 failure = score 0 = STOP HERE.
P0 passing: 40 points
P1 passing: (passing / total) x 30 points
P2 passing: (passing / total) x 15 points
Visual match: all snapshots match baseline = 15 points; any visual regression = 0 points for this category
TOTAL POSSIBLE: 100 points
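The scoring rule expressed as a small function (a sketch of the arithmetic only; extracting pass counts from qa/runs/$FEATURE-latest.json is left out, and treating an empty tier as fully passing is an assumption):

```typescript
interface TierResult {
  passed: number;
  total: number;
}

function qualityScore(p0: TierResult, p1: TierResult, p2: TierResult, visualRegression: boolean): number {
  // Any P0 failure zeroes the score and stops the gate.
  if (p0.passed < p0.total) return 0;
  const p1Points = p1.total ? (p1.passed / p1.total) * 30 : 30; // assumption: empty tier = full points
  const p2Points = p2.total ? (p2.passed / p2.total) * 15 : 15;
  const visualPoints = visualRegression ? 0 : 15;
  return Math.round(40 + p1Points + p2Points + visualPoints);
}

// Example: all P0 pass, 9/10 P1, 4/5 P2, no visual regressions
// -> 40 + 27 + 12 + 15 = 94 -> PASS (score >= 85)
```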
Score < 85: FAIL
-> Write the full report to qa/QUALITY_LOG.md
-> Output to the user: which tests failed, which snapshots regressed, and the likely causes
-> "Run /build $FEATURE with this report to address failures."
Score >= 85: PASS
-> Append to qa/QUALITY_LOG.md: date, feature, score, test count
-> "QA passed for $FEATURE. Score: [X]/100. Proceed to CI."