playwright-debugger
Playwright Failed Test Debugger
Diagnose Playwright test failures from report files. Classifies root causes and provides concrete fixes.
Prerequisites: Get the Report
Determine the report source in this order:
1. playwright-report/ already exists locally → skip to Phase 1.
2. No report available → run tests locally and write output to a file (do NOT read stdout directly — output may be truncated):
mkdir -p playwright-report && npx playwright test --reporter=json 2>/dev/null > playwright-report/results.json
Phase 1: Extract Failures
Locate results.json under playwright-report/, then extract all tests where status is "failed" or "timedOut". For each failed test, collect: title, status, error.message, location.file, duration.
Use jq if available:
jq '[
.. | objects |
select(.status == "failed" or .status == "timedOut") |
{title: .title, status: .status, error: .error.message, file: .location.file, duration: .duration}
] | unique' playwright-report/results.json
If jq is unavailable, read the JSON file directly with the Read tool and extract failed tests manually.
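When jq is not installed, the same recursive walk can be approximated in a few lines of TypeScript. The sketch below is illustrative only: `Failure` and `collectFailures` are hypothetical names, the input is assumed to be the already-parsed contents of results.json, and the field names simply mirror the jq filter above rather than a verified report schema. Unlike the jq version it does not deduplicate results.

```typescript
// One extracted failure; fields mirror those collected by the jq filter.
interface Failure {
  title: string | null;
  status: string;
  error: string | null;
  file: string | null;
  duration: number | null;
}

// Visit every object in the parsed report (like jq's `.. | objects`) and
// keep those whose status is "failed" or "timedOut".
function collectFailures(node: unknown, out: Failure[] = []): Failure[] {
  if (Array.isArray(node)) {
    for (const item of node) collectFailures(item, out);
    return out;
  }
  if (node === null || typeof node !== "object") return out;

  const obj = node as Record<string, unknown>;
  const statusValue = obj["status"];
  if (statusValue === "failed" || statusValue === "timedOut") {
    const error = obj["error"] as { message?: unknown } | null | undefined;
    const location = obj["location"] as { file?: unknown } | null | undefined;
    out.push({
      title: typeof obj["title"] === "string" ? (obj["title"] as string) : null,
      status: statusValue,
      error: error && typeof error.message === "string" ? error.message : null,
      file: location && typeof location.file === "string" ? location.file : null,
      duration: typeof obj["duration"] === "number" ? (obj["duration"] as number) : null,
    });
  }
  // Recurse into child values so nested suites and results are also visited.
  for (const key of Object.keys(obj)) collectFailures(obj[key], out);
  return out;
}
```

Missing fields come back as `null`, which matches what the jq filter emits when a key is absent on the matched object.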
Phase 2: Classify Root Cause
Use Phase 1 output (error message + duration + file) to classify each failure. Most failures are identifiable here — only go to Phase 3 if still unclear.
| # | Category | Signals | Review Pattern |
|---|---|---|---|
| F1 | Flaky / Timing | TimeoutError, duration near maxTimeout, passes on retry | #9a |
| F2 | Selector Broken | locator not found, strict mode violation, element count mismatch | #6, #10 |
| F3 | Network Dependency | net::ERR_*, unexpected API response, 404/500 | - |
| F4 | Assertion Mismatch | Expected X to equal Y, over-broad check | #4 |
| F5 | Missing Then | Action completed but wrong state remains | #2 |
| F6 | Condition Branch Missing | Element conditionally present, assertion always runs | #5 |
| F7 | Test Isolation Failure | Passes alone, fails in suite; leaked state | - |
| F8 | Environment Mismatch | CI vs local only; viewport, OS, timezone | - |
| F9 | Data Dependency | Missing seed data, hardcoded IDs | - |
| F10 | Auth / Session | Session expired, role-based UI not rendered | - |
| F11 | Async Order Assumption | Promise.all order, parallel race | - |
| F12 | POM / Locator Drift | DOM changed, POM locator not updated | #10 |
| F13 | Error Swallowing | .catch(() => {}) hiding failure, test passes silently | #3 |
| F14 | Animation Race | Element visible but content not yet rendered | #9a |
Classification steps:
- Match error message to signals above
- duration near timeout → F1 or F3
- CI-only failure → F7 or F8
- Passes on retry → F1
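The classification steps can be sketched as a small rule function. Everything here is an assumption made for illustration: the names `SIGNALS`, `FailureInput`, and `classify` are hypothetical, the regular expressions are literal readings of the Signals column for F1 to F4 only, and "near timeout" is arbitrarily taken to mean at least 90% of the configured timeout. The function returns candidate categories, not a final verdict.

```typescript
// Input for one failure; ciOnly and passedOnRetry are optional observations
// that the caller supplies from CI history or a retry run.
interface FailureInput {
  error: string;
  duration: number; // milliseconds
  ciOnly?: boolean;
  passedOnRetry?: boolean;
}

// Assumed message patterns, read directly from the Signals column.
const SIGNALS: Array<{ category: string; pattern: RegExp }> = [
  { category: "F1", pattern: /TimeoutError/ },
  { category: "F2", pattern: /strict mode violation|locator not found/i },
  { category: "F3", pattern: /net::ERR_/ },
  { category: "F4", pattern: /expected .* to equal/i },
];

// Assumed threshold for "duration near timeout".
const NEAR_TIMEOUT_RATIO = 0.9;

function classify(failure: FailureInput, timeoutMs: number): string[] {
  const candidates: string[] = [];
  const add = (category: string): void => {
    if (candidates.indexOf(category) === -1) candidates.push(category);
  };

  // Step 1: match the error message against the known signals.
  for (const signal of SIGNALS) {
    if (signal.pattern.test(failure.error)) add(signal.category);
  }
  // Step 2: duration near timeout suggests F1 or F3.
  if (failure.duration >= timeoutMs * NEAR_TIMEOUT_RATIO) {
    add("F1");
    add("F3");
  }
  // Step 3: CI-only failures suggest F7 or F8.
  if (failure.ciOnly) {
    add("F7");
    add("F8");
  }
  // Step 4: passing on retry suggests F1.
  if (failure.passedOnRetry) add("F1");

  return candidates;
}
```

An empty result means no signal matched, which is the cue to move on to Phase 3.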
Phase 3: Trace Analysis (only if Phase 2 is unclear)
trace.zip contains three parts:
- trace.trace: newline-delimited JSON events (actions, snapshots, console logs)
- trace.network: newline-delimited JSON (network requests and responses)
- resources/: JPEG screenshots per step
Find trace files: find playwright-report -name "*.zip" | head -10
Extract and read each file using unzip -p <trace.zip> <entry>, then parse the newline-delimited JSON. Stop as soon as the root cause is clear.
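Parsing newline-delimited JSON means treating each non-empty line as its own JSON document. A minimal sketch, assuming the text of an entry has already been obtained via unzip -p; `parseNdjson` is a hypothetical helper, and it reports malformed lines instead of silently dropping them:

```typescript
interface NdjsonResult {
  events: unknown[];
  badLines: number[]; // 1-based line numbers that failed to parse
}

// Parse newline-delimited JSON: one JSON value per non-empty line.
function parseNdjson(text: string): NdjsonResult {
  const result: NdjsonResult = { events: [], badLines: [] };
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim(); // also strips a trailing \r from CRLF input
    if (line === "") continue;
    try {
      result.events.push(JSON.parse(line));
    } catch (err) {
      // Record the line number so a truncated trace is visible to the caller.
      result.badLines.push(i + 1);
    }
  }
  return result;
}
```

A non-empty `badLines` usually points at a truncated or partially written entry, which is worth knowing before trusting the remaining events.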
What to look for at each step:
1. Which step failed: filter trace.trace events where type === "after" and error is present. Log apiName and error.message.
2. All actions with pass/fail: filter trace.trace for type === "after"; log index, apiName, and whether error is set.
3. DOM at the failed step (selector issues): filter trace.trace for type === "frame-snapshot" matching the beforeSnapshot name from step 2. Inspect snapshot.html.
4. Failed network requests: filter trace.network for type === "resource-snapshot" where response.status >= 400. Log status and URL.
5. JS console errors: filter trace.trace for type === "console" and messageType === "error". Log text.
6. Still unclear: add temporary page.screenshot() calls before and after the failing action, re-run, then inspect the screenshots with a browser agent. Remove screenshots after debugging.
Phase 4: Fix Suggestions
For each failure, produce a finding in this format:
[P0/P1/P2] test name — Category
- Category: e.g. F2 — Selector Broken (#10 POM Drift)
- Error: the raw error message
- Root Cause: one sentence explanation
- Fix: before/after code showing the concrete change
Severity:
- P0: Test passes silently when feature is broken (F6, F13)
- P1: Intermittent or misleading failures (F1, F2, F3, F7, F11, F14)
- P2: Consistent failures, straightforward fix (F4, F5, F8, F9, F10, F12)
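The severity rules amount to a lookup from category to priority. A minimal sketch, with `SEVERITY_BY_CATEGORY`, `severityFor`, and `sortBySeverity` as hypothetical names; the mapping itself is copied from the list above:

```typescript
type Severity = "P0" | "P1" | "P2";

// Category-to-severity mapping, taken from the Severity list.
const SEVERITY_BY_CATEGORY: Record<string, Severity> = {
  F6: "P0", F13: "P0",
  F1: "P1", F2: "P1", F3: "P1", F7: "P1", F11: "P1", F14: "P1",
  F4: "P2", F5: "P2", F8: "P2", F9: "P2", F10: "P2", F12: "P2",
};

// Returns undefined for categories that are not in the table.
function severityFor(category: string): Severity | undefined {
  return Object.prototype.hasOwnProperty.call(SEVERITY_BY_CATEGORY, category)
    ? SEVERITY_BY_CATEGORY[category]
    : undefined;
}

// Order findings so that P0 comes first; unknown categories go last.
function sortBySeverity<T extends { category: string }>(findings: T[]): T[] {
  const rank = (category: string): number => {
    const severity = severityFor(category);
    return severity === undefined ? 3 : Number(severity.slice(1));
  };
  return findings.slice().sort((a, b) => rank(a.category) - rank(b.category));
}
```

Sorting a copy rather than the input keeps the original discovery order available for the per-file summary.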
Output Format
Failure Summary
- Total: N failed (M flaky, K broken, J environment)
[P0] `test name` — F13 Error Swallowing
...
Review Summary
| Sev | Count | Top Category | Files |
|-----|-------|------------------|------------------|
| P0 | 1 | Error Swallowing | auth.spec.ts |
| P1 | 3 | Flaky / Timing | dashboard.spec.ts |
| P2 | 2 | POM Drift | settings.spec.ts |
Fix P0 first. Run npx playwright test --retries=2 to confirm flaky tests.