# E2E Test Scenario Quality Review (`e2e-test-reviewer`)

Systematic checklist for reviewing E2E spec files AND Page Object Model (POM) files. Framework-agnostic principles; code examples show Playwright, Cypress, and Puppeteer where they differ.
## Phase 1: Automated Grep Checks (Run First)
Once the review target files are determined, run the following Bash commands before LLM analysis to mechanically detect known anti-patterns.
```bash
echo "=== E2E Mechanical Anti-Pattern Check ==="
echo ""

# 3. Error Swallowing — .catch(() => {}) / .catch(() => false)
echo "--- #3 Error Swallowing ---"
grep -rn '\.catch(\s*() =>' e2e/ --include='*.ts' --include='*.js' --include='*.cy.*' | grep -v node_modules | grep -v '// justified'

# 4. Always-Passing — assertion that can never fail
echo "--- #4 Always-Passing ---"
grep -rn -E 'toBeGreaterThanOrEqual\(0\)|should\(.*(gte|greaterThan).*0\)' e2e/ --include='*.ts' --include='*.js' --include='*.cy.*'

# 5. Boolean Trap — toBeTruthy on non-boolean values (Locator, ElementHandle, selector result)
# Excludes cases where the value is already boolean (e.g., response.ok(), isVisible(), isChecked())
echo "--- #5 Boolean Trap ---"
grep -rn -E 'expect\(.*\)\.toBeTruthy\(\)|should\(.*(be\.truthy|be\.true)\)' e2e/ --include='*.spec.*' --include='*.test.*' --include='*.cy.*' | grep -v -E '\.(ok|isVisible|isChecked|isDisabled|isEnabled|isEditable|isHidden)\(\)'

# 6. Conditional Bypass — expect inside if(isVisible)
echo "--- #6 Conditional Bypass ---"
grep -rn -E "if.*(isVisible|is\(.*:visible.*\))" e2e/ --include='*.spec.*' --include='*.test.*' --include='*.cy.*'

# 7. Raw DOM — document.querySelector in spec/test files
echo "--- #7 Raw DOM in specs ---"
grep -rn 'document\.querySelector' e2e/ --include='*.spec.*' --include='*.test.*' --include='*.cy.*'

# 12. Hard-coded Timeout — waitForTimeout / cy.wait(number)
echo "--- #12 Hard-coded Timeout ---"
grep -rn -E 'waitForTimeout|cy\.wait\(\d' e2e/ --include='*.ts' --include='*.js' --include='*.cy.*'

# 13b. Network dependency — goto/visit without route/intercept setup nearby
echo "--- #13b Missing Network Mock ---"
grep -rn -E 'page\.goto|cy\.visit' e2e/ --include='*.spec.*' --include='*.test.*' --include='*.cy.*' | grep -v 'route\.\|intercept\|mock' | head -20

echo ""
echo "=== Done ==="
```
Interpreting results:

- Zero hits → no mechanical issues found, proceed to Phase 2
- Any hit → report each line as an issue (includes file:line)
- Lines with `// justified` comments are excluded (intentional usage)
Output Phase 1 results as-is. The LLM must not reinterpret them.
## Phase 2: LLM Review (Subjective Checks Only)
Patterns already detected in Phase 1 (#3, #4, #5 partial, #6, #7, #12, #13b partial) are skipped. The LLM performs only these checks:
| # | Check | Reason |
|---|---|---|
| 1 | Name-Assertion Alignment | Requires semantic interpretation |
| 2 | Missing Then | Requires logic flow analysis |
| 8 | Render-Only | Requires test value judgment |
| 9 | Duplicate Scenarios | Requires similarity comparison |
| 10 | Misleading Names | Requires semantic interpretation |
| 11 | Over-Broad Assertions + Subject-Inversion | Requires domain context |
| 13 | Flaky Patterns (partial) | Requires context judgment for nth(), animation, network patterns |
| 14 | YAGNI in POM | Requires usage grep then judgment |
## Phase 3: Coverage Gap Analysis (After Review)
After completing Phase 1 + 2, identify scenarios the test suite does NOT cover. Scan the page/feature under test and flag missing:
| Gap Type | What to look for |
|---|---|
| Error paths | Form validation errors, API failure states, network offline, 404/500 pages |
| Edge cases | Empty state, max-length input, special characters, concurrent actions |
| Accessibility | Keyboard navigation, screen reader labels, focus management after modal/dialog |
| Auth boundaries | Unauthorized access redirects, expired session handling, role-based visibility |
Output: List up to 5 highest-value missing scenarios as suggestions, not requirements. Format:

```markdown
## Coverage Gaps (Suggestions)
1. **[Error path]** No test for form submission with server error — add API mock returning 500
2. **[Edge case]** No test for empty list state — verify empty state message shown
```
## Review Checklist
Run each check against every non-skipped test and every changed POM file.
Important: test.skip() with a reason comment or reason string is intentional — do NOT flag or remove these. Only flag mid-test conditional skips that hide failures (see #6).
### Tier 1 — P0/P1 (always check)
#### 1. Name-Assertion Alignment [LLM-only]

Symptom: Test name promises something the assertions don't verify.

```typescript
// BAD — name says "status" but only checks visibility
test('should display paragraph status', async () => {
  await expect(status).toBeVisible(); // no status content check
});
```

Rule: Every noun in the test name must have a corresponding assertion. Add it or rename.
Procedure:

- Extract all nouns from the test name (e.g., "should display paragraph status")
- For each noun, search the test body for an `expect()` that verifies it
- Missing noun → add assertion or remove noun from name
Common patterns: "should display X" with only toBeVisible() (no content check), "should update X and Y" with assertion for X but not Y, "should validate form" with only happy-path assertion.
#### 2. Missing Then [LLM-only]

Symptom: Test acts but doesn't verify the final expected state.

```typescript
// BAD — toggles but doesn't verify the dismissed state
test('should cancel edit on Escape', async () => {
  await input.click();
  await page.keyboard.press('Escape');
  await expect(text).toBeVisible();
  // input still hidden? never checked
});
```

Rule: For toggle/cancel/close actions, verify both the restored state AND the dismissed state.
Procedure:
- Identify the action verb (toggle, cancel, close, delete, submit, undo)
- List the expected state changes (element appears/disappears, text changes, count changes)
- Check that BOTH sides of the state change are asserted
Common patterns: Cancel/Escape without verifying input is hidden, delete without verifying count decreased, submit without verifying form resets, tab switch without verifying previous tab content is hidden.
#### 3. Error Swallowing [grep-detectable]

Symptom (spec): try/catch wrapping assertions — test passes on error.
Symptom (POM): `.catch(() => {})` or `.catch(() => false)` on awaited operations — caller never sees the failure.

```typescript
// BAD spec — silent pass
try { await expect(header).toBeVisible(); }
catch { console.log('skipped'); }

// BAD POM — caller thinks execution succeeded
await runningIndicator.waitFor({ state: 'detached' }).catch(() => {});
```

Rule (spec): Never wrap assertions in try/catch. Use `test.skip()` in beforeEach if the test can't run.
Rule (POM): Remove `.catch(() => {})` / `.catch(() => false)` from wait/assertion methods. If the operation can legitimately fail, the caller should decide how to handle it. Only keep catch for UI stabilization like `editor.click({ force: true }).catch(() => textArea.focus())`.
#### 4. Always-Passing Assertions [grep-detectable]

Symptom: Assertion that can never fail.

```typescript
// BAD — count >= 0 is always true
expect(count).toBeGreaterThanOrEqual(0);
```

Rule: Search for `toBeGreaterThanOrEqual(0)`, `toBeTruthy()` on always-truthy strings, and `||` chains that accept defaults as valid.
#### 5. Boolean Trap Assertions [grep-detectable]

Symptom (spec): `expect(locator).toBeTruthy()` on a Locator/ElementHandle object — always passes because objects are always truthy, regardless of whether the element exists in the DOM.

NOT a boolean trap: `expect(response.ok()).toBeTruthy()` or `expect(await el.isVisible()).toBe(true)` — these operate on actual boolean return values. While `toBe(true)` is slightly more precise than `toBeTruthy()` for booleans, this is a style preference, not a bug. Only flag as P1 when the value is a non-boolean object (Locator, ElementHandle, Promise).

Symptom (POM): Method returns `Promise<boolean>` instead of exposing an element handle — forces the spec into a boolean trap.

```typescript
// BAD — boolean return forces spec into trap
async isEditorVisible(index = 0): Promise<boolean> {
  return await this.paragraphs.nth(index).locator('code-editor').isVisible();
}
// spec side:
expect(await page.isEditorVisible(0)).toBe(true);
```
Rule (spec): Use the framework's built-in assertion instead of extracting a boolean first:

- Playwright: `await expect(locator).toBeVisible()`
- Cypress: `cy.get(selector).should('be.visible')`
- Puppeteer: `await page.waitForSelector(selector, { visible: true })`

Rule (POM): Expose the element handle (Locator / selector string) instead of returning `Promise<boolean>`. Let specs use framework assertions directly.
#### 6. Conditional Bypass (Silent Pass / Hidden Skip) [grep-detectable]

Symptom: `expect()` inside an if block, or mid-test `test.skip()` — test silently passes when the feature is broken.

```typescript
// BAD — if spinner never appears, assertion never runs
if (await spinner.isVisible()) {
  await expect(spinner).toBeHidden({ timeout: 5000 });
}
```

Rule: Every test path must contain at least one `expect()`. Move environment checks to beforeEach or declaration-level `test.skip()`.
#### 7. Raw DOM Queries (Bypassing Framework API) [grep-detectable]

Symptom: Test drops into raw `document.querySelector*` / `document.getElementById` via `evaluate()` when the framework's element lookup API could do the same job.

```typescript
// BAD — no auto-wait, returns stale boolean
const has = await page.evaluate((i) => {
  return !!document.querySelectorAll('.para')[i]?.querySelector('.result');
}, 0);
expect(has).toBe(true);
```

Why it matters: No auto-waiting, no retry, boolean trap, framework error messages lost.

Rule: Use the framework's element API instead of raw DOM:

- Playwright: `page.locator()` + web-first assertions
- Cypress: `cy.get()` / `cy.find()` — avoid `cy.window().then(win => win.document.querySelector(...))`
- Puppeteer: `page.$()` / `page.waitForSelector()` — avoid `page.evaluate(() => document.querySelector(...))`

Only use evaluate/waitForFunction when the framework API can't express the condition (getComputedStyle, cross-element DOM relationships). In POM, add a comment explaining why.
### Tier 2 — P1/P2 (check when time permits)
#### 8. Render-Only Tests (Low E2E Value) [LLM-only]

Symptom: Test only calls `toBeVisible()` with no interaction or content assertion.

Rule: Add at least one of: a content assertion (`not.toBeEmpty()`, `toContainText()`), a count assertion (`toHaveCount(n)`), or a sibling element assertion.
#### 9. Duplicate Scenarios (DRY) [LLM-only]

Symptom: Two tests share >70% of their steps with minor variations.

Rule (within file): Merge tests that differ only in setup or a single assertion. Use the richer verification set from both.

Rule (cross-file): After reviewing all files in scope, cross-check tests with similar names across different spec files. If test A in `feature-settings.spec.ts` is a subset of test B in `feature-form-validation.spec.ts`, delete A and strengthen B.
Procedure:
- List all test names in the file — look for similar prefixes or overlapping verbs
- For each pair with >70% step overlap, compare their assertion sets
- If one is a subset of the other, delete the weaker test and keep the richer one
Common patterns: "should add item" and "should add item and verify count" (subset), "should open dialog" in file A and "should open dialog and fill form" in file B (cross-file subset), parameterizable tests written as separate cases.
#### 10. Misleading Test Names (KISS) [LLM-only]

Symptom: Name implies UI interaction but the test uses API/REST, or name implies feature X but the test exercises feature Y.

Rule: If the test uses a REST API, reload, or other indirect methods, the name must make that explicit.
#### 11. Over-Broad Assertions (KISS) [LLM-only]

Symptom: Assertion too loose to catch regressions.

```typescript
// BAD — any string containing '%' passes
expect(content.includes('%')).toBe(true);
```

Rule: Prefer exact matches or explicit value lists over `.includes()` or loose regex when valid values are known and small.
#### 11b. Subject-Inversion [LLM-only]

Symptom: Expected values placed in `expect()` instead of the actual value — failure messages become confusing.

```typescript
// BAD — subject is the expected values array, not the actual result
// failure message: "Expected [200, 202] to contain 204" (confusing)
expect([200, 202]).toContain(deleteResponse.status());

// GOOD — actual value as subject, clear failure message
const status = deleteResponse.status();
expect(status === 200 || status === 202).toBe(true);
```

Rule: The value under test (actual) must always be the argument to `expect()`. Expected values go in the matcher. If the matcher doesn't support multi-value checks natively, use a boolean expression with `toBe(true)` rather than inverting the subject.
#### 12. Hard-coded Timeouts [grep-detectable]

Symptom: `waitForTimeout()` or magic timeout numbers scattered across tests and POM.

```typescript
// BAD — arbitrary sleep
await page.waitForTimeout(2000);

// BAD — magic number, no explanation
await element.waitFor({ state: 'visible', timeout: 30000 });
```

Rule: Never use explicit sleep (`waitForTimeout` / `cy.wait(ms)`) — rely on framework auto-wait or retry mechanisms. For custom timeouts, extract named constants with comments explaining why the default isn't sufficient.
#### 13. Flaky Patterns [LLM-only + grep]

Symptom: Test passes locally but fails intermittently in CI due to timing, ordering, or environment assumptions.

Sub-patterns:

13a. Positional selectors — `nth()`, `first()`, `last()` without a comment.

```typescript
// BAD — breaks if DOM order changes
await expect(items.nth(2)).toContainText('Settings');
```

Rule: Prefer data-testid, role-based, or attribute selectors. If `nth()` is unavoidable, add a comment explaining why.
13b. Network dependency without mock — test relies on real API responses without `route.fulfill()` / `cy.intercept()`.

```typescript
// BAD — fails if API is slow or returns different data
await page.goto('/dashboard');
await expect(page.locator('.user-count')).toHaveText('42');
```

Rule: For data-dependent assertions, mock the network response or assert on structure (element exists, is not empty) rather than exact values.
13c. Animation race — assertion runs before a CSS transition or animation completes.

```typescript
// BAD — modal may still be animating
await button.click();
await expect(modal).toBeVisible(); // passes
await expect(modal.locator('.content')).toHaveText('Done'); // flaky — content not rendered yet
```

Rule: After triggering animations, wait for the final-state element, not the container. Use waitForSelector with stable content or `toHaveCSS('opacity', '1')` for fade-ins.
#### 14. YAGNI in Page Objects [LLM-only]

Symptom: POM has locators/methods never referenced by any spec.

Procedure:

- List all public members of each changed POM file
- Grep each member across all test files and other POMs
- Classify: USED / INTERNAL-ONLY (make `private`) / UNUSED (delete)
Common patterns: Convenience wrappers (clickEdit() when specs use editButton.click()), getter methods (getCount() when specs use toHaveCount()), state checkers (isEditMode() when specs assert on elements directly), pre-built "just in case" locators.
Rule: Delete unused members. Make internal-only members private. When creating new shared utils, ensure they will be used by 2+ specs. Do not delete existing util files/classes that are actively imported and used by specs — only flag unused individual members within them.
Output:
| File | Member | Used In | Status |
|------|--------|---------|--------|
| page.ts | addLinks | (none) | DELETE |
| page.ts | searchDialog | internal only | PRIVATE |
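The usage grep from the procedure above can be rehearsed on a throwaway fixture before running it against the real tree. A minimal sketch, in which the temp directory, file names, and the members `addLinks` / `editButton` are all hypothetical:

```shell
# Build a tiny fixture: one POM with two members, one spec using only one of them.
dir=$(mktemp -d)
printf 'export class SettingsPage {\n  addLinks() {}\n  editButton = "#edit";\n}\n' > "$dir/page.ts"
printf 'await settings.editButton;\n' > "$dir/a.spec.ts"

# Count spec references per member. Searching only *.spec.ts excludes the
# POM's own definition; zero hits marks a deletion candidate.
for member in addLinks editButton; do
  hits=$(grep -r "$member" --include='*.spec.ts' "$dir" | wc -l | tr -d ' ')
  if [ "$hits" -eq 0 ]; then
    echo "$member: UNUSED (delete)"
  else
    echo "$member: USED ($hits reference(s))"
  fi
done
```

In a real review the same loop would run over each public member name against the full spec and POM directories.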
## Output Format

Present findings grouped by severity:

````markdown
## [P0/P1/P2] Task N: [filename] — [issue type]

### N-1. `[test name or POM method]`
- **Issue:** [description]
- **Fix:** [name change / assertion addition / merge / deletion]
- **Code:**
  ```typescript
  // concrete code to add or change
  ```
````

After all findings, append a summary table:

````markdown
## Review Summary
| Sev | Count | Top Issue | Affected Files |
|-----|-------|-----------|----------------|
| P0 | 3 | Missing Then | auth.spec.ts, form.spec.ts |
| P1 | 5 | Duplicate Scenarios | settings.spec.ts |
| P2 | 2 | Render-Only | dashboard.spec.ts |

**Total: 10 issues across 4 files. Fix P0 first.**
````
Severity classification:
- P0 (Must fix): Test silently passes when the feature is broken — no real verification happening
- P1 (Should fix): Test works but gives poor diagnostics, wastes CI time, or misleads developers
- P2 (Nice to fix): Weak but not wrong — maintenance and robustness improvements
## Quick Reference
| # | Check | Sev | Phase | Detection Signal |
|---|---|---|---|---|
| 1 | Name-Assertion | P0 | LLM | Noun in name with no matching expect() |
| 2 | Missing Then | P0 | LLM | Action without final state verification |
| 3 | Error Swallowing | P0 | grep | try/catch in spec, .catch(() => {}) in POM |
| 4 | Always-Passing | P0 | grep | >=0, truthy on non-empty, || defaults |
| 5 | Boolean Trap | P1 | grep | expect(locator).toBeTruthy() on non-boolean objects; skip when value is actual boolean (.ok(), .isVisible()) |
| 6 | Conditional Bypass | P0 | grep | expect() inside if, mid-test test.skip() |
| 7 | Raw DOM Queries | P1 | grep | document.querySelector in evaluate |
| 8 | Render-Only | P2 | LLM | Only toBeVisible(), no content/count |
| 9 | Duplicate | P1 | LLM | >70% shared steps, cross-file overlap |
| 10 | Misleading Name | P1 | LLM | API/reload in "should [UI verb]" test |
| 11 | Over-Broad | P2 | LLM | .includes() where enum values known |
| 11b | Subject-Inversion | P1 | LLM | expect([expected]).toContain(actual) — confusing failure messages |
| 12 | Hard-coded Timeout | P2 | grep | waitForTimeout(), magic numbers |
| 13 | Flaky Patterns | P1 | LLM+grep | nth(), missing network mock, animation race |
| 14 | YAGNI in POM | P2 | LLM | Public member not referenced in any spec |
## Suppression

When a grep-detected pattern is intentional, add a `// justified: [reason]` comment to the line. Phase 1 will exclude it.

Example:

```typescript
await editor.click({ force: true }).catch(() => textArea.focus()); // justified: UI stabilization fallback
```
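The exclusion can be sanity-checked on a throwaway file. A sketch (path and file contents are hypothetical); the pipeline mirrors the #3 Error Swallowing check from Phase 1:

```shell
dir=$(mktemp -d)
cat > "$dir/sample.spec.ts" <<'EOF'
await op().catch(() => {});
await editor.click().catch(() => textArea.focus()); // justified: UI stabilization fallback
EOF

# Same filter as Phase 1 check #3: both lines match the pattern,
# but the justified one is dropped by the final grep -v.
grep -rn '\.catch(\s*() =>' "$dir" --include='*.ts' | grep -v '// justified'
```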