E2E Test Discovery & Execution

You're here to discover E2E test scenarios, record them as YAML, or execute existing test cases.

Core Principle: Semantic Selectors

The problem: Traditional selectors (testId, CSS classes) break when UI changes.

The solution: Use semantic selectors based on accessibility properties. These match what users see.

# ❌ Fragile
{{dataTestId: "submit-btn"}}

# ✅ Stable
{{role: button, name: "Submit"}}

Phase 1: Discovering Test Scenarios

Don't mechanically scan code/docs/UI. Use risk-driven discovery.

High-Risk Indicators

When exploring a project, prioritize areas with these characteristics:

Indicator	Why It's Risky	Example
Multi-field forms	Validation, edge cases, state	Registration, checkout
Multi-step flows	State carries across steps	Wizards, onboarding
Authentication boundaries	Permissions, sessions	Login, role-based access
External integrations	Dependencies can fail	Payment, third-party APIs
Data mutation	Side effects matter	Create, update, delete

Discovery Strategy

Start from risk, not from code structure
Use multiple sources (code, docs, browser) to verify each risk
Stop when confident, not when sources are exhausted

Ask yourself: "What would break that users would notice?" Start there.

Phase 2: Recording Test Cases

The Snapshot-Refs Pattern

# 1. Open page
agent-browser open http://localhost:3000 --headed

# 2. Snapshot to discover elements
agent-browser snapshot --json
# Returns: {"refs": {"e1": {"name": "Email", "role": "textbox"}, ...}}

# 3. Interact using refs
agent-browser fill @e1 "test@example.com"
agent-browser click @e2

# 4. Re-snapshot after DOM changes (refs get reassigned!)
agent-browser snapshot --json

Critical: Always re-snapshot after any action that changes DOM. Refs are temporary.

Recording as YAML

Store semantic selectors, not refs:

steps:
  - desc: Submit login form
    commands:
      - agent-browser fill {{role: textbox, name: "Email"}} "{{env.EMAIL}}"
      - agent-browser click {{role: button, name: "Sign In"}}
    expect:
      - url_contains: /dashboard

Phase 3: Organizing Test Cases

Use complexity gradient to enable fast root-cause analysis:

foundations/     →     flows/          →     compositions/
(atomic ops)           (journeys)            (multi-session)

login.yaml            checkout.yaml          two-user-chat.yaml
logout.yaml           onboarding.yaml        order-and-fulfill.yaml
navigate.yaml         password-reset.yaml

Why This Structure

When a composition fails:

Check which flow failed
Check which foundation in that flow failed
Fix the foundation → everything above it recovers

Foundations = building blocks (high reuse, must be stable) Flows = complete user journeys (compose foundations) Compositions = multi-session/multi-user scenarios (compose flows)

Deciding the Category

Ask: "Can this be broken into smaller independent tests?"

Yes → It's a flow or composition
No → It's a foundation

Phase 4: Executing Test Cases

Intent-Driven Execution

YAML describes intent + constraints. You (the agent) fill in execution details.

# YAML says WHAT and WHEN
- desc: Close popup if it appears
  commands:
    - agent-browser click {{role: button, name: "Close"}}
  optional: true  # Constraint: this can fail

# You decide HOW
# 1. Snapshot to check if button exists
# 2. If exists → click
# 3. If not → skip (because optional: true)
# 4. Record your decision for debugging

Execution Rules

For each step: snapshot → match selector → execute → verify expect
For optional steps: skip if element not found, but record the decision
For failures: report which selector didn't match and what was found instead
Always: close browser when done

Handling Complex Scenarios

Conditional steps (optional: true):

- desc: Dismiss cookie banner
  commands:
    - agent-browser click {{role: button, name: /Accept|Close|×/}}
  optional: true

You try the selector. If not found, skip and continue.

Data extraction:

- desc: Capture order ID from page
  extract:
    from: {{role: heading, name: /Order #\d+/}}
    pattern: "Order #(\\d+)"
    save_as: order_id

You find the element, extract the value, save it for later steps.

Multi-session (compositions):

sessions:
  - name: alice
    lifecycle: persistent
  - name: bob
    lifecycle: persistent

steps:
  - desc: Alice sends message
    session: alice
    commands: [...]

  - desc: Verify Bob received it
    session: bob
    expect:
      - element_visible: { { role: paragraph, name: /Hello/ } }

You manage separate browser sessions and switch between them.

YAML Format Reference

Minimal structure:

id: case-name
title: Human-readable title
category: foundation # foundation | flow | composition

description: |
  What this tests and why it matters.

parameters:
  base_url: http://localhost:3000

steps:
  - desc: Step description
    commands:
      - agent-browser <action> {{selector}} [args]
    expect:
      - condition: value
    optional: false # default

exploration:
  status: validated # draft | explored | validated

See FORMATS.md for complete specification.

Common Issues

Empty snapshot: Page hasn't loaded or lacks accessibility. Wait longer or check framework.

Refs changed: Expected. Always re-snapshot before interacting.

Multiple matches: Use index: {{role: button, name: "Delete", index: 0}}

Static routing: /docs may need /docs.html. Note in your test case.

Rate limiting (429): Public sites limit unauthenticated requests. Consider:

Adding delays between steps (agent-browser wait 2000)
Using authenticated sessions when available
Recording on local dev instances when possible

Checklist

When discovering:

Identified high-risk areas (forms, multi-step, auth, integrations)
Verified risks from multiple sources
Prioritized by user impact

When recording:

Used semantic selectors, not refs
Re-snapshot after DOM changes
Added expect conditions for verification

When organizing:

Categorized by complexity (foundation/flow/composition)
Foundations are atomic and reusable
Compositions reference flows, flows reference foundations

When executing:

Followed intent, applied judgment for details
Recorded decisions for debugging
Closed all browser sessions

agent-e2e