Browser Automation
VERIFY AND INTERACT. Use agent-browser to prove UI changes work, test forms, extract data. Everything is ref-based: snapshot first, act by ref, verify by diff.
Rules
- Always snapshot before acting — refs only exist after a snapshot
- Always re-snapshot after any DOM change — refs are snapshot-scoped
- Always wait for network idle on pages with async data before snapshotting
- Always close sessions when done — daemon leaks resources
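- Never target elements with CSS selectors; act through @refs from a snapshot or through semantic locators (in-page JavaScript passed to eval is the one place selectors appear)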
- SPA navigation: wait for route change to complete, not just the triggering click
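The snapshot, act, re-snapshot cycle behind these rules can be sketched as two small shell functions. This is a minimal sketch, not a definitive wrapper: the function names and the AB override variable are hypothetical, and @e2 stands in for whatever ref the latest snapshot actually prints.

```shell
# AB names the CLI binary. It is a hypothetical override so the sequence
# can be dry-run (for example AB=echo) without a browser.
AB="${AB:-agent-browser}"

# load_and_snapshot URL
# Opens the page, waits for async data, then prints the interactive tree.
# Refs exist only after this snapshot.
load_and_snapshot() {
  "$AB" open "$1" &&
  "$AB" wait --load networkidle &&
  "$AB" snapshot -i
}

# click_and_resnapshot REF
# REF must come from the most recent snapshot. The click may change the DOM,
# so the function waits and snapshots again to obtain fresh refs.
click_and_resnapshot() {
  "$AB" click "$1" &&
  "$AB" wait --load networkidle &&
  "$AB" snapshot -i
}

# Usage (URL and ref are placeholders):
#   load_and_snapshot "http://myapp.localhost:1355/"
#   click_and_resnapshot @e2
```

Splitting the cycle in two keeps the ref lookup between the calls, where a human or agent reads the snapshot output before choosing what to click.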
When to Use
| Need | Approach |
|---|---|
| Prove UI change works | Visual verification flow |
| Test form behavior | Form testing flow |
| Site requires login | Auth flow first, then test |
| Extract structured data from page | Data extraction flow |
| Compare before/after visually | Diff-based evidence |
| Explore interactive elements | snapshot -i for interactive-only tree |
Flow: Visual Verification
The core Claude Code use case — machine-verifiable proof that a UI change worked.
open URL → wait --load networkidle → screenshot before.png
→ [make code changes, reload app]
→ reload → wait --load networkidle → screenshot after.png
→ diff screenshot -b before.png → report mismatch %
Mismatch percentage is the evidence. Zero means identical. Non-zero means visible change — expected or not.
Use after: any UI change, hope:loop wave verification, hope:verify visual checks.
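The flow above can be wrapped in two functions, one for each side of the code change. This is a sketch under assumptions: the function names and the AB override are hypothetical, and the commands are only the ones the flow already lists.

```shell
# AB names the CLI binary (hypothetical override, handy for dry runs).
AB="${AB:-agent-browser}"

# capture_baseline URL FILE
# Takes the "before" screenshot once the page has settled.
capture_baseline() {
  "$AB" open "$1" &&
  "$AB" wait --load networkidle &&
  "$AB" screenshot "$2"
}

# recapture_and_diff BASELINE_FILE AFTER_FILE
# Run after the code change: reloads, takes the "after" screenshot,
# then diffs the current page against the baseline.
recapture_and_diff() {
  "$AB" reload &&
  "$AB" wait --load networkidle &&
  "$AB" screenshot "$2" &&
  "$AB" diff screenshot -b "$1"
}

# Usage (URL is a placeholder):
#   capture_baseline "http://myapp.localhost:1355/" before.png
#   ... make code changes ...
#   recapture_and_diff before.png after.png
```

Waiting for network idle before both screenshots keeps loading spinners out of the comparison, so the mismatch percentage reflects the change itself.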
Flow: Form Testing
Validate all form paths — happy, empty, invalid.
open URL → wait --load networkidle → snapshot -i
→ read refs from interactive elements
→ fill @ref "value" for each field → click @submit-ref
→ wait --load networkidle → snapshot → verify outcome
Test all three paths:
- Happy path — fill valid data, submit, verify success state
- Empty submit — submit without filling, verify validation messages appear
- Invalid input — fill bad data (wrong email format, too-short password), submit, verify error states
Re-snapshot after each submission — refs invalidate on DOM change.
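One way to drive all three paths with the same steps is a helper that fills zero or more fields and then submits. This is a minimal sketch: submit_form and the AB override are hypothetical names, and every @e ref below is a placeholder that must be re-read from a fresh snapshot -i before each call.

```shell
# AB names the CLI binary (hypothetical override, handy for dry runs).
AB="${AB:-agent-browser}"

# submit_form SUBMIT_REF [FIELD_REF VALUE]...
# Fills each FIELD_REF/VALUE pair, clicks the submit ref, waits for the
# response, then snapshots so the outcome can be checked.
submit_form() {
  submit_ref=$1
  shift
  while [ "$#" -ge 2 ]; do
    "$AB" fill "$1" "$2" || return 1
    shift 2
  done
  "$AB" click "$submit_ref" &&
  "$AB" wait --load networkidle &&
  "$AB" snapshot
}

# Usage (refs and values are placeholders):
#   submit_form @e5 @e3 "user@example.com" @e4 "a-long-enough-password"  # happy path
#   submit_form @e5                                                       # empty submit
#   submit_form @e5 @e3 "not-an-email" @e4 "x"                            # invalid input
```

Because each call ends in a snapshot, the validation messages or success state appear in the output of the same call that triggered them.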
Flow: Auth-Required Testing
Sessions persist browser state across commands — login once, test many.
agent-browser --session-name myapp open login-url
→ wait --load networkidle → snapshot -i
→ fill @username "user" → fill @password "pass" → click @login
→ wait --url "/dashboard" → snapshot → verify logged in
All subsequent commands reuse the session:
agent-browser --session-name myapp open /settings → snapshot -i
For CI: state save ./auth.json after login, state load ./auth.json before test runs.
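For the CI case, a guard that loads saved state and fails loudly when it is missing could look like the following. This is a sketch, not a prescribed setup: ensure_auth, AUTH_STATE, and the AB override are hypothetical names, and only the state save and state load commands come from the text above.

```shell
# AB names the CLI binary (hypothetical override, handy for dry runs).
AB="${AB:-agent-browser}"
# Where the saved login state lives; ./auth.json matches the text above.
AUTH_STATE="${AUTH_STATE:-./auth.json}"

# ensure_auth
# Loads saved state if the file exists. Otherwise it prints a hint to stderr
# and returns non-zero so the CI job stops before running unauthenticated tests.
ensure_auth() {
  if [ -f "$AUTH_STATE" ]; then
    "$AB" state load "$AUTH_STATE"
  else
    echo "no saved state at $AUTH_STATE; log in once, then run: $AB state save $AUTH_STATE" >&2
    return 1
  fi
}

# Usage in a CI step:
#   ensure_auth || exit 1
```

Failing early here avoids a run where every test lands on the login page and reports misleading errors.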
Flow: Data Extraction
Pull structured data from rendered pages.
open URL → wait --load networkidle → snapshot
→ eval --stdin to extract structured data
Simple values: get text @ref returns the text content of a single element.
Multi-line extraction for complex structures:
agent-browser eval --stdin <<'EOF'
JSON.stringify(
  [...document.querySelectorAll('.product')].map(el => ({
    name: el.querySelector('h2')?.textContent.trim(),
    price: el.querySelector('.price')?.textContent.trim()
  }))
)
EOF
Flow: Semantic Locators
Alternative to refs when you know what you're looking for by label or role:
agent-browser find text "Submit" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find role "button" click
agent-browser find placeholder "Search..." fill "query"
agent-browser find testid "login-btn" click
Locator type first, then value, then action. Defaults to click if no action specified.
Composition
- With portless: use myapp.localhost:1355 URLs; stable across restarts, no port guessing
- With watch: d3k monitors server + browser while agent-browser interacts, giving full context alongside automation via d3k agent-browser --cdp $(d3k cdp-port)
- With hope:loop: browser diff output as verification evidence in wave reports
- With hope:verify: screenshot diffs for visual regression assertions in pre-PR checks
Anti-Patterns
- Acting without snapshotting — blind interaction with no refs, guaranteed failure
- Assuming refs persist across navigations — every DOM change invalidates all refs
- Hardcoding port numbers — use portless URLs; ports change between restarts
- Screenshotting without waiting — captures loading spinners, not the actual page
- Forgetting to close sessions — daemon accumulates across test runs, leaking resources
- Snapshotting full page when only forms matter: snapshot -i filters to interactive elements only