agent-browser
agent-browser is the fresh-session deterministic browser verification skill in this repo.
Use it when the real need is: open a clean disposable browser, inspect the current page state, perform one concrete action, and prove what changed with explicit evidence. The key behaviors are isolation, stable refs from snapshots, and an observe → act → observe verification loop.
When to use this skill
Use agent-browser when the task needs one or more of these:
- a clean reproducible browser session instead of the user's real browser profile
- deterministic form checks, navigation checks, and page-state verification
- structured snapshot refs (@e1, @e2, …) before interacting with the page
- explicit before/after evidence such as snapshot diffs, screenshots, or extracted text
- CI-style or automation-friendly browser checks where reproducibility matters more than session continuity
- isolated parallel browser tasks with named sessions
Do not use agent-browser by default for:
- reusing the browser the user already has open, with live cookies, extensions, or trusted-device state → playwriter
- exact rendered-UI review packets or annotation handoff from a human reviewer → agentation
- plan review, diff approval, or artifact sign-off workflows → plannotator
- vague broad web-task autonomy when the real need is a stateful authenticated browser lane
Quick routing rule
| If the job needs... | Use |
|---|---|
| A clean disposable browser and repeatable verification | agent-browser |
| Existing logins, cookies, extensions, or a browser already open | playwriter |
| Exact rendered-UI feedback with selectors / annotation packets | agentation |
| Plan or diff review in a browser | plannotator |
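The routing table can also be written as a small lookup. This is only a sketch: the function name route_browser_task and the four need keywords are hypothetical labels invented for illustration; only the skill names come from the table.

```shell
# Hypothetical helper: map a coarse need keyword to the skill named in the
# routing table. The keywords are illustrative and not part of any CLI.
route_browser_task() {
  case "$1" in
    clean-session)  echo "agent-browser" ;;  # disposable browser, repeatable checks
    existing-state) echo "playwriter" ;;     # live logins, cookies, extensions
    ui-annotation)  echo "agentation" ;;     # rendered-UI feedback packets
    plan-review)    echo "plannotator" ;;    # plan or diff review in a browser
    *) echo "unknown need: $1" >&2; return 1 ;;
  esac
}
```

An unrecognized keyword returns a non-zero status, so a calling script stops instead of silently defaulting to agent-browser.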
Instructions
Step 1: Confirm the browser lane
Choose agent-browser only when a fresh session is the correct default. If the task depends on the user's existing browser state, route out before doing setup work.
Step 2: Follow the core loop
Always use the same loop:
- Open a clean browser session
- Wait for a stable page state
- Observe first with snapshot -i
- Act once using fresh refs
- Observe again before the next action
- Verify with explicit evidence
This is the repo's browser-verification contract. If you skip the observe steps, you lose the deterministic part of the workflow.
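One way to make the observe steps hard to skip is to wrap every action so that a fresh snapshot always follows it. The sketch below assumes only the snapshot -i command shown in this document. The function name act_then_observe and the AGENT_BROWSER override variable are hypothetical; the variable exists only so a stub can replace the real binary in tests.

```shell
# Hypothetical wrapper: run exactly one action, then re-observe so the next
# action starts from fresh refs. AGENT_BROWSER is an assumed override hook,
# not a documented setting of the CLI.
act_then_observe() {
  "${AGENT_BROWSER:-agent-browser}" "$@" || return 1   # act once
  "${AGENT_BROWSER:-agent-browser}" snapshot -i        # observe again
}

# Usage, with refs taken from a snapshot of the current page:
#   act_then_observe fill @e1 "user@example.com"
#   act_then_observe click @e2
```

If the action fails, the wrapper returns immediately without taking a second snapshot, so a failed step is not mistaken for a verified one.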
Step 3: Start from the smallest useful command set
agent-browser open https://example.com/form
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser click @e2
agent-browser snapshot -i
Rules:
- Never keep using old @eN refs after navigation or meaningful DOM change.
- Prefer wait --load networkidle or a targeted wait over fixed sleeps.
- Keep one browser action between observations when debugging or verifying.
Step 4: Choose one verification mode
| Mode | Use when | Evidence |
|---|---|---|
| Snapshot diff | Semantic page structure or accessible content changed | diff snapshot |
| Screenshot diff | Rendered layout or visual state matters | diff screenshot --baseline ... |
| Targeted extraction | You need exact text, URL, or field value | get text, get url, or narrowed snapshot |
| PDF / capture | The deliverable is a captured artifact | pdf, screenshot |
Prefer the lightest mode that proves the change. Use screenshots when visual truth matters, but do not make them the default.
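For the targeted-extraction mode, a check can compare the extracted value against an expectation and fail loudly on a mismatch. This sketch assumes that agent-browser get url prints the current URL on stdout, which this document does not confirm. The function name assert_url_contains and the AGENT_BROWSER override variable are hypothetical.

```shell
# Hypothetical check: pass only if the current URL contains the expected
# fragment. Returns 2 if the CLI call itself fails, 1 on a mismatch.
assert_url_contains() {
  expected="$1"
  actual="$("${AGENT_BROWSER:-agent-browser}" get url)" || return 2
  case "$actual" in
    *"$expected"*) echo "PASS: url contains '$expected'" ;;
    *) echo "FAIL: url '$actual' does not contain '$expected'"; return 1 ;;
  esac
}

# Usage after a login click:
#   assert_url_contains /dashboard
```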
Step 5: Use named sessions for isolation
agent-browser --session signup-check open https://example.com/signup
agent-browser --session settings-check open https://example.com/settings
agent-browser session list
Use one named session per autonomous worker or test lane. Close sessions when finished.
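A thin pair of helpers can keep the session name attached to every command and make cleanup a single call. The sketch uses only the --session flag and the close command shown in this document. The helper names in_session and close_sessions, and the AGENT_BROWSER override variable, are hypothetical.

```shell
# Hypothetical helper: run any command inside a named session.
in_session() {
  session_name="$1"; shift
  "${AGENT_BROWSER:-agent-browser}" --session "$session_name" "$@"
}

# Hypothetical helper: close each listed session, reporting any that fail.
close_sessions() {
  for s in "$@"; do
    in_session "$s" close || echo "could not close session: $s" >&2
  done
}

# Usage:
#   in_session signup-check open https://example.com/signup
#   in_session settings-check open https://example.com/settings
#   close_sessions signup-check settings-check
```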
Step 6: Keep authentication bounded
A clean-session skill can still save or load auth state, but that should stay explicit:
agent-browser open https://app.example.com/login
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
Then later:
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
agent-browser snapshot -i
Use this for bounded reproducible reuse. If the real workflow depends on a long-lived personal browser, passkeys, SSO handoff, extensions, or active human browsing, route to playwriter instead.
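Because the login flow reads credentials from environment variables, it can help to fail before opening the page when one is missing. This is a minimal sketch; the function name require_env is hypothetical, and it expects variable names written by the script author, not untrusted input.

```shell
# Hypothetical guard: return non-zero if any named environment variable is
# unset or empty. Names are passed to eval, so use fixed, trusted names only.
require_env() {
  missing=0
  for var in "$@"; do
    eval "value=\${$var-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_env USERNAME PASSWORD && agent-browser open https://app.example.com/login
```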
Step 7: Use complex evaluation payloads safely
For multi-line JavaScript or extraction logic, prefer stdin so shell quoting does not destroy the payload:
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify({
title: document.title,
links: document.links.length,
buttons: document.querySelectorAll('button').length
})
EVALEOF
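The quoted delimiter ('EVALEOF') is what protects the payload: with quotes, the shell passes the heredoc body through untouched. The sketch below uses cat in place of agent-browser eval --stdin so it runs anywhere; the function name and the JavaScript payload are illustrative only.

```shell
# With a quoted delimiter, JavaScript template literals and backticks reach
# stdin exactly as written.
quoted_payload() {
  cat <<'EVALEOF'
document.querySelectorAll(`a[href^="${location.origin}"]`).length
EVALEOF
}

# With an unquoted delimiter (<<EVALEOF), the shell would try to expand
# ${location.origin} itself and treat the backticks as command substitution
# before the payload ever reached the browser.
```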
High-value command patterns
Clean browser check
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser get url
Form submission with verification
agent-browser open https://example.com/form
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser fill @e1 "hello@example.com"
agent-browser click @e2
agent-browser diff snapshot
Visual regression slice
agent-browser open https://example.com/pricing
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser screenshot baseline.png
agent-browser click @e5
agent-browser diff screenshot --baseline baseline.png
Session cleanup
agent-browser --session signup-check close
agent-browser close
Safety and reliability
- Fresh refs only: re-run snapshot -i after navigation or major DOM updates.
- Prefer deterministic waits over fixed sleeps.
- Keep authentication files out of version control.
- Use allowed-domain and action-policy guards in sensitive runs.
- Prefer one small verified step over a giant multi-action leap.
- Route out aggressively when the task is really about running-browser reuse or exact visual review.
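To keep a saved state file out of version control, a script can add it to .gitignore before saving. The sketch below is plain shell and does not call agent-browser. The function name ignore_auth_state is hypothetical, and auth.json is only the file name used in the examples in this document.

```shell
# Hypothetical helper: ensure the auth state file is listed in .gitignore.
# Safe to call repeatedly; the entry is added at most once.
ignore_auth_state() {
  file="${1:-auth.json}"
  ignore="${2:-.gitignore}"
  touch "$ignore"
  # If the existing file lacks a trailing newline, add one before appending.
  if [ -s "$ignore" ] && [ -n "$(tail -c 1 "$ignore")" ]; then
    echo >> "$ignore"
  fi
  grep -qxF -- "$file" "$ignore" || printf '%s\n' "$file" >> "$ignore"
}

# Usage:
#   ignore_auth_state auth.json && agent-browser state save auth.json
```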
Troubleshooting
| Issue | What to check |
|---|---|
| Wrong element clicked | Refresh snapshot -i and use fresh refs |
| Dynamic content missing | Wait for networkidle or a targeted selector/url |
| Output too large | Narrow the snapshot or use targeted extraction |
| Auth is too stateful or MFA-heavy | Route to playwriter instead of forcing clean-session automation |
| Need exact rendered-page feedback | Use agentation after the browser step |
| Parallel tasks are colliding | Assign unique session names and close them cleanly |
Examples
Example 1: Repeatable checkout verification
- Prompt: "Run a clean browser check that fills the checkout form and proves the confirmation state appears."
- Expected behavior: choose agent-browser, use a fresh session, observe before/after, and verify with an explicit diff or extracted state.
Example 2: Logged-in personal browser flow
- Prompt: "Use my existing signed-in browser tabs to change a billing setting."
- Expected behavior: route to playwriter, because session continuity is the real requirement.
Example 3: Human leaves exact UI feedback
- Prompt: "I need to click the broken UI and send the exact selector/path to the agent."
- Expected behavior: route to agentation, because rendered-UI annotation is the real deliverable.
Best practices
- Choose agent-browser because a clean browser matters, not because the word "browser" appears.
- Follow observe → act → observe every time the page meaningfully changes.
- Prefer semantic evidence (snapshot diff, extracted state) before visual evidence when it proves the point.
- Keep auth reuse explicit and bounded; do not slide into a stateful personal-browser workflow by accident.
- Use named sessions for concurrency and close them when done.
- Report what was verified, not just what was clicked.
References
Deep-dive docs in this skill:
Primary sources:
- https://github.com/vercel-labs/agent-browser
- https://agent-browser.dev
- https://playwright.dev/docs/locators
- https://playwright.dev/docs/aria-snapshots
Ready templates:
- ./templates/form-automation.sh
- ./templates/capture-workflow.sh