agent-browser - Browser Automation for AI Agents

When to use this skill

Open websites and automate UI actions
Fill forms, click controls, and verify outcomes
Capture screenshots or PDFs, or extract page content
Run deterministic web checks with accessibility refs
Execute parallel browser tasks via isolated sessions
Prefer this over playwriter when you want disposable browser state instead of the user's already-running Chrome session

Instructions

Step 1: Pick the right browser surface

Use agent-browser when the task should run in a fresh or isolated browser session, when browser state should be disposable, or when the flow needs to be reproducible in CI or automation.
Route to playwriter instead when the task depends on the user's current Chrome state, saved logins, cookies, extensions, or already-open tabs.

Step 2: Refresh runtime guidance when version-specific behavior matters

Use the bundled instructions here for the stable default workflow.
If the installed CLI may be newer than this repo copy, refresh the live instructions first:

agent-browser skills list
agent-browser skills get agent-browser --full

When the workflow is narrower than general browser automation, check whether a specialized runtime skill fits better, such as dogfood, slack, or electron.
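This step can be wrapped in a minimal preflight sketch. The function name refresh_skill_docs and the default output filename are hypothetical. The commands skills list and skills get agent-browser --full come from this document, and so does the agent-browser install hint, which is taken from Troubleshooting.

```shell
#!/usr/bin/env bash
# Sketch: confirm the CLI is installed, then save the live skill text so it
# can be diffed against the bundled copy. The default filename is hypothetical,
# not an agent-browser convention.
refresh_skill_docs() {
  local out="${1:-agent-browser-skill.live.md}"
  if ! command -v agent-browser >/dev/null 2>&1; then
    echo "agent-browser not found: install the CLI, then run 'agent-browser install'" >&2
    return 127
  fi
  agent-browser skills list                               # which runtime skills exist
  agent-browser skills get agent-browser --full > "$out"  # full live instructions
}
```

For example, refresh_skill_docs followed by diff -u against the bundled copy shows where the repo instructions have drifted from the installed CLI.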

Step 3: Use the deterministic ref loop

Always use the ref-first loop:

agent-browser open <url>
agent-browser wait --load networkidle (if navigation is still settling)
agent-browser snapshot -i
Interact using the refs from that snapshot (@e1, @e2, ...)
After the page or DOM changes, run agent-browser snapshot -i again, or agent-browser diff snapshot

agent-browser open https://example.com/form
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser click @e2
agent-browser diff snapshot
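The loop above hardcodes @e1 and @e2. In a script, the refs have to be read from the snapshot itself. The sketch below assumes that each line of snapshot -i output looks like - textbox "Email" [ref=e1]. Check this against the output of your installed version before relying on it. The names ref_for and submit_email_form, along with the "Email" and "Submit" labels, are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: each line of `snapshot -i` output looks like
#   - textbox "Email" [ref=e1]
# Confirm this against your installed version before relying on it.

# ref_for ROLE LABEL < snapshot_text   -> prints the first matching ref, e.g. @e1
ref_for() {
  local role="$1" label="$2" ref
  ref=$(grep -F -- "- $role \"$label\"" |
    head -n 1 |
    sed -n 's/.*\[ref=\(e[0-9][0-9]*\)].*/@\1/p')
  [ -n "$ref" ] || return 1   # no element with that exact role and label
  printf '%s\n' "$ref"
}

# Hypothetical flow: open -> wait -> snapshot -> resolve refs -> act -> diff.
submit_email_form() {
  local url="$1" email="$2" snap field button
  agent-browser open "$url" && agent-browser wait --load networkidle || return 1
  snap=$(agent-browser snapshot -i) || return 1
  field=$(printf '%s\n' "$snap" | ref_for textbox "Email") || return 1
  button=$(printf '%s\n' "$snap" | ref_for button "Submit") || return 1
  agent-browser fill "$field" "$email" &&
    agent-browser click "$button" &&
    agent-browser diff snapshot
}
```

Matching on the quoted label, including its closing quote, keeps "Submit" from also matching a button labeled "Submit order".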

Step 4: Verify after every meaningful action

After each action, collect explicit evidence (a snapshot diff or a screenshot diff) instead of assuming the action worked.

# Baseline -> action -> verify structure
agent-browser snapshot -i
agent-browser click @e3
agent-browser diff snapshot

# Visual regression
agent-browser screenshot baseline.png
agent-browser click @e5
agent-browser diff screenshot --baseline baseline.png
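The two recipes can be combined into one helper. It records both baselines, runs a single action, and keeps the diff output as evidence. This is a sketch under several assumptions. The name verify_action and the file layout are hypothetical. It assumes diff snapshot compares against the session's most recent snapshot, as in the structure recipe above. This document does not say how the diff commands signal a change through their exit status, so the helper only saves their output for review.

```shell
#!/usr/bin/env bash
# verify_action DIR ACTION [ARGS...]   (hypothetical helper)
# Records a structural and a visual baseline, runs one agent-browser action,
# then stores both diffs under DIR for review.
verify_action() {
  local dir="$1"
  shift
  mkdir -p "$dir" || return 1
  # Baselines: accessibility snapshot + screenshot of the current page.
  agent-browser snapshot -i > "$dir/before.txt" &&
    agent-browser screenshot "$dir/baseline.png" || return 1
  # The action under test, e.g. "click @e3" or "fill @e1 text".
  agent-browser "$@" || return 1
  # Evidence: run both diffs unconditionally and keep their output.
  agent-browser diff snapshot > "$dir/structure.txt"
  agent-browser diff screenshot --baseline "$dir/baseline.png" > "$dir/visual.txt"
}
```

For example, verify_action evidence/submit click @e3 leaves before.txt, baseline.png, structure.txt, and visual.txt in evidence/submit.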

Step 5: Scale safely with sessions, waits, and scoped output

Chain commands with && when you do not need to read intermediate output; the chain stops at the first command that fails.

# Good chaining: open -> wait -> snapshot
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Use separate calls when you must read output (such as refs) before the next step
agent-browser snapshot -i
# read the refs from the snapshot output, then act on one
agent-browser click @e2

High-value commands:

Navigation: open, close
Snapshot: snapshot -i, snapshot -i -C, snapshot -s "#selector"
Interaction: click, fill, type, select, check, press
Verification: diff snapshot, diff screenshot --baseline <file>, diff url <url1> <url2>
Capture: screenshot, screenshot --annotate, pdf
Wait: wait --load networkidle, wait <selector|@ref|ms>
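For parallel work, each job can run in its own --session and close that session when it finishes, even on failure. This is a sketch under assumptions: the helper names, the job<PID>-<n> session naming, and the output paths are hypothetical. The commands and flags are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: one isolated --session per URL, run in parallel.
snapshot_in_session() {
  local session="$1" url="$2" out="$3"
  agent-browser --session "$session" open "$url" &&
    agent-browser --session "$session" wait --load networkidle &&
    agent-browser --session "$session" snapshot -i > "$out"
  local rc=$?
  agent-browser --session "$session" close   # release the session even on failure
  return "$rc"
}

# snapshot_all OUT_DIR URL...   -> OUT_DIR/page1.txt, OUT_DIR/page2.txt, ...
snapshot_all() {
  local outdir="$1"
  shift
  local i=0 pids="" url pid status=0
  for url in "$@"; do
    i=$((i + 1))
    # $$ plus the index keeps session names unique across jobs and concurrent runs.
    snapshot_in_session "job$$-$i" "$url" "$outdir/page$i.txt" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || status=1   # any failed job fails the batch
  done
  return "$status"
}
```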

Examples

Example 1: Drive a public form in an isolated browser

Input:

Use agent-browser to open the contact page, fill the form, submit it, and verify the page changed.

Output shape:

uses agent-browser, not playwriter
follows open -> snapshot -i -> interact -> diff snapshot
re-snapshots or diffs after the submit action

Example 2: Compare staging and production without reusing local browser state

Input:

Compare the staging and production homepages with agent-browser and show whether the structure or screenshot changed.

Output shape:

stays on agent-browser as the isolated verification surface
uses diff url, diff snapshot, or diff screenshot --baseline ...
keeps evidence explicit instead of describing the result from memory
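One way to produce that evidence is sketched below with hypothetical names: compare_envs, the env-compare session, and the output files. In this sketch, diff url compares the two URLs directly, and a production screenshot serves as the baseline for diff screenshot. What diff url compares exactly depends on the installed version, so check the live instructions from Step 2.

```shell
#!/usr/bin/env bash
# compare_envs STAGING_URL PROD_URL OUT_DIR   (hypothetical helper)
compare_envs() {
  local staging="$1" prod="$2" out="$3" s="env-compare"
  mkdir -p "$out" || return 1
  # Direct comparison of the two URLs.
  agent-browser --session "$s" diff url "$staging" "$prod" > "$out/url-diff.txt"
  # Visual comparison: production is the baseline, staging is diffed against it.
  agent-browser --session "$s" open "$prod" &&
    agent-browser --session "$s" wait --load networkidle &&
    agent-browser --session "$s" screenshot "$out/prod.png" &&
    agent-browser --session "$s" open "$staging" &&
    agent-browser --session "$s" wait --load networkidle &&
    agent-browser --session "$s" diff screenshot --baseline "$out/prod.png" > "$out/screenshot-diff.txt"
  local rc=$?
  agent-browser --session "$s" close   # disposable state: always close the session
  return "$rc"
}
```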

Example 3: Choose playwriter when the task needs the user's existing login

Input:

I need to inspect an authenticated checkout flow that depends on my saved Chrome login and current cart state. Should I use agent-browser or playwriter?

Output shape:

routes to playwriter for the logged-in running-browser case
explains that agent-browser is the isolated or disposable alternative
preserves the distinction between headless verification and stateful browser control

Best practices

Prefer snapshot refs (@e1, @e2) over fragile CSS selectors whenever possible.
Re-run snapshot -i after navigation or major DOM changes before acting again.
Prefer wait --load networkidle or selector/ref waits over fixed sleeps.
Use --session <name> to isolate parallel work or preserve reusable auth safely.
Use diff snapshot, diff screenshot, or saved baselines instead of assuming the page changed correctly.
Refresh CLI-served skills with agent-browser skills get ... when you suspect version drift between the repo copy and the installed binary.
Apply domain allowlists, content boundaries, and action policies in sensitive or prompt-injection-prone flows.

Safety and reliability

Refs are invalid after navigation or significant DOM updates; re-snapshot before the next action.
For multi-step JavaScript, use eval --stdin or base64 input to avoid shell-escaping problems.
For concurrent tasks, isolate with --session <name>.
On long pages, use output controls (scoped snapshots, --max-output) to avoid flooding the context.
Optional hardening in sensitive flows: domain allowlist and action policies.
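A quoted heredoc is one way to feed JavaScript to eval --stdin. The sketch assumes that eval evaluates the piped script in the current page and prints the result. The script is wrapped in an immediately invoked function so that it forms a single expression. The name count_links is hypothetical.

```shell
#!/usr/bin/env bash
# count_links   (hypothetical helper)
# The quoted delimiter ('JS') stops the shell from expanding $, backticks, or
# backslashes, so the script reaches agent-browser exactly as written.
count_links() {
  agent-browser eval --stdin <<'JS'
(() => {
  const links = [...document.querySelectorAll("a[href]")];
  return JSON.stringify({ count: links.length, first: links.length ? links[0].href : null });
})()
JS
}
```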

Optional hardening examples:

# Wrap page content with boundaries to reduce prompt-injection risk
export AGENT_BROWSER_CONTENT_BOUNDARIES=1

# Limit output volume for long pages
export AGENT_BROWSER_MAX_OUTPUT=50000

# Restrict navigation and network to trusted domains
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"

# Restrict allowed action types
export AGENT_BROWSER_ACTION_POLICY=./policy.json

Example policy.json:

{
  "default": "deny",
  "allow": ["navigate", "snapshot", "click", "fill", "scroll", "wait", "get"],
  "deny": ["eval", "download", "upload", "network", "state"]
}

CLI-flag equivalent:

agent-browser --content-boundaries --max-output 50000 --allowed-domains "example.com,*.example.com" --action-policy ./policy.json open https://example.com

Troubleshooting

command not found: install the agent-browser CLI, then run agent-browser install.
Wrong element clicked: run snapshot -i again and use the fresh refs.
Dynamic SPA content missing: wait with --load networkidle or a targeted wait <selector>.
Session collisions: assign unique --session names and close each session when it is done.
Output too large: narrow snapshots (-i, -c, -d, -s) and extract only the text you need.
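The SPA fix and the output-size fix work well together: wait for the region you need, then take a snapshot of only that region. In this sketch, snapshot_region and the #results selector are placeholders, and -c is assumed to be the compact-output flag.

```shell
#!/usr/bin/env bash
# snapshot_region [SELECTOR]   (hypothetical helper; "#results" is a placeholder)
snapshot_region() {
  local selector="${1:-#results}"
  # Wait for the element itself instead of sleeping, then snapshot only that
  # region: interactive (-i), compact (-c), scoped to the selector (-s).
  agent-browser wait "$selector" &&
    agent-browser snapshot -i -c -s "$selector"
}
```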

References

Related resources: