agent-browser - Browser Automation for AI Agents
When to use this skill
- Open websites and automate UI actions
- Fill forms, click controls, and verify outcomes
- Capture screenshots or PDFs, or extract page content
- Run deterministic web checks with accessibility refs
- Execute parallel browser tasks via isolated sessions
- Prefer this over playwriter when you want disposable browser state instead of the user's already-running Chrome session
Instructions
Step 1: Pick the right browser surface
- Use agent-browser when the task should run in a fresh or isolated browser session, when browser state should be disposable, or when the flow needs to be reproducible in CI or automation.
- Route to playwriter instead when the task depends on the user's current Chrome state, saved logins, cookies, extensions, or already-open tabs.
Step 2: Refresh runtime guidance when version-specific behavior matters
- Use the bundled instructions here for the stable default workflow.
- If the installed CLI may be newer than this repo copy, refresh the live instructions first:
agent-browser skills list
agent-browser skills get agent-browser --full
- Check specialized runtime skills when the workflow is narrower than general browser automation, such as dogfood, slack, or electron.
Step 3: Use the deterministic ref loop
Always use the ref-first loop:
1. agent-browser open <url>
2. agent-browser wait --load networkidle when navigation is still settling
3. agent-browser snapshot -i
4. Interact with refs (@e1, @e2, ...)
5. agent-browser snapshot -i or agent-browser diff snapshot again after the page or DOM changes
agent-browser open https://example.com/form
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser click @e2
agent-browser diff snapshot
Step 4: Verify after every meaningful action
Use explicit evidence after actions.
# Baseline -> action -> verify structure
agent-browser snapshot -i
agent-browser click @e3
agent-browser diff snapshot
# Visual regression
agent-browser screenshot baseline.png
agent-browser click @e5
agent-browser diff screenshot --baseline baseline.png
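The baseline -> action -> verify pattern can be wrapped in a small shell helper so that a failed click is never followed by a misleading diff. This is a minimal sketch, not part of the CLI: the function name click_and_verify and the AB / AGENT_BROWSER_BIN variables are hypothetical and exist only for illustration.

```shell
# Hypothetical names: AB, AGENT_BROWSER_BIN, and click_and_verify are not part of the CLI.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# Take a baseline snapshot, click the given ref, then diff against the baseline.
# The && chain stops at the first failing command.
click_and_verify() {
  ref="$1"
  "$AB" snapshot -i &&
    "$AB" click "$ref" &&
    "$AB" diff snapshot
}

# Usage (assumes agent-browser is on PATH and a page is already open):
#   click_and_verify @e3
```

Because the commands are chained with &&, the function's exit status is non-zero as soon as any step fails, which makes it safe to use in scripts that check return codes.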
Step 5: Scale safely with sessions, waits, and scoped output
Use && chaining when intermediate output is not needed.
# Good chaining: open -> wait -> snapshot
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
# Separate calls when output is needed first
agent-browser snapshot -i
# parse refs
agent-browser click @e2
High-value commands:
- Navigation: open, close
- Snapshot: snapshot -i, snapshot -i -C, snapshot -s "#selector"
- Interaction: click, fill, type, select, check, press
- Verification: diff snapshot, diff screenshot --baseline <file>, diff url <url1> <url2>
- Capture: screenshot, screenshot --annotate, pdf
- Wait: wait --load networkidle, wait <selector|@ref|ms>
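To illustrate how sessions, waits, and chaining combine, the sketch below runs the open -> wait -> snapshot chain inside one named session. It assumes --session is passed before the subcommand, in the same position as the other global flags shown in this skill. The function name, the AB / AGENT_BROWSER_BIN variables, and the URLs in the usage comment are hypothetical.

```shell
# Hypothetical names: AB, AGENT_BROWSER_BIN, and snapshot_in_session are not part of the CLI.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# Open a URL, wait for the network to settle, and take an interactive snapshot,
# all inside one named session so parallel runs do not share browser state.
snapshot_in_session() {
  session="$1"
  url="$2"
  "$AB" --session "$session" open "$url" &&
    "$AB" --session "$session" wait --load networkidle &&
    "$AB" --session "$session" snapshot -i
}

# Usage: two isolated sessions in parallel (illustrative URLs), then wait for both.
#   snapshot_in_session site-a https://example.com/a > a.txt &
#   snapshot_in_session site-b https://example.com/b > b.txt &
#   wait
```

Writing each session's snapshot to its own file keeps the refs from the two runs separate, since refs are only meaningful within the session that produced them.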
Examples
Example 1: Drive a public form in an isolated browser
Input:
Use agent-browser to open the contact page, fill the form, submit it, and verify the page changed.
Output shape:
- uses agent-browser, not playwriter
- follows open -> snapshot -i -> interact -> diff snapshot
- re-snapshots or diffs after the submit action
Example 2: Compare staging and production without reusing local browser state
Input:
Compare the staging and production homepages with agent-browser and show whether the structure or screenshot changed.
Output shape:
- stays on agent-browser as the isolated verification surface
- uses diff url, diff snapshot, or diff screenshot --baseline ...
- keeps evidence explicit instead of describing the result from memory
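One way to produce that evidence is a thin wrapper around diff url <url1> <url2>. This is a sketch under assumptions: compare_environments and the AB / AGENT_BROWSER_BIN variables are hypothetical names, and the URLs in the usage comment are placeholders.

```shell
# Hypothetical names: AB, AGENT_BROWSER_BIN, and compare_environments are not part of the CLI.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# Diff two URLs and pass the CLI's output and exit status straight through.
compare_environments() {
  if [ "$#" -ne 2 ]; then
    echo "usage: compare_environments <staging-url> <production-url>" >&2
    return 2
  fi
  "$AB" diff url "$1" "$2"
}

# Usage (placeholder URLs):
#   compare_environments https://staging.example.com https://example.com
```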
Example 3: Choose playwriter when the task needs the user's existing login
Input:
I need to inspect an authenticated checkout flow that depends on my saved Chrome login and current cart state. Should I use agent-browser or playwriter?
Output shape:
- routes to playwriter for the logged-in running-browser case
- explains that agent-browser is the isolated or disposable alternative
- preserves the distinction between headless verification and stateful browser control
Best practices
- Prefer snapshot refs (@e1, @e2) over fragile CSS selectors whenever possible.
- Re-run snapshot -i after navigation or major DOM changes before acting again.
- Prefer wait --load networkidle or selector/ref waits over fixed sleeps.
- Use --session <name> to isolate parallel work or preserve reusable auth safely.
- Use diff snapshot, diff screenshot, or saved baselines instead of assuming the page changed correctly.
- Refresh CLI-served skills with agent-browser skills get ... when you suspect version drift between the repo copy and the installed binary.
- Apply domain allowlists, content boundaries, and action policies in sensitive or prompt-injection-prone flows.
Safety and reliability
- Refs are invalid after navigation or significant DOM updates; re-snapshot before the next action.
- For multi-step JS, use eval --stdin or base64 input to avoid shell escaping breakage.
- For concurrent tasks, isolate with --session <name>.
- Use output controls in long pages to reduce context flooding.
- Optional hardening in sensitive flows: domain allowlist and action policies.
Optional hardening examples:
# Wrap page content with boundaries to reduce prompt-injection risk
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
# Limit output volume for long pages
export AGENT_BROWSER_MAX_OUTPUT=50000
# Restrict navigation and network to trusted domains
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
# Restrict allowed action types
export AGENT_BROWSER_ACTION_POLICY=./policy.json
Example policy.json:
{"default":"deny","allow":["navigate","snapshot","click","fill","scroll","wait","get"],"deny":["eval","download","upload","network","state"]}
CLI-flag equivalent:
agent-browser --content-boundaries --max-output 50000 --allowed-domains "example.com,*.example.com" --action-policy ./policy.json open https://example.com
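For repeatable setup, the policy file and environment variables above can be applied from one function. This is a minimal sketch: harden_agent_browser and its optional path argument are hypothetical, while the policy contents and variable values are copied from the examples above.

```shell
# Hypothetical helper: writes the example policy and exports the hardening
# variables into the current shell. The default path mirrors the example above.
harden_agent_browser() {
  policy_file="${1:-./policy.json}"
  cat > "$policy_file" <<'JSON' || return 1
{
  "default": "deny",
  "allow": ["navigate", "snapshot", "click", "fill", "scroll", "wait", "get"],
  "deny": ["eval", "download", "upload", "network", "state"]
}
JSON
  export AGENT_BROWSER_CONTENT_BOUNDARIES=1
  export AGENT_BROWSER_MAX_OUTPUT=50000
  export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
  export AGENT_BROWSER_ACTION_POLICY="$policy_file"
}

# Usage: call it in the current shell (not a subshell) so the exports persist.
#   harden_agent_browser ./policy.json
```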
Troubleshooting
- command not found: install and run agent-browser install.
- Wrong element clicked: run snapshot -i again and use fresh refs.
- Dynamic SPA content missing: wait with --load networkidle or targeted wait selector.
- Session collisions: assign unique --session names and close each session.
- Large output pressure: narrow snapshots (-i, -c, -d, -s) and extract only needed text.
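Two of these checks are easy to script: a preflight test for the binary and a cleanup pass over named sessions. The sketch below assumes close accepts the same --session flag as other commands. The function names and the AB / AGENT_BROWSER_BIN variables are hypothetical.

```shell
# Hypothetical names: AB, AGENT_BROWSER_BIN, require_agent_browser, and
# close_sessions are not part of the CLI.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# Fail early with a hint when the CLI is not available.
require_agent_browser() {
  if command -v "$AB" >/dev/null 2>&1; then
    return 0
  fi
  echo "$AB not found: install it, then run '$AB install'" >&2
  return 1
}

# Close every session named on the command line.
close_sessions() {
  for session in "$@"; do
    "$AB" --session "$session" close
  done
}

# Usage (illustrative session names):
#   require_agent_browser && close_sessions site-a site-b
```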