agent-browser
Agent-Browser Driver
The orchestrator routed you here. Use these mechanics to execute your plan.
Control web pages and Electron desktop apps via the agent-browser CLI. Uses Playwright under the hood with a headless Chromium instance managed by a background daemon.
When to use
- Automating web app flows (login, form fill, data extraction, visual QA)
- Driving Electron apps (VS Code, Slack, Discord, Figma, Notion, Spotify)
- Visual verification -- screenshots and annotated element overlays
- DOM-level assertions where terminal snapshots are irrelevant
If the target is a terminal TUI, use tuistory or true-input instead.
Prerequisites
agent-browser install # one-time: downloads bundled Chromium
For Electron apps, the target app must be launched with --remote-debugging-port=<port>.
Core workflow
Every interaction follows the same loop:
agent-browser open <url>
agent-browser snapshot -i # interactive elements only -> refs like @e1, @e2
agent-browser click @e3 # interact using refs
agent-browser snapshot -i # re-snapshot (refs invalidate after navigation/DOM changes)
agent-browser close # always close when done
Command chaining
Commands share a persistent daemon, so && chaining is safe:
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
Chain when you don't need intermediate output. Run separately when you need to parse refs before acting.
Command reference
Navigation
| Command | Purpose |
|---|---|
open <url> |
Navigate (auto-prepends https:// if no protocol) |
back / forward / reload |
History navigation |
close |
Shut down browser session |
connect <port> |
Attach to a running browser/Electron app via CDP |
Snapshot (page analysis)
| Command | Purpose |
|---|---|
snapshot |
Full accessibility tree |
snapshot -i |
Interactive elements only (recommended default) |
snapshot -i -C |
Include cursor-interactive elements (onclick divs) |
snapshot -c |
Compact output |
snapshot -d <n> |
Limit tree depth |
snapshot -s "<selector>" |
Scope to CSS selector |
Interactions (use @refs from snapshot)
| Command | Purpose |
|---|---|
click @e1 |
Click (dblclick for double-click) |
fill @e2 "text" |
Clear field and type |
type @e2 "text" |
Type without clearing |
press Enter |
Press key (combos: Control+a) |
keyboard type "text" |
Type at current focus (no ref needed) |
keyboard inserttext "text" |
Insert without key events (Electron custom inputs) |
hover @e1 |
Hover |
check @e1 / uncheck @e1 |
Toggle checkbox |
select @e1 "value" |
Select dropdown option |
scroll down 500 |
Scroll page (--selector for containers) |
scrollintoview @e1 |
Scroll element into view |
drag @e1 @e2 |
Drag and drop |
upload @e1 file.pdf |
Upload file |
Semantic locators (when refs are unreliable)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find testid "submit-btn" click
Get information
| Command | Purpose |
|---|---|
get text @e1 |
Element text (get text body > page.txt for full page) |
get html @e1 |
innerHTML |
get value @e1 |
Input value |
get attr @e1 href |
Element attribute |
get title / get url |
Page title / URL |
get count ".item" |
Count matching elements |
Check state
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1
Wait
| Command | Purpose |
|---|---|
wait @e1 |
Wait for element |
wait 2000 |
Wait milliseconds |
wait --text "Success" |
Wait for text |
wait --url "**/dashboard" |
Wait for URL pattern |
wait --load networkidle |
Wait for network idle (best for slow pages) |
wait --fn "window.ready" |
Wait for JS condition |
JavaScript (eval)
agent-browser eval 'document.title'
# Complex JS -- use --stdin to avoid shell quoting issues
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => a.href))
EVALEOF
Diff (compare page states)
agent-browser diff snapshot # current vs last snapshot
agent-browser diff snapshot --baseline before.txt # current vs saved file
agent-browser diff screenshot --baseline before.png # visual pixel diff
agent-browser diff url <url1> <url2> # compare two pages
Dialogs
agent-browser dialog accept [text] # accept alert/confirm/prompt
agent-browser dialog dismiss # dismiss dialog
Tabs & frames
agent-browser tab # list tabs
agent-browser tab new [url] # new tab
agent-browser tab 2 # switch to tab by index
agent-browser tab close # close current tab
agent-browser frame "#iframe" # switch to iframe
agent-browser frame main # back to main frame
Screenshots & recording
agent-browser screenshot # save to temp directory
agent-browser screenshot path.png # save to specific path
agent-browser screenshot --full # full-page screenshot
agent-browser screenshot --annotate # annotated with numbered element labels
agent-browser pdf output.pdf # save as PDF
--annotate overlays numbered labels on interactive elements. Each label [N] maps to ref @eN, enabling both visual verification and immediate interaction.
Video recording:
agent-browser record start ./demo.webm
# ... perform actions ...
agent-browser record stop
agent-browser record restart ./take2.webm # stop current + start new
Recording creates a fresh context but preserves cookies/storage. Explore first, then start recording for smooth demos.
Ref lifecycle
Refs (@e1, @e2, ...) are invalidated whenever the page changes. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
agent-browser click @e5 # navigates
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # use new refs
Electron app automation
Any Electron app supports --remote-debugging-port since it's built on Chromium.
Launch and connect
# macOS
open -a "Slack" --args --remote-debugging-port=9222
# Linux
slack --remote-debugging-port=9222
# Then connect
sleep 3
agent-browser connect 9222
agent-browser snapshot -i
The app must be quit first if already running -- the flag only takes effect at launch.
Tab management in Electron
Electron apps often have multiple windows/webviews:
agent-browser tab # list targets
agent-browser tab 2 # switch by index
agent-browser tab --url "*settings*" # switch by URL pattern
Electron troubleshooting
| Problem | Fix |
|---|---|
| "Connection refused" | Ensure app was launched with --remote-debugging-port; quit and relaunch if already running |
| Connect fails after launch | sleep 3 before connecting; app needs time to initialize |
| Elements missing from snapshot | Try snapshot -i -C; use tab to switch to the correct webview |
| Cannot type in fields | Use keyboard type "text" or keyboard inserttext "text" for custom input components |
| Dark mode lost | Set AGENT_BROWSER_COLOR_SCHEME=dark or use --color-scheme dark |
State persistence
Save and restore cookies/localStorage across sessions:
agent-browser open https://app.example.com/login
# ... login flow ...
agent-browser state save auth.json
# Later: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
Auto-save/restore with named sessions:
agent-browser --session-name myapp open https://app.example.com
# state auto-saved on close, auto-loaded on next launch with same --session-name
Sessions
The browser persists via a background daemon. One session is the default.
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
agent-browser --session test1 close
agent-browser --session test2 close
Each --session spawns a separate Chromium process (~300 MB). Prefer navigating within a single session. Exception: controlling multiple Electron apps on different CDP ports.
Global options
| Flag | Purpose |
|---|---|
--session <name> |
Isolated browser session |
--headed |
Show browser window |
--cdp <port> |
Connect via CDP |
--auto-connect |
Auto-discover running Chrome |
--proxy <url> |
Use proxy server |
--color-scheme dark |
Force dark/light mode |
--ignore-https-errors |
Accept self-signed certs |
--allow-file-access |
Enable file:// URLs |
--json |
JSON output for parsing |
Debugging
agent-browser --headed open example.com # visible browser
agent-browser console # view console messages
agent-browser errors # view page errors
agent-browser highlight @e1 # highlight element
Gotchas
- Invisible-to-snapshot elements.
contenteditabledivs and custom components may not appear in accessibility snapshots. Useevalto interact:agent-browser eval --stdin <<'EVALEOF' const el = document.querySelector("[contenteditable]"); el.focus(); el.textContent = "hello"; el.dispatchEvent(new Event('input', { bubbles: true })); EVALEOF - Unstable class names. Never hardcode CSS-in-JS class names (
sc-*,css-*). Find elements by text content,cursor: pointerstyle, ortestidinstead. - SPA loading delays. Single-page apps may take 5-10s to render after navigation. Double-wait:
wait --load networkidlethenwait 5000. - Flag ordering. Global flags (
--headers,--session,--cdp) must come before the subcommand:agent-browser --headers '{}' open <url>.
Critical rules
- Always take screenshots for visual QA. Text snapshots miss layout, styling, alignment, and z-index issues. Use
screenshot --annotatewhen you need both visual proof and element refs. - One session by default. Navigate between pages with
open <url>instead of creating new sessions. - Always close when done.
agent-browser closefrees the Chromium process. - Re-snapshot after every navigation. Refs are invalidated.