agent-browser
Agent Browser
CLI browser automation via agent-browser (Rust binary, v0.13.0).
Step 0: Launch Chrome (MANDATORY — Run Before Anything Else)
Before ANY agent-browser command, run:
start-chrome-debug
This detects the platform (WSL or native Linux), launches Chrome with a persistent profile (saved cookies — Google, Stripe, etc.), and connects agent-browser on port 9222. If Chrome is already running correctly, it reconnects instantly.
- WSL: Launches Windows Chrome via PowerShell with the
ChromeCDPprofile - Native Linux: Launches
google-chrome/chromiumwith~/.config/agent-browser-chromeprofile
One-time sign-in: ChromeCDP is a dedicated automation profile (required by Chrome 136+ security — the default profile cannot use CDP). Sign in to Google/Stripe/etc. once in the CDP Chrome window. Cookies persist across all future sessions.
CRITICAL RULES:
- NEVER launch Chrome manually (
google-chrome,chromium,chrome.exe). Always usestart-chrome-debug. - NEVER launch a separate browser instance. agent-browser manages its own CDP connection.
- NEVER run
agent-browser connectmanually —start-chrome-debughandles this. - If
start-chrome-debugreports an error, STOP and tell the user.
Environment (Pre-configured)
Environment variables in .bashrc — do not modify:
AGENT_BROWSER_HEADED=1— visible Chrome window, user can interject anytimeAGENT_BROWSER_AUTO_CONNECT=1— auto-discovers running Chrome CDPAGENT_BROWSER_SESSION="claude-${PPID}"— each Claude session gets its own isolated daemon/tabAGENT_BROWSER_ARGS— anti-bot-detection flags
The persistent Chrome profile has saved cookies. After running start-chrome-debug, no login needed.
Phase 0: Research the UI (MANDATORY — No Exceptions)
Before opening ANY website with agent-browser open, you MUST research the current UI first. Your training data is stale. UIs change constantly. Research first, click second.
- WebSearch the current UI flow — search for "[site name] [task] steps [current year]" or "[site name] UI layout [current year]"
- Example: "Google Ads create conversion action steps 2026"
- Example: "Stripe connect webhook endpoint setup 2026"
- Example: "Amazon order flow current layout 2026"
- Example: "threads.com compose new post UI 2026"
- Document the expected navigation path before opening the browser:
- Where target buttons/links/forms are in the CURRENT UI
- What the current navigation path looks like
- Any recent UI redesigns or layout changes
- What form fields to fill and with what values
- What confirmation screens to expect
- Then execute using the Core Workflow below, following the researched path step-by-step
NO EXCEPTIONS. Not for "simple" sites. Not for "your own" sites. Not for sites you "already know." Your knowledge is stale. This is a hard gate — skip it and you WILL brute-force through wrong clicks and waste tokens.
Core Workflow
# Navigate + wait + get element refs
agent-browser open URL && agent-browser wait --load networkidle && agent-browser snapshot -i --compact
# Interact using @refs from snapshot
agent-browser click @e5
agent-browser fill @e3 "text"
# Re-snapshot after DOM changes (refs become stale)
agent-browser snapshot -i --compact
Chain commands with && for speed. Use separate calls when you need to parse output before next step.
Session Isolation (Automatic)
Each Claude Code session gets its own isolated Chrome tab via AGENT_BROWSER_SESSION="claude-${PPID}". Multiple sessions NEVER share tabs — each daemon creates a fresh target on connect. No configuration needed.
Parallel Verification (Multi-Tab)
Use tab new to open multiple pages simultaneously within one session:
# Open multiple tabs for parallel checks
agent-browser tab new https://site.com/page1 && agent-browser tab new https://site.com/page2
# List all tabs (shows index numbers)
agent-browser tab list
# Switch to tab by index, then screenshot/snapshot
agent-browser tab 0 && agent-browser screenshot page1.png
agent-browser tab 1 && agent-browser screenshot page2.png
# Clean up when done
agent-browser tab close 1 && agent-browser tab close 0
Or use ab-parallel for bulk checks:
ab-parallel check https://site.com/page1 https://site.com/page2
When to use tabs vs sequential open:
- Sequential
open: Same tab, navigating through a flow (login → dashboard → settings) tab new: Parallel verification — checking multiple independent pages without losing state
Testing / Verification Workflow
When user says "test" or "verify" a feature:
- Act as a real user — click, type, fill forms (not programmatic tests)
- Trigger the action — submit form, click button, complete flow
- Verify downstream effects:
- Check the database (SSH + SQL query)
- Check other pages where the change should appear
- Use
ab-parallelfor multi-page checks
- Take screenshots as evidence at each step
- Check console —
agent-browser errorsmust be clean
Example: fill checkout email + submit -> verify Purchase row in DB -> verify success page -> verify admin dashboard updated.
Gate Requirements
The hook system tracks agent-browser calls. Gate clears when ALL met:
- Interacted (click/type/fill — not just navigate)
- Visited 2+ pages
agent-browser errorsreturned cleanagent-browser screenshottaken
Key Commands
| Command | Purpose |
|---|---|
open <url> |
Navigate |
snapshot -i --compact |
Accessibility tree with @refs, compact (fewer tokens) |
click @e1 |
Click element |
fill @e1 "text" |
Clear + type |
type @e1 "text" |
Append text |
screenshot --format jpeg --quality 80 |
Capture page (JPEG = 3-5x smaller than PNG) |
errors |
Check console errors |
get text @e1 |
Extract text |
get url |
Current URL |
eval <js> |
Run JavaScript |
wait --load networkidle |
Wait for page load |
tab new [url] |
Open new tab (optionally navigate) |
tab list |
List all tabs with index numbers |
tab <n> |
Switch to tab by index |
tab close [n] |
Close tab (current or by index) |
Multi-Agent Coordination
Use ab-tasks for inter-session coordination (no daemon needed — pure file I/O):
Coordinator + Workers Pattern
CRITICAL: Each worker MUST use its own session. Without this, all workers fight over one tab.
# Coordinator creates tasks
ab-tasks create "Check Amazon seller rating for WidgetCo"
ab-tasks create "Message Alibaba supplier about bulk pricing"
ab-tasks create "Scrape competitor pricing on eBay"
Each worker agent must set a unique session before any browser commands:
# Worker sets unique session (gets its own independent Chrome tab)
export AGENT_BROWSER_SESSION="worker-1"
ab-tasks claim # Claims next pending task
agent-browser open <url-from-task> # Own tab, no conflicts
agent-browser snapshot -i --compact
# ... do the work ...
ab-tasks complete <id> "result data" # Mark done with result
ab-tasks share worker1_finding "key insight" # Share with other workers
agent-browser close # Clean up own session
Workers run in TRUE parallel — each has its own daemon, its own Chrome tab, zero interference.
Fast Execution: ab-workers
For mechanical tasks (open URL, get data, report back), use ab-workers instead of AI agents — 3x faster, zero AI overhead:
# 1. Create tasks
ab-tasks create "Get title from https://example.com/page1"
ab-tasks create "Get title from https://example.com/page2"
ab-tasks create "Get title from https://example.com/page3"
# 2. Execute all in parallel (5 workers default)
ab-workers # Claims all pending tasks, runs in parallel bash workers
ab-workers 10 # Use up to 10 parallel workers
ab-workers auto-claims tasks, opens each URL in its own session/tab, gets the title, completes the task, and shares results. Use AI agents only when the task needs reasoning (form filling, navigation decisions, data interpretation).
# Coordinator checks progress and collects results
ab-tasks list # See all tasks + status
ab-tasks shared # Read results from all workers
Shared State
ab-tasks share <key> <value> # Write (scoped to AGENT_BROWSER_SESSION)
ab-tasks shared [key] # Read across all sessions