Agent Browser

CLI browser automation via agent-browser (Rust binary, v0.13.0).

Step 0: Launch Chrome (MANDATORY — Run Before Anything Else)

Before ANY agent-browser command, run:

start-chrome-debug

This detects the platform (WSL or native Linux), launches Chrome with a persistent profile (saved cookies — Google, Stripe, etc.), and connects agent-browser on port 9222. If Chrome is already running correctly, it reconnects instantly.

WSL: Launches Windows Chrome via PowerShell with the ChromeCDP profile
Native Linux: Launches google-chrome/chromium with ~/.config/agent-browser-chrome profile

One-time sign-in: ChromeCDP is a dedicated automation profile (required by Chrome 136+ security — the default profile cannot use CDP). Sign in to Google/Stripe/etc. once in the CDP Chrome window. Cookies persist across all future sessions.

CRITICAL RULES:

NEVER launch Chrome manually (google-chrome, chromium, chrome.exe). Always use start-chrome-debug.
NEVER launch a separate browser instance. agent-browser manages its own CDP connection.
NEVER run agent-browser connect manually — start-chrome-debug handles this.
If start-chrome-debug reports an error, STOP and tell the user.

Environment (Pre-configured)

Environment variables in .bashrc — do not modify:

AGENT_BROWSER_HEADED=1 — visible Chrome window, user can interject anytime
AGENT_BROWSER_AUTO_CONNECT=1 — auto-discovers running Chrome CDP
AGENT_BROWSER_SESSION="claude-${PPID}" — each Claude session gets its own isolated daemon/tab
AGENT_BROWSER_ARGS — anti-bot-detection flags

The persistent Chrome profile has saved cookies. After running start-chrome-debug, no login needed.

Phase 0: Research the UI (MANDATORY — No Exceptions)

Before opening ANY website with agent-browser open, you MUST research the current UI first. Your training data is stale. UIs change constantly. Research first, click second.

WebSearch the current UI flow — search for "[site name] [task] steps [current year]" or "[site name] UI layout [current year]"
- Example: "Google Ads create conversion action steps 2026"
- Example: "Stripe connect webhook endpoint setup 2026"
- Example: "Amazon order flow current layout 2026"
- Example: "threads.com compose new post UI 2026"
Document the expected navigation path before opening the browser:
- Where target buttons/links/forms are in the CURRENT UI
- What the current navigation path looks like
- Any recent UI redesigns or layout changes
- What form fields to fill and with what values
- What confirmation screens to expect
Then execute using the Core Workflow below, following the researched path step-by-step

NO EXCEPTIONS. Not for "simple" sites. Not for "your own" sites. Not for sites you "already know." Your knowledge is stale. This is a hard gate — skip it and you WILL brute-force through wrong clicks and waste tokens.

Core Workflow

# Navigate + wait + get element refs
agent-browser open URL && agent-browser wait --load networkidle && agent-browser snapshot -i --compact

# Interact using @refs from snapshot
agent-browser click @e5
agent-browser fill @e3 "text"

# Re-snapshot after DOM changes (refs become stale)
agent-browser snapshot -i --compact

Chain commands with && for speed. Use separate calls when you need to parse output before next step.

Session Isolation (Automatic)

Each Claude Code session gets its own isolated Chrome tab via AGENT_BROWSER_SESSION="claude-${PPID}". Multiple sessions NEVER share tabs — each daemon creates a fresh target on connect. No configuration needed.

Parallel Verification (Multi-Tab)

Use tab new to open multiple pages simultaneously within one session:

# Open multiple tabs for parallel checks
agent-browser tab new https://site.com/page1 && agent-browser tab new https://site.com/page2

# List all tabs (shows index numbers)
agent-browser tab list

# Switch to tab by index, then screenshot/snapshot
agent-browser tab 0 && agent-browser screenshot page1.png
agent-browser tab 1 && agent-browser screenshot page2.png

# Clean up when done
agent-browser tab close 1 && agent-browser tab close 0

Or use ab-parallel for bulk checks:

ab-parallel check https://site.com/page1 https://site.com/page2

When to use tabs vs sequential open:

Sequential open: Same tab, navigating through a flow (login → dashboard → settings)
tab new: Parallel verification — checking multiple independent pages without losing state

Testing / Verification Workflow

When user says "test" or "verify" a feature:

Act as a real user — click, type, fill forms (not programmatic tests)
Trigger the action — submit form, click button, complete flow
Verify downstream effects:
- Check the database (SSH + SQL query)
- Check other pages where the change should appear
- Use ab-parallel for multi-page checks
Take screenshots as evidence at each step
Check console — agent-browser errors must be clean

Example: fill checkout email + submit -> verify Purchase row in DB -> verify success page -> verify admin dashboard updated.

Gate Requirements

The hook system tracks agent-browser calls. Gate clears when ALL met:

Interacted (click/type/fill — not just navigate)
Visited 2+ pages
agent-browser errors returned clean
agent-browser screenshot taken

Key Commands

Command	Purpose
`open <url>`	Navigate
`snapshot -i --compact`	Accessibility tree with @refs, compact (fewer tokens)
`click @e1`	Click element
`fill @e1 "text"`	Clear + type
`type @e1 "text"`	Append text
`screenshot --format jpeg --quality 80`	Capture page (JPEG = 3-5x smaller than PNG)
`errors`	Check console errors
`get text @e1`	Extract text
`get url`	Current URL
`eval <js>`	Run JavaScript
`wait --load networkidle`	Wait for page load
`tab new [url]`	Open new tab (optionally navigate)
`tab list`	List all tabs with index numbers
`tab <n>`	Switch to tab by index
`tab close [n]`	Close tab (current or by index)

Multi-Agent Coordination

Use ab-tasks for inter-session coordination (no daemon needed — pure file I/O):

Coordinator + Workers Pattern

CRITICAL: Each worker MUST use its own session. Without this, all workers fight over one tab.

# Coordinator creates tasks
ab-tasks create "Check Amazon seller rating for WidgetCo"
ab-tasks create "Message Alibaba supplier about bulk pricing"
ab-tasks create "Scrape competitor pricing on eBay"

Each worker agent must set a unique session before any browser commands:

# Worker sets unique session (gets its own independent Chrome tab)
export AGENT_BROWSER_SESSION="worker-1"
ab-tasks claim                                    # Claims next pending task
agent-browser open <url-from-task>                # Own tab, no conflicts
agent-browser snapshot -i --compact
# ... do the work ...
ab-tasks complete <id> "result data"              # Mark done with result
ab-tasks share worker1_finding "key insight"       # Share with other workers
agent-browser close                                # Clean up own session

Workers run in TRUE parallel — each has its own daemon, its own Chrome tab, zero interference.

Fast Execution: `ab-workers`

For mechanical tasks (open URL, get data, report back), use ab-workers instead of AI agents — 3x faster, zero AI overhead:

# 1. Create tasks
ab-tasks create "Get title from https://example.com/page1"
ab-tasks create "Get title from https://example.com/page2"
ab-tasks create "Get title from https://example.com/page3"

# 2. Execute all in parallel (5 workers default)
ab-workers           # Claims all pending tasks, runs in parallel bash workers
ab-workers 10        # Use up to 10 parallel workers

ab-workers auto-claims tasks, opens each URL in its own session/tab, gets the title, completes the task, and shares results. Use AI agents only when the task needs reasoning (form filling, navigation decisions, data interpretation).

# Coordinator checks progress and collects results
ab-tasks list                                      # See all tasks + status
ab-tasks shared                                    # Read results from all workers

Shared State

ab-tasks share <key> <value>    # Write (scoped to AGENT_BROWSER_SESSION)
ab-tasks shared [key]           # Read across all sessions

agent-browser