Agent Browser Automation

A guide to using the agent-browser CLI to automate web browsing tasks in Claude Code.

Quick Start

Installation Check

Before using agent-browser, verify installation:

# Check if installed
agent-browser --version

# If not installed, install globally
npm install -g agent-browser
agent-browser install  # Download Chromium

Windows note: if you encounter /bin/sh errors, run commands through npx instead:

npx agent-browser <command>

See troubleshooting.md for platform-specific issues.
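
The installation check can be folded into a small helper that decides how to invoke the CLI. This is a minimal sketch, not part of agent-browser: the function name `resolve_agent_browser` is hypothetical, and it relies only on the two invocation forms this guide mentions (the global binary and `npx agent-browser`).

```shell
# Hypothetical helper: print the command prefix to use for agent-browser.
# Prefers the global binary, falls back to npx, and fails if neither is on PATH.
resolve_agent_browser() {
  if command -v agent-browser >/dev/null 2>&1; then
    echo "agent-browser"
  elif command -v npx >/dev/null 2>&1; then
    echo "npx agent-browser"
  else
    echo "agent-browser not found; run: npm install -g agent-browser" >&2
    return 1
  fi
}
```

A script could then start with `AB=$(resolve_agent_browser) || exit 1` and call `$AB open example.com` (unquoted, so the npx form splits into two words).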

Core Workflow

agent-browser uses a ref-based system: each snapshot assigns page elements short identifiers (such as @e1, @e2), and later commands use those refs to target the elements.

Basic Pattern

1. Open a page
2. Get a snapshot with refs
3. Interact using refs
4. Repeat as needed

# 1. Navigate to page
agent-browser open example.com

# 2. Get page snapshot with interactive elements
agent-browser snapshot -i --json

# 3. Use refs from snapshot to interact
agent-browser click @e5
agent-browser fill @e3 "search query"

# 4. Take screenshot or get results
agent-browser screenshot result.png
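
To pick refs out of a snapshot inside a script, the text output can be filtered. The sketch below assumes each element line carries a marker of the form `[ref=e5]`. That format is an assumption about the CLI's text output and should be checked against a real snapshot. The function name `list_refs` is hypothetical.

```shell
# Hypothetical helper: read snapshot text on stdin, print one @ref per line.
# ASSUMPTION: elements appear as lines such as:  - button "Submit" [ref=e2]
list_refs() {
  grep -oE '\[ref=e[0-9]+\]' | sed -E 's/^\[ref=(e[0-9]+)\]$/@\1/'
}
```

Under that assumption, `agent-browser snapshot -i | list_refs` would print the refs available for the next click or fill command.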

Essential Commands

Navigation

agent-browser open <url>           # Open URL
agent-browser goto <url>           # Navigate to URL
agent-browser back                 # Go back
agent-browser forward              # Go forward
agent-browser reload               # Reload page

Getting Page Information

agent-browser snapshot             # Get accessibility tree
agent-browser snapshot -i          # Interactive elements only
agent-browser snapshot -i --json   # JSON format (best for AI)
agent-browser screenshot <file>    # Take screenshot
agent-browser get text @e1         # Get element text
agent-browser get html             # Get page HTML
agent-browser get url              # Get current URL

Interacting with Elements

agent-browser click @e2            # Click element by ref
agent-browser dblclick @e2         # Double click
agent-browser fill @e3 "text"      # Fill input field
agent-browser type @e3 "text"      # Type text (slower, more realistic)
agent-browser press Enter          # Press keyboard key
agent-browser check @e4            # Check checkbox
agent-browser select @e5 "option"  # Select dropdown option
agent-browser upload @e6 file.pdf  # Upload file

Semantic Locators (Find Commands)

When you don't have refs, use semantic locators:

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." type "query"

Waiting

agent-browser wait @e1             # Wait for element
agent-browser wait --text "Done"   # Wait for text
agent-browser wait --url /success  # Wait for URL change
agent-browser wait --load          # Wait for page load
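
On slow or flaky pages a single wait may not be enough. A generic retry wrapper is one way to handle that. This is a plain shell sketch, not an agent-browser feature, and the name `retry` and its argument order are choices made for this example.

```shell
# Hypothetical helper: run a command up to N times, pausing between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Note: plain POSIX sh has no local variables, so these names are global.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 3 2 agent-browser wait --text "Done"` would try the wait up to three times with a two-second pause between attempts.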

Session Management

Use sessions to run multiple isolated browser instances:

# Start different sessions
agent-browser --session task1 open site-a.com
agent-browser --session task2 open site-b.com

# Each session has separate:
# - Cookies and storage
# - Authentication state
# - Navigation history

# List active sessions
agent-browser session list

# Close specific session
agent-browser --session task1 close
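
Because sessions are isolated, several pages can be opened side by side from one script. The sketch below is a hypothetical helper (`open_in_sessions`, with session names `task1`, `task2`, ... chosen for this example) that uses only the `--session` and `open` forms shown above. The `AB` variable is an assumption of this sketch that makes the invocation overridable.

```shell
# ASSUMPTION: AB holds the CLI invocation; override it if needed,
# e.g. AB="npx agent-browser".
AB="${AB:-agent-browser}"

# Hypothetical helper: open each URL in its own session (task1, task2, ...)
# as a background job, then wait for all of them to finish.
open_in_sessions() {
  i=1
  for url in "$@"; do
    # $AB is intentionally unquoted so "npx agent-browser" splits into words.
    $AB --session "task$i" open "$url" &
    i=$((i + 1))
  done
  wait
}
```

Calling `open_in_sessions site-a.com site-b.com` starts both opens in parallel and returns once both have completed.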

AI Agent Workflow

For AI-driven automation, follow this pattern:

1. Navigate and snapshot

agent-browser open https://example.com
agent-browser snapshot -i --json > page.json

2. Parse JSON to understand page structure
   - Identify interactive elements and their refs
   - Understand page layout and available actions

3. Execute actions using refs

agent-browser click @e2
agent-browser fill @e5 "input data"

4. Get a new snapshot after page changes

agent-browser snapshot -i --json > updated.json

5. Repeat until the task is complete

See workflows.md for detailed AI workflow patterns.
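
The loop described in this section can be sketched as a small driver. Everything here other than the `open` and `snapshot -i --json` commands is an assumption made for illustration: the function name `run_agent_loop`, the `AB` variable, and the idea of a separate decision command that reads the snapshot and prints the next action.

```shell
# ASSUMPTION: AB holds the CLI invocation (override with AB="npx agent-browser").
AB="${AB:-agent-browser}"

# Hypothetical driver for the snapshot -> decide -> act loop.
# $1: start URL
# $2: name of a decision command (hypothetical). It receives the snapshot file
#     path and prints the next agent-browser arguments (e.g. "click @e2"),
#     or prints nothing when the task is complete.
# $3: maximum number of steps (default 10), a guard against endless loops.
run_agent_loop() {
  start_url=$1
  decide=$2
  max_steps=${3:-10}
  snapshot_file=$(mktemp)

  $AB open "$start_url" || return 1
  step=0
  while [ "$step" -lt "$max_steps" ]; do
    # Take a fresh snapshot before every decision, since refs may change.
    $AB snapshot -i --json > "$snapshot_file" || return 1
    action=$("$decide" "$snapshot_file")
    if [ -z "$action" ]; then
      rm -f "$snapshot_file"
      return 0   # decision command signalled completion
    fi
    # $action is split into words on purpose: "click @e2" -> click, @e2
    $AB $action || return 1
    step=$((step + 1))
  done
  rm -f "$snapshot_file"
  echo "run_agent_loop: step limit reached" >&2
  return 1
}
```

Because the action string is split on whitespace, this sketch cannot pass arguments that contain spaces (such as `fill @e5 "input data"`). A real driver would need a structured action format.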

Advanced Features

Network Interception

# Block requests
agent-browser route --block "*.ads.com/*"

# Mock responses
agent-browser route --mock "/api/data" response.json

State Persistence

# Save authentication state
agent-browser save-state auth.json

# Load state in new session
agent-browser load-state auth.json
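
A common use of saved state is to log in once and reuse the result on later runs. The sketch below assumes `save-state` and `load-state` behave as shown in this guide. The helper name `ensure_logged_in`, the `AB` variable, and the caller-supplied login command are hypothetical.

```shell
# ASSUMPTION: AB holds the CLI invocation (override with AB="npx agent-browser").
AB="${AB:-agent-browser}"

# Hypothetical helper: reuse saved state if the file exists; otherwise run a
# caller-supplied login command and save the resulting state.
# $1: state file path    $2...: login command and its arguments
ensure_logged_in() {
  state_file=$1
  shift
  if [ -f "$state_file" ]; then
    $AB load-state "$state_file"
  else
    # Only save state when the login command succeeded.
    "$@" && $AB save-state "$state_file"
  fi
}
```

Here the login command would be a script or function that performs the open, fill, and click steps for the site in question.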

Debugging

# Enable console logs
agent-browser --console open example.com

# Highlight elements
agent-browser highlight @e3

# Enable tracing
agent-browser --trace trace.zip open example.com

Best Practices

1. Use -i --json for snapshots: less noise, and easier for AI to parse
2. Prefer refs over selectors: more reliable than CSS/XPath
3. Use sessions for parallel tasks: isolates different workflows
4. Wait for elements: use wait commands to handle dynamic content
5. Take screenshots: visual confirmation of state
6. Use semantic locators as a fallback: for when refs aren't available

Common Patterns

Form Filling

agent-browser open https://form.example.com
agent-browser snapshot -i --json
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser click @e3  # Submit button
agent-browser wait --url /success
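
For forms with many fields, the repeated fill commands can be driven from a list of ref and value pairs. This is a minimal sketch: the helper name `fill_fields`, the `ref=value` argument convention, and the `AB` variable are all choices made for this example, and only the `fill` command itself comes from the guide.

```shell
# ASSUMPTION: AB holds the CLI invocation (override with AB="npx agent-browser").
AB="${AB:-agent-browser}"

# Hypothetical helper: fill several fields from "ref=value" arguments.
# Stops at the first fill that fails.
fill_fields() {
  for pair in "$@"; do
    ref=${pair%%=*}    # text before the first "="
    value=${pair#*=}   # text after the first "=" (may itself contain "=")
    $AB fill "$ref" "$value" || return 1
  done
}
```

With refs taken from a fresh snapshot, the form above could be filled with `fill_fields "@e1=John Doe" "@e2=john@example.com"` before clicking the submit button.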

Data Extraction

agent-browser open https://data.example.com
agent-browser snapshot -i --json > structure.json
agent-browser get text @e5 > data.txt
agent-browser screenshot evidence.png

Multi-Step Workflow

# Login
agent-browser open https://app.example.com/login
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3

# Navigate to target
agent-browser wait --url /dashboard
agent-browser goto https://app.example.com/data

# Extract data
agent-browser snapshot -i --json > results.json

Platform-Specific Notes

Windows

- Use npx agent-browser if the global command fails
- PowerShell may require quotes around URLs that contain special characters
- See troubleshooting.md for /bin/sh errors

Linux

- Install with dependencies: agent-browser install --with-deps
- You may need to install Playwright system dependencies manually

macOS

Works out of the box after npm install -g agent-browser

Reference Documentation

commands.md - Complete command reference
workflows.md - AI workflow patterns and examples
troubleshooting.md - Common issues and solutions

Architecture

agent-browser is built on Playwright with:

Fast Rust CLI implementation (with Node.js fallback)
Accessibility tree parsing for AI-friendly page representation
Reference system (@e1, @e2) for stable element targeting
Chrome DevTools Protocol (CDP) for persistent sessions

When to Use agent-browser

✅ Use agent-browser when:

Automating web browsing tasks
Scraping data from websites
Filling and submitting forms
Testing web applications
Interacting with dynamic web pages
Targeting elements in an AI-friendly way

❌ Don't use agent-browser when:

Simple HTTP requests suffice (use curl/fetch instead)
API endpoints are available (use API directly)
The task doesn't require browser rendering