agent-browser
agent-browser CLI
A headless browser automation CLI designed for AI agents, with fast Rust-based execution and element refs optimized for LLM reasoning.
Overview
agent-browser provides programmatic browser control through a CLI that's purpose-built for AI agent workflows. It uses deterministic element references (@e1, @e2, etc.) from accessibility trees, making it ideal for LLM-based automation where consistent element targeting is critical.
Key Features:
- Fast Rust-based CLI with Node.js fallback
- Element refs (
@e1,@e2) for stable LLM reasoning - Session isolation for parallel browser instances
- JSON output for programmatic parsing
- Accessibility tree snapshots for element discovery
- CDP connection support for existing browser instances
When to Use
- Automating web interactions (form filling, clicking, navigation)
- Scraping web content with accessibility tree parsing
- Testing web applications programmatically
- Multi-agent scenarios requiring isolated browser sessions
- Screenshot capture and PDF generation
- Monitoring web page state changes
Prerequisites
- Node.js >= 18 or Bun runtime
- Chromium (installed via
agent-browser install)
Installation
# Install globally
bun install -g agent-browser
# Download Chromium
agent-browser install
# Linux with system dependencies
agent-browser install --with-deps
Verify installation:
agent-browser --version
Quick Start
# Navigate to a page
agent-browser open https://example.com
# Get accessibility snapshot with element refs
agent-browser snapshot -i # -i = interactive elements only
# Click an element by ref
agent-browser click @e2
# Fill a form field
agent-browser fill @e3 "test@example.com"
# Take a screenshot
agent-browser screenshot output.png
Core Workflow
1. Open Page and Snapshot
# Open URL
agent-browser open https://example.com
# Get interactive elements with refs
agent-browser snapshot -i --json
The snapshot returns elements like:
@e1 link "Home"
@e2 textbox "Email"
@e3 button "Submit"
2. Interact Using Refs
# Click by ref
agent-browser click @e2
# Type into focused element
agent-browser type @e2 "hello@example.com"
# Fill (clears first, then types)
agent-browser fill @e2 "hello@example.com"
# Press keyboard key
agent-browser press Enter
3. Verify State
# Check element visibility
agent-browser is visible @e3
# Get element text
agent-browser get text @e1
# Get page URL
agent-browser get url
Command Reference
Navigation
| Command | Description |
|---|---|
open <url> |
Navigate to URL |
back |
Go back in history |
forward |
Go forward in history |
reload |
Reload current page |
close |
Close browser |
Interactions
| Command | Description |
|---|---|
click <ref> |
Click element |
dblclick <ref> |
Double-click element |
type <ref> <text> |
Type text into element |
fill <ref> <text> |
Clear and fill element |
press <key> |
Press keyboard key (Enter, Tab, Control+a) |
hover <ref> |
Hover over element |
focus <ref> |
Focus element |
check <ref> |
Check checkbox |
uncheck <ref> |
Uncheck checkbox |
select <ref> <val> |
Select dropdown option |
scroll <dir> [px] |
Scroll (up/down/left/right) |
wait <ref|ms> |
Wait for element or milliseconds |
Information
| Command | Description |
|---|---|
snapshot |
Get accessibility tree with refs |
snapshot -i |
Interactive elements only |
snapshot -c |
Compact (remove empty elements) |
snapshot -d <n> |
Limit tree depth |
get text <ref> |
Get element text content |
get html <ref> |
Get element HTML |
get value <ref> |
Get input value |
get url |
Get current page URL |
get title |
Get page title |
screenshot [path] |
Take screenshot |
screenshot --full |
Full page screenshot |
pdf <path> |
Save page as PDF |
State Checks
| Command | Description |
|---|---|
is visible <ref> |
Check if element is visible |
is enabled <ref> |
Check if element is enabled |
is checked <ref> |
Check if checkbox is checked |
Find Elements
# Find by role and click
agent-browser find role button click --name Submit
# Find by text
agent-browser find text "Sign In" click
# Find by label
agent-browser find label "Email" fill "test@example.com"
# Find by placeholder
agent-browser find placeholder "Search..." type "query"
Session Management
Sessions provide isolated browser instances for parallel execution:
# Use named session
agent-browser --session login-flow open https://app.com
# Different session for another task
agent-browser --session checkout open https://app.com/cart
# List active sessions
agent-browser session list
# Environment variable (persistent across commands)
export AGENT_BROWSER_SESSION=my-session
agent-browser open https://example.com
JSON Output
Use --json for machine-readable output:
# Snapshot as JSON
agent-browser snapshot -i --json
# Get element info as JSON
agent-browser get text @e1 --json
# Parse in scripts
agent-browser get url --json | jq -r '.url'
Common Pitfalls
This section covers frequent failure modes and how to debug them. Each pitfall includes the symptom you'll see, the underlying cause, and the fix.
Browser not found
Symptom: Error: Browser executable not found at default path or agent-browser: command not found
Cause: Chromium isn't installed, or agent-browser CLI isn't on PATH
Fix:
# Install Chromium
agent-browser install
# For Linux, install system dependencies
agent-browser install --with-deps
# Verify installation
agent-browser --version
Element ref becomes stale after page change
Symptom: Error: Element @e2 not found even though you just used it, or clicks fail silently
Cause: Element refs change when the DOM updates (navigation, page reload, dynamic content). Refs are only valid for the current DOM snapshot.
Fix:
# Always re-snapshot after major page changes
agent-browser click @e1 # Click submit
agent-browser wait 2000 # Wait for navigation
agent-browser snapshot -i --json # Get fresh refs
agent-browser click @e1 # Use new ref (was @e1 before? Verify!)
Prevention: In scripts, capture fresh refs after each navigation or dynamic load.
Session conflicts and port binding
Symptom: Error: Session "default" already in use or Port 9222 already in use
Cause: Browser session is still running from a previous command or crashed script, preventing a new session from starting
Fix:
# List active sessions
agent-browser session list
# Kill a specific session
agent-browser session kill my-session
# Use unique session names to avoid conflicts
agent-browser --session upload-$(date +%s) open https://app.com
# Or set environment variable once
export AGENT_BROWSER_SESSION=task-$(date +%s)
agent-browser open https://example.com
agent-browser close
Clicks and interactions don't work on hidden elements
Symptom: agent-browser click @e5 succeeds but nothing happens, or element is invisible in screenshot
Cause: Element is outside viewport, hidden by CSS (display: none, visibility: hidden), or covered by another element. Accessibility tree may show it, but browser can't interact with it.
Fix:
# Check visibility before interaction
agent-browser is visible @e5
# Scroll to make element visible
agent-browser scroll down 500
# Take screenshot to visually verify
agent-browser screenshot current-state.png
# Try parent element if child is deeply nested
agent-browser click @e4 # Click parent button instead
Prevention: Always take screenshots before and after critical interactions to verify visual state.
Timeout waiting for navigation or element
Symptom: Error: Timeout waiting for element or script hangs indefinitely after clicking a link
Cause:
- Element doesn't exist or appears after longer than expected delay
- Navigation is stuck (network error, page not loading)
- Element selector is wrong
Fix:
# Explicit wait before checking for element
agent-browser wait 2000 # Wait 2 seconds
agent-browser snapshot -i --json # Check if element exists
# Wait for specific element to appear
agent-browser wait @e3
# Increase timeout with explicit waits in script
agent-browser click @submit
sleep 3 # Shell script sleep
agent-browser snapshot -i
# Check page title to verify navigation worked
agent-browser get title
Prevention: Add explicit wait commands after form submissions or link clicks that cause navigation.
Snapshot is empty or missing expected elements
Symptom: agent-browser snapshot -i --json returns [] or very few elements, but you see them visually in screenshot
Cause:
- Page content is rendered by JavaScript (React, Vue, etc.) that hasn't completed
- Elements lack accessible labels or roles
-iflag filters too aggressively
Fix:
# Wait for content to render
agent-browser wait 2000
agent-browser snapshot -i
# Remove interactive-only filter to see all elements
agent-browser snapshot -c # Compact but includes all elements
# Increase tree depth to find nested elements
agent-browser snapshot -d 5
# Use full snapshot to diagnose
agent-browser snapshot --json | jq . | less
# Check page title to verify page loaded
agent-browser get title
Prevention: Add wait delays after opening pages with heavy JavaScript rendering.
Form fill doesn't work or text is partial
Symptom: Text appears to be cut off, only first few characters typed, or field stays empty
Cause:
- Field wasn't focused before typing (race condition)
- Field has input validation or character limit
- JavaScript event handlers expect specific typing speed or blur event
Fix:
# Always focus explicitly before filling
agent-browser focus @e2
agent-browser wait 500 # Let field settle
agent-browser fill @e2 "complete@example.com"
# For fields with slow JS event handling
agent-browser type @e2 "first part"
agent-browser wait 500
agent-browser type @e2 "second part"
# Verify what was actually entered
agent-browser get value @e2
Prevention: After any form fill, immediately verify the value was entered correctly with get value.
Network requests fail or external API timeouts
Symptom: Page loads but data doesn't populate, or integration tests fail when hitting real APIs
Cause:
- External API is slow or unreachable
- CORS errors (browser blocks requests to different domain)
- Missing authentication headers
- Network is isolated (testing environment)
Fix:
# Set custom headers for auth
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com
# Mock API responses instead
agent-browser network route "**/api/data" --body '{"result":"mocked"}'
# Add explicit wait for API response
agent-browser wait 3000 # Give API time to respond
agent-browser snapshot -i
# Fallback: Use `--headed` to see error messages
agent-browser --headed open https://app.com
agent-browser click @e1 # See browser console errors
Prevention: For testing, use network route to mock external APIs. For production, verify connectivity before running automation.
Browser crashes or becomes unresponsive
Symptom: agent-browser command hangs or process crashes with segmentation fault; Chrome window closes unexpectedly
Cause:
- Out of memory (too many long-running sessions or screenshots)
- Invalid CDP connection
- Rare Chromium/Playwright crash
- Insufficient system resources
Fix:
# Restart with fresh session
agent-browser session kill all-sessions # Or specify a name
# Use lightweight mode with limited resources
agent-browser --session lite open https://minimal-site.com
# Reduce screenshot frequency
# Instead of: agent-browser screenshot after every click
# Do: agent-browser screenshot only on failure
# Check system resources
top -n 1 | head -20
# Run with --debug for crash diagnostics
agent-browser --debug open https://example.com 2>&1 | tail -50
Prevention:
- Close sessions explicitly:
agent-browser close - Use unique session names for long-lived tasks
- Monitor memory in long automation scripts
Element ref is ambiguous (@e1 appears for multiple elements)
Symptom: Multiple elements resolve to the same @e number, clicking @e1 affects the wrong element
Cause: Accessibility tree groups similar elements (buttons, links) and agent-browser assigns refs based on order, not uniqueness
Fix:
# Use full snapshot to understand structure
agent-browser snapshot -d 3 --json > structure.json
cat structure.json | jq . # Examine hierarchy
# Click using find by text if available
agent-browser find text "Specific Button Text" click
# Use CSS selector if accessibility tree isn't clear
agent-browser find selector "button.submit-btn" click
# Take screenshot and count visually
agent-browser screenshot debug.png
# Then manually map which @eN corresponds to which button
# Use context to disambiguate
agent-browser get text @e1 # Check what @e1 actually is
Prevention: Always verify element identity by checking its text or taking screenshots before critical interactions.
Advanced Features
CDP Connection
Connect to an existing Chrome instance:
# Start Chrome with debugging
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# Connect agent-browser
agent-browser --cdp 9222 snapshot
Video Recording
# Start recording
agent-browser record start recording.webm https://example.com
# Perform actions...
agent-browser click @e1
agent-browser fill @e2 "test"
# Stop and save
agent-browser record stop
Network Interception
# Block requests to URL pattern
agent-browser network route "**/analytics/*" --abort
# Mock API response
agent-browser network route "**/api/user" --body '{"name": "Test"}'
# View captured requests
agent-browser network requests
Custom Headers
# Set auth headers
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com
Browser Extensions
# Load extension
agent-browser --extension /path/to/extension open https://example.com
AI Agent Patterns
Form Automation
#!/bin/bash
# Login automation script
agent-browser open https://app.com/login
# Get form elements
agent-browser snapshot -i --json > elements.json
# Fill credentials
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
# Submit
agent-browser click @e4
# Wait for navigation
agent-browser wait 2000
# Verify login
agent-browser get url --json | jq -r '.url'
Data Extraction
#!/bin/bash
# Scrape product data
agent-browser open https://shop.com/products
# Get page content
agent-browser snapshot --json > snapshot.json
# Extract specific element text
agent-browser get text '[data-testid="price"]' --json
# Screenshot for verification
agent-browser screenshot products.png
Multi-Page Workflow
#!/bin/bash
SESSION="checkout-flow"
# Step 1: Add to cart
agent-browser --session $SESSION open https://shop.com/product/123
agent-browser --session $SESSION click @add-to-cart-button
agent-browser --session $SESSION wait 1000
# Step 2: Go to checkout
agent-browser --session $SESSION open https://shop.com/checkout
agent-browser --session $SESSION snapshot -i
# Step 3: Fill shipping
agent-browser --session $SESSION fill @shipping-name "John Doe"
agent-browser --session $SESSION fill @shipping-address "123 Main St"
# Cleanup
agent-browser --session $SESSION close
Options Reference
| Option | Description |
|---|---|
--session <name> |
Isolated browser session |
--json |
JSON output format |
--headed |
Show browser window (not headless) |
--cdp <port> |
Connect via Chrome DevTools Protocol |
--headers <json> |
HTTP headers for requests |
--proxy <url> |
Proxy server |
--executable-path <path> |
Custom browser executable |
--extension <path> |
Load browser extension |
--full, -f |
Full page screenshot |
--debug |
Debug output |
Environment Variables
| Variable | Description |
|---|---|
AGENT_BROWSER_SESSION |
Default session name |
AGENT_BROWSER_EXECUTABLE_PATH |
Custom browser path |
AGENT_BROWSER_STREAM_PORT |
WebSocket streaming port |
Examples
Example 1: Simple Web Scraping
User request:
"Scrape the product title and price from https://example.com/product/123"
Workflow:
# 1. Navigate to page
agent-browser open https://example.com/product/123
# 2. Get accessibility snapshot
agent-browser snapshot -i --json > page.json
# 3. Identify elements (example output):
# @e5 heading "Premium Widget Pro"
# @e12 text "$49.99"
# 4. Extract data
agent-browser get text @e5 --json # Returns: {"text": "Premium Widget Pro"}
agent-browser get text @e12 --json # Returns: {"text": "$49.99"}
Expected output:
{
"title": "Premium Widget Pro",
"price": "$49.99"
}
Time: 2-3 seconds
Example 2: Form Submission with Verification
User request:
"Fill out the contact form at https://example.com/contact and verify submission"
Workflow:
# 1. Navigate and snapshot
agent-browser open https://example.com/contact
agent-browser snapshot -i
# Example snapshot output:
# @e1 textbox "Name"
# @e2 textbox "Email"
# @e3 textbox "Message"
# @e4 button "Send"
# 2. Fill form fields
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "I have a question about your product"
# 3. Submit
agent-browser click @e4
# 4. Wait for response
agent-browser wait 2000
# 5. Verify success
agent-browser snapshot -i | grep "Thank you"
# Or take screenshot for manual verification
agent-browser screenshot success.png
Expected outcome:
- Form submitted successfully
- Page shows confirmation message
- Screenshot saved showing success state
Time: 4-6 seconds
Example 3: Multi-Page Workflow with Session
User request:
"Add item to cart, proceed to checkout, and fill shipping info"
Workflow:
# Use session for state persistence across commands
SESSION="checkout-flow-$(date +%s)"
# 1. Add to cart
agent-browser --session $SESSION open https://shop.example.com/product/widget
agent-browser --session $SESSION snapshot -i | grep "Add to Cart"
agent-browser --session $SESSION click @add-to-cart # Assuming @add-to-cart ref found
agent-browser --session $SESSION wait 1000 # Wait for cart update
# 2. Go to checkout
agent-browser --session $SESSION open https://shop.example.com/checkout
agent-browser --session $SESSION snapshot -i
# 3. Fill shipping form (refs from snapshot)
agent-browser --session $SESSION fill @shipping-name "Jane Smith"
agent-browser --session $SESSION fill @shipping-address "456 Oak Ave"
agent-browser --session $SESSION fill @shipping-city "Portland"
agent-browser --session $SESSION select @shipping-state "OR"
agent-browser --session $SESSION fill @shipping-zip "97201"
# 4. Take screenshot before submission
agent-browser --session $SESSION screenshot checkout-filled.png
# 5. Cleanup
agent-browser --session $SESSION close
Expected outcome:
- Item added to cart successfully
- Checkout form filled with shipping details
- Screenshot saved showing completed form
- Session cleaned up
Time: 8-12 seconds
Example 4: Login and Data Extraction
User request:
"Log into dashboard, navigate to reports, and extract table data"
Workflow:
SESSION="dashboard-scrape"
# 1. Login
agent-browser --session $SESSION open https://app.example.com/login
agent-browser --session $SESSION snapshot -i
# Fill login form
agent-browser --session $SESSION fill @email "user@example.com"
agent-browser --session $SESSION fill @password "secretpass"
agent-browser --session $SESSION click @submit
agent-browser --session $SESSION wait 2000
# 2. Navigate to reports
agent-browser --session $SESSION open https://app.example.com/reports
agent-browser --session $SESSION wait 1000
# 3. Extract table data
agent-browser --session $SESSION snapshot --json > reports.json
# Parse JSON to extract table rows
# @e20 table
# @e21 row "Q1 2024, $50,000, 15% growth"
# @e22 row "Q2 2024, $62,000, 24% growth"
agent-browser --session $SESSION get text @e20 --json
# 4. Cleanup
agent-browser --session $SESSION close
Expected output:
{
"reports": [
{"quarter": "Q1 2024", "revenue": "$50,000", "growth": "15%"},
{"quarter": "Q2 2024", "revenue": "$62,000", "growth": "24%"}
]
}
Time: 6-10 seconds
Troubleshooting
Issue: Element Reference Not Found
Symptoms:
Error: Element @e5 not found
Cause: Element ref changed after page update or snapshot was stale
Solution:
# 1. Get fresh snapshot
agent-browser snapshot -i
# 2. Verify element ref in output
# Look for expected element in snapshot output
# 3. If element doesn't appear, try without -i flag
agent-browser snapshot # Shows all elements, not just interactive
# 4. If still missing, check if page loaded completely
agent-browser wait 2000 # Wait for page to settle
agent-browser snapshot -i
Issue: Timeout Waiting for Navigation
Symptoms:
Error: Navigation timeout after 30000ms
Cause: Page takes too long to load or network issues
Solution:
# 1. Check if page is accessible
agent-browser --headed open https://example.com # Visual debugging
# 2. Increase wait time after navigation
agent-browser open https://slow-site.com
agent-browser wait 5000 # Wait 5 seconds
# 3. Check network requests
agent-browser network requests # See what's loading
# 4. Use CDP connection for better control
agent-browser --cdp 9222 open https://example.com
Issue: Click Not Working
Symptoms: Element click command runs but nothing happens
Cause: Element not visible, disabled, or covered by another element
Solution:
# 1. Check if element is visible
agent-browser is visible @e3
# Returns: true/false
# 2. Check if element is enabled
agent-browser is enabled @e3
# 3. Try scrolling to element first
agent-browser scroll down 500
agent-browser wait 500
agent-browser click @e3
# 4. Take screenshot to see page state
agent-browser screenshot debug.png
# 5. Try double-click instead
agent-browser dblclick @e3
Issue: Form Input Not Registering
Symptoms: Text typed but form field remains empty
Cause: JavaScript framework requires special events or delays
Solution:
# 1. Focus element first
agent-browser focus @e2
agent-browser wait 200
# 2. Use fill instead of type (clears first)
agent-browser fill @e2 "text content"
# 3. Add delays between keystrokes
agent-browser type @e2 "slow"
agent-browser wait 100
# 4. Try pressing Tab after filling to trigger validation
agent-browser fill @e2 "email@example.com"
agent-browser press Tab
Issue: Session Persistence Problems
Symptoms: Previous session state not available
Cause: Session wasn't properly named or browser closed unexpectedly
Solution:
# 1. Always use explicit session names
SESSION="my-workflow-$(date +%s)" # Unique session ID
agent-browser --session $SESSION open https://example.com
# 2. Check active sessions
agent-browser --session $SESSION get url
# If error, session doesn't exist
# 3. Don't reuse session names across runs
# Each workflow should have unique session ID
# 4. Cleanup sessions when done
agent-browser --session $SESSION close
Issue: Chromium Installation Failed
Symptoms:
Error: Failed to download Chromium
Cause: Network issues, disk space, or missing system dependencies
Solution:
# 1. Check disk space
df -h # Ensure >2GB free
# 2. Retry installation with verbose output
agent-browser install --debug
# 3. On Linux, install system dependencies first
agent-browser install --with-deps
# 4. Use custom browser if needed
agent-browser --executable-path /usr/bin/chromium open https://example.com
# 5. Verify installation
agent-browser --version
ls ~/.cache/ms-playwright/ # Check if Chromium exists
Issue: JSON Output Malformed
Symptoms: Can't parse JSON output from commands
Cause: Mixed text/JSON output or errors in JSON mode
Solution:
# 1. Use --json flag consistently
agent-browser get text @e5 --json # Not: agent-browser get text @e5
# 2. Redirect stderr to separate stream
agent-browser snapshot --json 2>errors.log >output.json
# 3. Validate JSON output
agent-browser snapshot --json | jq . # Will error if invalid
# 4. Check for error messages in output
agent-browser snapshot --json | grep -E '^{' | jq .
Best Practices
- Always snapshot first: Get element refs before interacting
- Use
--jsonfor parsing: Machine-readable output for scripts - Session isolation: Use
--sessionfor parallel workflows - Explicit waits: Add
waitcommands after navigation/clicks - Interactive snapshots: Use
-iflag to reduce noise - Verify state: Check element visibility before clicking
- Screenshot on failure: Capture state when automation fails
More from ckorhonen/claude-skills
video-editor
Expert guidance for video editing with ffmpeg, encoding best practices, and quality optimization. Use when working with video files, transcoding, remuxing, encoding settings, color spaces, or troubleshooting video quality issues.
63tui-designer
Design and implement retro/cyberpunk/hacker-style terminal UIs. Covers React (Tuimorphic), SwiftUI (Metal shaders), and CSS approaches. Use when creating terminal aesthetics, CRT effects, neon glow, scanlines, phosphor green displays, or retro-futuristic interfaces.
35practical-typography
Professional typography guidance based on Matthew Butterick's Practical Typography. Use when evaluating, critiquing, or improving document formatting, text layout, font choices, punctuation, spacing, or any typography-related decisions for print or web content.
34app-marketing-copy
Write marketing copy and App Store / Google Play listings (ASO keywords, titles, subtitles, short+long descriptions, feature bullets, release notes), plus screenshot caption sets and text-to-image prompt templates for generating store screenshot backgrounds/promo visuals. Use when asked to: write/refresh app marketing copy, craft app store metadata, brainstorm taglines/value props, produce ad/landing/email copy, or generate prompts for screenshot/creative generation.
33markdown-fetch
Fetch and extract web content as clean Markdown when provided with URLs. Use this skill whenever a user provides a URL (http/https link) that needs to be read, analyzed, summarized, or extracted. Converts web pages to Markdown with 80% fewer tokens than raw HTML. Handles all content types including JS-heavy sites, documentation, articles, and blog posts. Supports three conversion methods (auto, AI, browser rendering). Always use this instead of web_fetch when working with URLs - it's more efficient and provides cleaner output.
26llm-advisor
Consult other LLMs (GPT-4.1, o4-mini, Gemini 2.5 Pro, Claude Opus) for second opinions on complex bugs, hard problems, planning, and architecture decisions. Use proactively when stuck for 15+ minutes or facing complex debugging. Use when user says 'ask Gemini/GPT/Claude about X' or 'get a second opinion'.
22