agent-browser CLI

A headless browser automation CLI designed for AI agents, with fast Rust-based execution and element refs optimized for LLM reasoning.

Overview

agent-browser provides programmatic browser control through a CLI built for AI agent workflows. It identifies elements by deterministic references (@e1, @e2, etc.) taken from the page's accessibility tree, so an LLM can target an element by a short ref instead of composing a selector. Refs are assigned by a snapshot, so take a fresh snapshot after the page changes.

Key Features:

  • Fast Rust-based CLI with Node.js fallback
  • Element refs (@e1, @e2) for stable LLM reasoning
  • Session isolation for parallel browser instances
  • JSON output for programmatic parsing
  • Accessibility tree snapshots for element discovery
  • CDP connection support for existing browser instances

When to Use

  • Automating web interactions (form filling, clicking, navigation)
  • Scraping web content with accessibility tree parsing
  • Testing web applications programmatically
  • Multi-agent scenarios requiring isolated browser sessions
  • Screenshot capture and PDF generation
  • Monitoring web page state changes

Prerequisites

  • Node.js >= 18 or Bun runtime
  • Chromium (installed via agent-browser install)

Installation

# Install globally with Bun
bun install -g agent-browser

# Or install globally with npm (Node.js >= 18)
npm install -g agent-browser

# Download Chromium
agent-browser install

# Linux with system dependencies
agent-browser install --with-deps

Verify installation:

agent-browser --version

Quick Start

# Navigate to a page
agent-browser open https://example.com

# Get accessibility snapshot with element refs
agent-browser snapshot -i  # -i = interactive elements only

# Click an element by ref
agent-browser click @e2

# Fill a form field
agent-browser fill @e3 "test@example.com"

# Take a screenshot
agent-browser screenshot output.png

Core Workflow

1. Open Page and Snapshot

# Open URL
agent-browser open https://example.com

# Get interactive elements with refs
agent-browser snapshot -i --json

The snapshot returns elements like:

@e1 link "Home"
@e2 textbox "Email"
@e3 button "Submit"
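
When a script needs one specific ref, it can filter the snapshot text instead of hard-coding @e2. The sketch below assumes each snapshot line has the simplified form shown above (ref, role, quoted name); the helper name ref_for is hypothetical, and real snapshot output may carry extra fields that would need a different pattern.

```shell
# ref_for ROLE NAME: read snapshot lines on stdin and print the first matching ref.
# Assumes lines shaped like: @e2 textbox "Email"
ref_for() {
  awk -v role="$1" -v want="$2" '
    $2 == role {
      # Rebuild the quoted name from field 3 onward (names may contain spaces).
      label = $3
      for (i = 4; i <= NF; i++) label = label " " $i
      if (label == "\"" want "\"") { print $1; found = 1; exit }
    }
    END { exit found ? 0 : 1 }
  '
}
```

For example, agent-browser snapshot -i | ref_for textbox Email would print @e2 for the snapshot above, and the function returns a non-zero status when nothing matches.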

2. Interact Using Refs

# Click by ref
agent-browser click @e2

# Type into an element (does not clear existing content first)
agent-browser type @e2 "hello@example.com"

# Fill (clears first, then types)
agent-browser fill @e2 "hello@example.com"

# Press keyboard key
agent-browser press Enter

3. Verify State

# Check element visibility
agent-browser is visible @e3

# Get element text
agent-browser get text @e1

# Get page URL
agent-browser get url
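
These checks can be wrapped so a script stops as soon as the page is not where it should be. This is a minimal sketch: the function name expect_url_contains is hypothetical, and it assumes agent-browser get url prints the bare URL on stdout when --json is not given.

```shell
# expect_url_contains FRAGMENT: succeed only if the current URL contains FRAGMENT.
# Assumption: `agent-browser get url` prints just the URL on stdout.
expect_url_contains() {
  current_url=$(agent-browser get url) || return 1
  case "$current_url" in
    *"$1"*) return 0 ;;
    *) echo "unexpected URL: $current_url" >&2; return 1 ;;
  esac
}
```

A login script could then run expect_url_contains /dashboard || exit 1 right after submitting the form.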

Command Reference

Navigation

Command      Description
open <url>   Navigate to URL
back         Go back in history
forward      Go forward in history
reload       Reload current page
close        Close browser

Interactions

Command              Description
click <ref>          Click element
dblclick <ref>       Double-click element
type <ref> <text>    Type text into element
fill <ref> <text>    Clear and fill element
press <key>          Press keyboard key (Enter, Tab, Control+a)
hover <ref>          Hover over element
focus <ref>          Focus element
check <ref>          Check checkbox
uncheck <ref>        Uncheck checkbox
select <ref> <val>   Select dropdown option
scroll <dir> [px]    Scroll (up/down/left/right)
wait <ref|ms>        Wait for element or milliseconds

Information

Command             Description
snapshot            Get accessibility tree with refs
snapshot -i         Interactive elements only
snapshot -c         Compact (remove empty elements)
snapshot -d <n>     Limit tree depth
get text <ref>      Get element text content
get html <ref>      Get element HTML
get value <ref>     Get input value
get url             Get current page URL
get title           Get page title
screenshot [path]   Take screenshot
screenshot --full   Full page screenshot
pdf <path>          Save page as PDF

State Checks

Command            Description
is visible <ref>   Check if element is visible
is enabled <ref>   Check if element is enabled
is checked <ref>   Check if checkbox is checked

Find Elements

# Find by role and click
agent-browser find role button click --name Submit

# Find by text
agent-browser find text "Sign In" click

# Find by label
agent-browser find label "Email" fill "test@example.com"

# Find by placeholder
agent-browser find placeholder "Search..." type "query"

Session Management

Sessions provide isolated browser instances for parallel execution:

# Use named session
agent-browser --session login-flow open https://app.com

# Different session for another task
agent-browser --session checkout open https://app.com/cart

# List active sessions
agent-browser session list

# Environment variable (applies to every later command in this shell)
export AGENT_BROWSER_SESSION=my-session
agent-browser open https://example.com
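
Because each session is its own browser instance, independent tasks can be started concurrently from one script. The sketch below is one way to do that with shell background jobs; the function name open_in_parallel and the task-N session names are illustrative, not part of the CLI.

```shell
# open_in_parallel URL...: open each URL in its own session, concurrently.
# Session names (task-1, task-2, ...) are made up for this example.
open_in_parallel() {
  task_n=0
  for url in "$@"; do
    task_n=$((task_n + 1))
    agent-browser --session "task-$task_n" open "$url" &
  done
  # Block until every background open has finished. Note that a bare
  # `wait` does not report whether the individual jobs succeeded.
  wait
}
```

After open_in_parallel https://app.com https://app.com/cart returns, later commands can address each page with --session task-1 or --session task-2.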

JSON Output

Use --json for machine-readable output:

# Snapshot as JSON
agent-browser snapshot -i --json

# Get element info as JSON
agent-browser get text @e1 --json

# Parse in scripts
agent-browser get url --json | jq -r '.url'

Advanced Features

CDP Connection

Connect to an existing Chrome instance:

# Start Chrome with remote debugging enabled (macOS path shown)
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Connect agent-browser
agent-browser --cdp 9222 snapshot

Video Recording

# Start recording
agent-browser record start recording.webm https://example.com

# Perform actions...
agent-browser click @e1
agent-browser fill @e2 "test"

# Stop and save
agent-browser record stop

Network Interception

# Block requests to URL pattern
agent-browser network route "**/analytics/*" --abort

# Mock API response
agent-browser network route "**/api/user" --body '{"name": "Test"}'

# View captured requests
agent-browser network requests

Custom Headers

# Set auth headers
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com

Browser Extensions

# Load extension
agent-browser --extension /path/to/extension open https://example.com

AI Agent Patterns

Form Automation

#!/bin/bash
# Login automation script

agent-browser open https://app.com/login

# Get form elements (the refs used below are examples; read the real ones from this file)
agent-browser snapshot -i --json > elements.json

# Fill credentials
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"

# Submit
agent-browser click @e4

# Wait for navigation
agent-browser wait 2000

# Verify login
agent-browser get url --json | jq -r '.url'

Data Extraction

#!/bin/bash
# Scrape product data

agent-browser open https://shop.com/products

# Get page content
agent-browser snapshot --json > snapshot.json

# Extract specific element text (targets a CSS selector instead of a ref)
agent-browser get text '[data-testid="price"]' --json

# Screenshot for verification
agent-browser screenshot products.png

Multi-Page Workflow

#!/bin/bash
SESSION="checkout-flow"

# Step 1: Add to cart (the button text is an example; match it to the real page)
agent-browser --session "$SESSION" open https://shop.com/product/123
agent-browser --session "$SESSION" find text "Add to Cart" click
agent-browser --session "$SESSION" wait 1000

# Step 2: Go to checkout
agent-browser --session "$SESSION" open https://shop.com/checkout
agent-browser --session "$SESSION" snapshot -i

# Step 3: Fill shipping (labels are examples; @e refs from the snapshot above also work)
agent-browser --session "$SESSION" find label "Name" fill "John Doe"
agent-browser --session "$SESSION" find label "Address" fill "123 Main St"

# Cleanup
agent-browser --session "$SESSION" close
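
Repeating the session flag on every line is easy to get wrong in longer scripts. A small wrapper function is one way to avoid that; the name ab is hypothetical and the function only prepends the --session option described in this document.

```shell
SESSION="checkout-flow"

# ab ARGS...: run agent-browser in the shared session without repeating the flag.
ab() {
  agent-browser --session "$SESSION" "$@"
}
```

With the wrapper defined, the steps shorten to ab open https://shop.com/checkout, ab snapshot -i, and so on. Adding trap 'ab close' EXIT near the top of a script would also close the session if a step fails partway through.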

Options Reference

Option                     Description
--session <name>           Isolated browser session
--json                     JSON output format
--headed                   Show browser window (not headless)
--cdp <port>               Connect via Chrome DevTools Protocol
--headers <json>           HTTP headers for requests
--proxy <url>              Proxy server
--executable-path <path>   Custom browser executable
--extension <path>         Load browser extension
--full, -f                 Full page screenshot
--debug                    Debug output

Environment Variables

Variable                        Description
AGENT_BROWSER_SESSION           Default session name
AGENT_BROWSER_EXECUTABLE_PATH   Custom browser path
AGENT_BROWSER_STREAM_PORT       WebSocket streaming port

Troubleshooting

"Browser not found"

# Install Chromium
agent-browser install

Element ref not found

# Refresh snapshot after page changes
agent-browser snapshot -i

# Check if element is visible
agent-browser is visible @e1

Session conflicts

# List active sessions
agent-browser session list

# Use unique session names
agent-browser --session unique-$(date +%s) open https://example.com
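
A timestamp alone can still collide when two scripts start within the same second. A slightly more defensive sketch adds the shell's process ID; the helper name unique_session is hypothetical.

```shell
# unique_session PREFIX: print a session name that is unlikely to collide.
# Combines the epoch second with this script's PID, so two scripts started
# in the same second still get different names.
unique_session() {
  printf '%s-%s-%s\n' "$1" "$(date +%s)" "$$"
}
```

Usage would look like SESSION=$(unique_session checkout) followed by agent-browser --session "$SESSION" open https://example.com.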

Headless issues

# Run with visible browser for debugging
agent-browser --headed open https://example.com

Timeout waiting for element

# Explicit wait before interaction
agent-browser wait 2000
agent-browser wait @e1  # Wait for element to appear
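
On slow pages a single wait may not be enough. A generic retry loop is one option; this sketch assumes agent-browser exits with a non-zero status when a wait or interaction fails, which should be confirmed for the installed version. The function name retry is hypothetical.

```shell
# retry ATTEMPTS DELAY COMMAND...: rerun COMMAND until it succeeds,
# sleeping DELAY seconds between tries; fail after ATTEMPTS tries.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_try=1
  while ! "$@"; do
    if [ "$retry_try" -ge "$retry_max" ]; then
      return 1
    fi
    retry_try=$((retry_try + 1))
    sleep "$retry_delay"
  done
}
```

For example, retry 5 1 agent-browser wait @e1 tries up to five times with one second between attempts.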

Best Practices

  • Always snapshot first: Get element refs before interacting
  • Use --json for parsing: Machine-readable output for scripts
  • Session isolation: Use --session for parallel workflows
  • Explicit waits: Add wait commands after navigation/clicks
  • Interactive snapshots: Use -i flag to reduce noise
  • Verify state: Check element visibility before clicking
  • Screenshot on failure: Capture state when automation fails
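
The last practice can be automated with a small wrapper around each step. This is a sketch rather than a built-in feature: run_step is a hypothetical helper, the screenshot file name is arbitrary, and it assumes a failing agent-browser command exits with a non-zero status.

```shell
# run_step COMMAND...: run one automation step; if it fails, save a
# timestamped screenshot and return the step's original exit status.
run_step() {
  "$@" || {
    step_status=$?
    # Best effort: a failed screenshot must not hide the original error.
    agent-browser screenshot "failure-$(date +%s).png" >&2 || true
    return "$step_status"
  }
}
```

A script would then call run_step agent-browser click @e4 || exit 1, leaving a screenshot of the page as it looked when the step failed.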