interact-with-browser
BE SURE TO CLEAN UP SCREENSHOTS AFTER YOU ARE DONE WITH EVERYTHING
IF THIS NEEDS TO BE INSTALLED
npm install -g agent-browser
agent-browser install # to get chromium downloaded
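The install step only needs to run when the CLI is missing. A minimal sketch of that check, assuming `npm` is available; `ensure_agent_browser` is an illustrative helper name, not part of agent-browser:

```shell
# Install agent-browser and its Chromium build only when the CLI is missing.
# ensure_agent_browser is a hypothetical helper name used for illustration.
ensure_agent_browser() {
  if command -v agent-browser >/dev/null 2>&1; then
    return 0                      # already on PATH, nothing to do
  fi
  npm install -g agent-browser &&
  agent-browser install           # to get chromium downloaded
}
```

Calling `ensure_agent_browser` at the start of a script makes the rest of the commands safe to run on a fresh machine.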
agent-browser open example.com
agent-browser snapshot                      # Get accessibility tree with refs
agent-browser click @e2                     # Click by ref from snapshot
agent-browser fill @e3 "test@example.com"   # Fill by ref
agent-browser get text @e1                  # Get text by ref
agent-browser screenshot page.png
agent-browser close
Traditional Selectors (also supported)
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
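Since screenshots have to be cleaned up afterwards, one option is to write them into a scratch directory that is removed automatically. The sketch below assumes `mktemp` is available; `with_screenshots`, `capture_example`, and `SHOT_DIR` are illustrative names, not part of agent-browser:

```shell
# Run a command with SHOT_DIR pointing at a scratch directory, then delete
# the directory (and every screenshot in it), whatever the command returned.
# with_screenshots and SHOT_DIR are hypothetical names used for illustration.
with_screenshots() {
  SHOT_DIR=$(mktemp -d) || return 1
  export SHOT_DIR
  "$@"
  rc=$?
  rm -rf "$SHOT_DIR"
  return "$rc"
}

# Example task: capture one page into the scratch directory.
capture_example() {
  agent-browser open example.com &&
  agent-browser screenshot "$SHOT_DIR/page.png" &&
  agent-browser close
}
```

Running `with_screenshots capture_example` leaves no screenshot files behind. Anything that needs the image has to read it inside the wrapped command.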
Commands
Core Commands
agent-browser open <url>              # Navigate to URL (aliases: goto, navigate)
agent-browser click <sel>             # Click element
agent-browser dblclick <sel>          # Double-click element
agent-browser focus <sel>             # Focus element
agent-browser type <sel> <text>       # Type into element
agent-browser fill <sel> <text>       # Clear and fill
agent-browser press <key>             # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keydown <key>           # Hold key down
agent-browser keyup <key>             # Release key
agent-browser hover <sel>             # Hover element
agent-browser select <sel> <val>      # Select dropdown option
agent-browser check <sel>             # Check checkbox
agent-browser uncheck <sel>           # Uncheck checkbox
agent-browser scroll <dir> [px]       # Scroll (up/down/left/right)
agent-browser scrollintoview <sel>    # Scroll element into view (alias: scrollinto)
agent-browser drag <src> <tgt>        # Drag and drop
agent-browser upload <sel> <files>    # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page)
agent-browser pdf <path>              # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval <js>               # Run JavaScript
agent-browser close                   # Close browser (aliases: quit, exit)
Get Info
agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get attr <sel> <attr>   # Get attribute
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
agent-browser get count <sel>         # Count matching elements
agent-browser get box <sel>           # Get bounding box
Check State
agent-browser is visible <sel>        # Check if visible
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked
Find Elements (Semantic Locators)
agent-browser find role <role> <action> [value]       # By ARIA role
agent-browser find text <text> <action>               # By text content
agent-browser find label <label> <action> [value]     # By label
agent-browser find placeholder <ph> <action> [value]  # By placeholder
agent-browser find alt <text> <action>                # By alt text
agent-browser find title <text> <action>              # By title attr
agent-browser find testid <id> <action> [value]       # By data-testid
agent-browser find first <sel> <action> [value]       # First match
agent-browser find last <sel> <action> [value]        # Last match
agent-browser find nth <n> <sel> <action> [value]     # Nth match
Actions: click, fill, check, hover, text
Examples:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
Wait
agent-browser wait <selector>                     # Wait for element to be visible
agent-browser wait <ms>                           # Wait for time (milliseconds)
agent-browser wait --text "Welcome"               # Wait for text to appear
agent-browser wait --url "**/dash"                # Wait for URL pattern
agent-browser wait --load networkidle             # Wait for load state
agent-browser wait --fn "window.ready === true"   # Wait for JS condition
Load states: load, domcontentloaded, networkidle
Mouse Control
agent-browser mouse move <x> <y>       # Move mouse
agent-browser mouse down [button]      # Press button (left/right/middle)
agent-browser mouse up [button]        # Release button
agent-browser mouse wheel <dy> [dx]    # Scroll wheel
Browser Settings
agent-browser set viewport <w> <h>     # Set viewport size
agent-browser set device <name>        # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng>      # Set geolocation
agent-browser set offline [on|off]     # Toggle offline mode
agent-browser set headers <json>       # Extra HTTP headers
agent-browser set credentials <u> <p>  # HTTP basic auth
agent-browser set media [dark|light]   # Emulate color scheme
Cookies & Storage
agent-browser cookies                     # Get all cookies
agent-browser cookies set <name> <val>    # Set cookie
agent-browser cookies clear               # Clear cookies
agent-browser storage local               # Get all localStorage
agent-browser storage local <key>         # Get specific key
agent-browser storage local set <k> <v>   # Set value
agent-browser storage local clear         # Clear all
agent-browser storage session # Same for sessionStorage
Network
agent-browser network route <url>                # Intercept requests
agent-browser network route <url> --abort        # Block requests
agent-browser network route <url> --body <json>  # Mock response
agent-browser network unroute [url]              # Remove routes
agent-browser network requests                   # View tracked requests
agent-browser network requests --filter api      # Filter requests
Tabs & Windows
agent-browser tab                  # List tabs
agent-browser tab new [url]        # New tab (optionally with URL)
agent-browser tab <n>              # Switch to tab n
agent-browser tab close [n]        # Close tab
agent-browser window new           # New window
Frames
agent-browser frame <sel>          # Switch to iframe
agent-browser frame main           # Back to main frame
Dialogs
agent-browser dialog accept [text]   # Accept (with optional prompt text)
agent-browser dialog dismiss         # Dismiss
Debug
agent-browser trace start [path]   # Start recording trace
agent-browser trace stop [path]    # Stop and save trace
agent-browser console              # View console messages
agent-browser console --clear      # Clear console
agent-browser errors               # View page errors
agent-browser errors --clear       # Clear errors
agent-browser highlight <sel>      # Highlight element
agent-browser state save <path>    # Save auth state
agent-browser state load <path>    # Load auth state
Navigation
agent-browser back      # Go back
agent-browser forward   # Go forward
agent-browser reload    # Reload page
Setup
agent-browser install               # Download Chromium browser
agent-browser install --with-deps   # Also install system deps (Linux)
Sessions
Run multiple isolated browser instances:
Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
List active sessions
agent-browser session list
Output:
Active sessions:
-> default
agent1
Show current session
agent-browser session
Each session has its own:
Browser instance
Cookies and storage
Navigation history
Authentication state
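When a script juggles several sessions, a small wrapper keeps the session name next to each command. This is a minimal sketch built on the `AGENT_BROWSER_SESSION` variable shown above; `in_session` is an illustrative helper name, not part of agent-browser:

```shell
# Run any agent-browser command inside a named session.
# in_session is a hypothetical helper name used for illustration.
in_session() {
  name=$1
  shift
  # The variable is set for this one invocation only.
  AGENT_BROWSER_SESSION=$name agent-browser "$@"
}
```

For example, `in_session agent1 open site-a.com` followed by `in_session agent2 open site-b.com` drives two isolated browsers without changing the caller's environment.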
Snapshot Options
The snapshot command supports filtering to reduce output size:
agent-browser snapshot              # Full accessibility tree
agent-browser snapshot -i           # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c           # Compact (remove empty structural elements)
agent-browser snapshot -d 3         # Limit depth to 3 levels
agent-browser snapshot -s "#main"   # Scope to CSS selector
agent-browser snapshot -i -c -d 5   # Combine options
Option                  Description
-i, --interactive       Only show interactive elements (buttons, links, inputs)
-c, --compact           Remove empty structural elements
-d, --depth <n>         Limit tree depth
-s, --selector <sel>    Scope to CSS selector
Options
Option                      Description
--session <name>            Use isolated session (or AGENT_BROWSER_SESSION env)
--headers <json>            Set HTTP headers scoped to the URL's origin
--executable-path <path>    Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env)
--json                      JSON output (for agents)
--full, -f                  Full page screenshot
--name, -n                  Locator name filter
--exact                     Exact text match
--headed                    Show browser window (not headless)
--cdp <port>                Connect via Chrome DevTools Protocol
--debug                     Debug output
Selectors
Refs (Recommended for AI)
Refs provide deterministic element selection from snapshots:
1. Get snapshot with refs
agent-browser snapshot
Output:
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]
2. Use refs to interact
agent-browser click @e2                     # Click the button
agent-browser fill @e3 "test@example.com"   # Fill the textbox
agent-browser get text @e1                  # Get heading text
agent-browser hover @e4                     # Hover the link
Why use refs?
Deterministic: Ref points to exact element from snapshot
Fast: No DOM re-query needed
AI-friendly: Snapshot + ref workflow is optimal for LLMs
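For scripts without an LLM in the loop, a ref can be pulled out of the text snapshot with standard tools. The sketch below assumes snapshot lines look like the sample output above (`- button "Submit" [ref=e2]`) and that the name contains no regular-expression metacharacters; `find_ref` is an illustrative helper name, not part of agent-browser:

```shell
# Read a text snapshot on stdin and print the ref (for example "@e2") of the
# first line whose role and quoted name match. Prints nothing if none match.
# find_ref is a hypothetical helper name used for illustration.
find_ref() {
  role=$1
  name=$2
  sed -n "s/^[[:space:]]*- $role \"$name\" .*\[ref=\([^]]*\)\].*/@\1/p" | head -n 1
}
```

A possible use is `ref=$(agent-browser snapshot -i | find_ref button Submit)` followed by `agent-browser click "$ref"`. For anything beyond simple names, the `--json` output is likely a more reliable source of refs.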
CSS Selectors
agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"
Text & XPath
agent-browser click "text=Submit"
agent-browser click "xpath=//button"
Semantic Locators
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
Agent Mode
Use --json for machine-readable output:
agent-browser snapshot --json
Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
agent-browser get text @e1 --json
agent-browser is visible @e2 --json
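A script can branch on the `success` field of the JSON reply. The sketch below is a rough check with `grep`, assuming the reply has the shape shown above and that a failed command reports `"success":false` (an assumption, not confirmed here); `ab_json_ok` is an illustrative helper name:

```shell
# Run a command with --json and succeed only if the reply reports success.
# The reply is printed on stdout either way so the caller can parse it further.
# ab_json_ok is a hypothetical helper name; the grep is a rough text match,
# not a real JSON parse.
ab_json_ok() {
  reply=$(agent-browser "$@" --json) || return 1
  printf '%s\n' "$reply"
  printf '%s' "$reply" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}
```

For example, `ab_json_ok get text @e1 >/dev/null || echo "lookup failed"`. A proper JSON parser would be more robust if one is available.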
Optimal AI Workflow
1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs
2. AI identifies target refs from snapshot
3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"
4. Get new snapshot if page changed
agent-browser snapshot -i --json
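One round of the workflow above can be wrapped in a single function. This is a minimal sketch, not the tool's own API: `step` is an illustrative name, and the chooser is a stand-in for whatever reads the snapshot and picks a ref (normally the AI):

```shell
# One round of the workflow: snapshot, pick a ref, act on it.
# $1 is a command that reads the JSON snapshot on stdin and prints one ref
# such as "@e2"; $2 is the action; any remaining arguments follow the ref.
# step and the chooser command are hypothetical names used for illustration.
step() {
  chooser=$1
  action=$2
  shift 2
  ref=$(agent-browser snapshot -i --json | "$chooser") || return 1
  [ -n "$ref" ] || return 1
  agent-browser "$action" "$ref" "$@"
}
```

With a chooser named `choose_ref`, `step choose_ref click` clicks the chosen element and `step choose_ref fill "input text"` fills it. Calling `step` again takes a fresh snapshot, which covers the case where the page changed.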
Headed Mode
Show the browser window for debugging:
agent-browser open example.com --headed
This opens a visible browser window instead of running headless.
Authenticated Sessions
Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:
Headers are scoped to api.example.com only
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
Requests to api.example.com include the auth header
agent-browser snapshot -i --json
agent-browser click @e2
Navigate to another domain - headers are NOT sent (safe!)
agent-browser open other-site.com
This is useful for:
Skipping login flows - Authenticate via headers instead of UI
Switching users - Start new sessions with different auth tokens
API testing - Access protected endpoints directly
Security - Headers are scoped to the origin, not leaked to other domains
To set headers for multiple origins, use --headers with each open command:
agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
For global headers (all domains), use set headers:
agent-browser set headers '{"X-Custom-Header": "value"}'
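To keep tokens out of the script itself, the header JSON can be built from a variable at call time. A minimal sketch, assuming the token contains no double quotes or backslashes; `open_with_token` and `API_TOKEN` are illustrative names, not part of agent-browser:

```shell
# Open a URL with a bearer token scoped to that URL's origin.
# open_with_token is a hypothetical helper name; the token is interpolated
# as-is, so it must not contain double quotes or backslashes.
open_with_token() {
  url=$1
  token=$2
  agent-browser open "$url" --headers "{\"Authorization\": \"Bearer $token\"}"
}
```

For example, `open_with_token api.example.com "$API_TOKEN"` reads the token from the environment instead of hard-coding it.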
Custom Browser Executable
Use a custom browser executable instead of the bundled Chromium. This is useful for:
Serverless deployment: Use lightweight Chromium builds like @sparticuz/chromium (~50MB vs ~684MB)
System browsers: Use an existing Chrome/Chromium installation
Custom builds: Use modified browser builds
CLI Usage
Via flag
agent-browser --executable-path /path/to/chromium open example.com
Via environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
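For the system-browser case, the environment variable can be set from whatever browser is already on `PATH`. This is a sketch under assumptions: the candidate binary names below are guesses that vary by system, and `use_system_browser` is an illustrative helper name:

```shell
# Point agent-browser at the first browser binary found on PATH.
# The candidate names are assumptions; adjust them for your system.
# use_system_browser is a hypothetical helper name used for illustration.
use_system_browser() {
  for candidate in chromium chromium-browser google-chrome; do
    found=$(command -v "$candidate" 2>/dev/null) || continue
    AGENT_BROWSER_EXECUTABLE_PATH=$found
    export AGENT_BROWSER_EXECUTABLE_PATH
    return 0
  done
  return 1   # nothing found; the variable is left untouched
}
```

After `use_system_browser` succeeds, later `agent-browser` commands in the same shell pick up the exported path.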
Serverless Example (Vercel/AWS Lambda)
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';

export async function handler() {
  const browser = new BrowserManager();
  await browser.launch({
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  // ... use browser
}