agent-browser
agent-browser Skill
Browser automation CLI designed for AI agents. Uses "snapshot + refs" paradigm for 93% less context than Playwright MCP.
Quick Start
# Install globally
npm install -g agent-browser
# Download Chromium (one-time)
agent-browser install
# Linux: include system deps
agent-browser install --with-deps
# Verify
agent-browser --version
Core Workflow
The 4-step pattern for all browser automation:
# 1. Navigate
agent-browser open https://example.com
# 2. Snapshot (get interactive elements with refs)
agent-browser snapshot -i
# Output: button "Sign In" @e1, textbox "Email" @e2, ...
# 3. Interact using refs
agent-browser fill @e2 "user@example.com"
agent-browser click @e1
# 4. Re-snapshot after page changes
agent-browser snapshot -i
When to Use (vs chrome-devtools)
| Use agent-browser | Use chrome-devtools |
|---|---|
| Long autonomous AI sessions | Quick one-off screenshots |
| Context-constrained workflows | Custom Puppeteer scripts needed |
| Video recording for debugging | WebSocket full frame debugging |
| Cloud browsers (Browserbase) | Existing workflow integration |
| Multi-tab handling | Need Sharp auto-compression |
| Self-verifying build loops | Session with auth injection |
Token efficiency: ~280 chars/snapshot vs 8K+ for Playwright MCP.
Command Reference
Navigation
agent-browser open <url> # Navigate to URL
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser
Analysis (Snapshot)
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -c # Compact output
agent-browser snapshot -d 3 # Limit depth
agent-browser snapshot -s "nav" # Scope to CSS selector
Interactions (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser dblclick @e1 # Double-click
agent-browser fill @e2 "text" # Clear and fill input
agent-browser type @e2 "text" # Type without clearing
agent-browser press Enter # Press key
agent-browser hover @e1 # Hover over element
agent-browser check @e3 # Check checkbox
agent-browser uncheck @e3 # Uncheck checkbox
agent-browser select @e4 "opt" # Select dropdown option
agent-browser scroll @e1 # Scroll element into view
agent-browser scroll down 500 # Scroll page by pixels
agent-browser drag @e1 @e2 # Drag from e1 to e2
agent-browser upload @e5 file.pdf # Upload file
Information Retrieval
agent-browser get text @e1 # Get text content
agent-browser get html @e1 # Get HTML
agent-browser get value @e2 # Get input value
agent-browser get attr @e1 href # Get attribute
agent-browser get title # Page title
agent-browser get url # Current URL
agent-browser get count "li" # Count elements
agent-browser get box @e1 # Bounding box
State Checks
agent-browser is visible @e1 # Check visibility
agent-browser is enabled @e1 # Check if enabled
agent-browser is checked @e3 # Check if checked
Media
agent-browser screenshot # Capture viewport
agent-browser screenshot --full # Full page
agent-browser screenshot -o ss.png # Save to file
agent-browser pdf -o page.pdf # Export PDF
agent-browser record start # Start video recording
agent-browser record stop # Stop and save video
agent-browser record restart # Restart recording
Wait Conditions
agent-browser wait @e1 # Wait for element
agent-browser wait --text "Success" # Wait for text to appear
agent-browser wait --url "/dashboard" # Wait for URL pattern
agent-browser wait --load # Wait for page load
agent-browser wait --idle # Wait for network idle
agent-browser wait --fn "() => window.ready" # Wait for JS condition
Browser Configuration
agent-browser viewport 1920 1080 # Set viewport size
agent-browser device "iPhone 14" # Emulate device
agent-browser geolocation 40.7 -74.0 # Set geolocation
agent-browser offline true # Enable offline mode
agent-browser headers '{"X-Custom":"val"}' # Set headers
agent-browser credentials user pass # HTTP auth
agent-browser color-scheme dark # Set color scheme
Storage Management
agent-browser cookies # List cookies
agent-browser cookies set name=val # Set cookie
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get localStorage
agent-browser storage session # Get sessionStorage
agent-browser state save auth.json # Save browser state
agent-browser state load auth.json # Load browser state
Network Control
agent-browser network route "**/*.jpg" --abort # Block requests
agent-browser network route "**/api/*" --body '{"data":[]}' # Mock response
agent-browser network unroute "**/*.jpg" # Remove specific route
agent-browser network requests # List intercepted requests
Semantic Finding
agent-browser find role button # Find by ARIA role
agent-browser find text "Submit" # Find by text content
agent-browser find label "Email" # Find by label
agent-browser find placeholder "Search" # Find by placeholder
agent-browser find testid "login-btn" # Find by data-testid
agent-browser find first "button" # First matching element
agent-browser find last "li" # Last matching element
agent-browser find nth 2 "li" # Nth element (0-indexed)
Advanced
agent-browser tabs # List tabs
agent-browser tab new # New tab
agent-browser tab 2 # Switch to tab
agent-browser tab close # Close current tab
agent-browser frame 0 # Switch to frame
agent-browser dialog accept # Accept dialog
agent-browser dialog dismiss # Dismiss dialog
agent-browser eval "document.title" # Execute JS
agent-browser highlight @e1 # Highlight element visually
agent-browser mouse move 100 200 # Move mouse to coordinates
agent-browser mouse down # Mouse button down
agent-browser mouse up # Mouse button up
Global Options
| Option | Description |
|---|---|
--session <name> |
Named session for parallel testing |
--json |
JSON output for parsing |
--headed |
Show browser window |
--cdp <port> |
Connect via Chrome DevTools Protocol |
-p <provider> |
Cloud browser provider |
--proxy <url> |
Proxy server |
--headers <json> |
Custom HTTP headers |
--executable-path |
Custom browser binary |
--extension <path> |
Load browser extension |
Environment Variables
| Variable | Description |
|---|---|
AGENT_BROWSER_SESSION |
Default session name |
AGENT_BROWSER_PROVIDER |
Cloud provider (e.g., browserbase) |
AGENT_BROWSER_EXECUTABLE_PATH |
Browser binary location |
AGENT_BROWSER_EXTENSIONS |
Comma-separated extension paths |
AGENT_BROWSER_STREAM_PORT |
WebSocket streaming port |
AGENT_BROWSER_HOME |
Custom installation directory |
AGENT_BROWSER_PROFILE |
Browser profile directory |
BROWSERBASE_API_KEY |
Browserbase API key |
BROWSERBASE_PROJECT_ID |
Browserbase project ID |
Common Patterns
Form Submission
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3 # Submit button
agent-browser wait url "/dashboard"
State Persistence (Auth)
# Save authenticated state
agent-browser open https://example.com/login
# ... login steps ...
agent-browser state save auth.json
# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://example.com/dashboard
Video Recording (Debugging)
agent-browser open https://example.com
agent-browser record start
# ... perform actions ...
agent-browser record stop # Saves to recording.webm
Parallel Sessions
# Terminal 1
agent-browser --session test1 open https://example.com
# Terminal 2
agent-browser --session test2 open https://example.com
Cloud Browsers (Browserbase)
For CI/CD or environments without local browser:
# Set credentials
export BROWSERBASE_API_KEY="your-api-key"
export BROWSERBASE_PROJECT_ID="your-project-id"
# Use cloud browser
agent-browser -p browserbase open https://example.com
See references/browserbase-cloud-setup.md for detailed setup.
Troubleshooting
| Issue | Solution |
|---|---|
| Command not found | Run npm install -g agent-browser |
| Chromium missing | Run agent-browser install |
| Linux deps missing | Run agent-browser install --with-deps |
| Session stale | Close browser: agent-browser close |
| Element not found | Re-run snapshot -i after page changes |
Resources
More from hotriluan/alkana-dashboard
frontend-design
Create polished frontend interfaces from designs/screenshots/videos. Use for web components, 3D experiences, replicating UI designs, quick prototypes, immersive interfaces, avoiding AI slop.
19ui-ux-pro-max
UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 9 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind, shadcn/ui). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient. Integrations: shadcn/ui MCP for component search and examples.
3frontend-dev-guidelines
Build React/TypeScript frontends with modern patterns. Use for components, Suspense, lazy loading, useSuspenseQuery, MUI v7 styling, TanStack Router, performance optimization.
3copywriting
Conversion copywriting formulas, headline templates, email copy patterns, landing page structures, CTA optimization, and writing style extraction. Activate for writing high-converting copy, crafting headlines, email campaigns, landing pages, or applying custom writing styles from assets/writing-styles/ directory.
3ui-styling
Style UIs with shadcn/ui components (Radix UI + Tailwind CSS). Use for accessible components, themes, dark mode, responsive layouts, design systems, color customization.
3media-processing
Process media with FFmpeg (video/audio), ImageMagick (images), RMBG (AI background removal). Use for encoding, format conversion, filters, thumbnails, batch processing, HLS/DASH streaming.
3