Browser Automation with agent-browser
NixOS: All agent-browser commands require the Nix prefix:
nix run github:numtide/llm-agents.nix#agent-browser -- <command> [args...]
Always pass --headed on the first command to open a visible browser window.
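Typing the prefix on every call is error-prone, so one option is a small shell wrapper. The sketch below is an assumption, not something agent-browser ships: the function name `ab` and the `AB_DRY_RUN` variable are hypothetical, and only the `nix run` prefix itself comes from the note above.

```shell
# Hypothetical wrapper (not part of agent-browser): forwards all arguments
# to the Nix-prefixed command from the note above.
# AB_DRY_RUN is an invented variable; when it is non-empty the command is
# printed instead of executed.
ab() {
  ${AB_DRY_RUN:+echo} nix run github:numtide/llm-agents.nix#agent-browser -- "$@"
}
```

With this defined, `ab --headed open <url>` stands in for the full prefixed command, and setting `AB_DRY_RUN=1` first shows what would run without launching anything.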
Quick start
agent-browser --headed open <url> # Navigate (visible browser)
agent-browser set viewport 1920 1080 # Set 1080p resolution (always do this after open)
agent-browser snapshot -i # Get interactive elements with refs
agent-browser click @e1 # Click element by ref
agent-browser fill @e2 "text" # Fill input by ref
agent-browser screenshot page.png # Take screenshot
agent-browser close # Close browser
Core workflow
Always open the browser in --headed mode so the user can pair browse.
- Navigate: agent-browser --headed open <url>
- Set viewport: agent-browser set viewport 1920 1080 (always do this right after opening)
- Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
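The steps above can be sketched as two small shell helpers. This is a minimal sketch under stated assumptions: the helper names `open_page` and `click_and_resnapshot` and the `AB` override variable are hypothetical, and only the agent-browser subcommands and flags come from this document.

```shell
# AB is an assumed override hook, not an agent-browser feature. It defaults
# to the plain binary name and is expanded unquoted so that it may hold a
# multi-word prefix.
AB="${AB:-agent-browser}"

# Hypothetical helper: open a URL in a visible window, set the viewport,
# then take the first interactive snapshot. Stops at the first failure.
open_page() {
  $AB --headed open "$1" &&
    $AB set viewport 1920 1080 &&
    $AB snapshot -i
}

# Hypothetical helper: click a ref, then re-snapshot, because a click can
# navigate or change the DOM and the workflow asks for a fresh snapshot
# in that case.
click_and_resnapshot() {
  $AB click "$1" &&
    $AB snapshot -i
}
```

A call such as `open_page https://example.com/form` followed by `click_and_resnapshot @e3` covers the full loop. On NixOS, `AB` could be set to the Nix prefix before calling either helper.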
Common Commands
Navigation & Snapshots
agent-browser --headed open <url> # Navigate (auto-prepends https://)
agent-browser back # Go back in history
agent-browser forward # Go forward in history
agent-browser reload # Reload the current page
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -i -C # Also include cursor-interactive elements
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser close # Close browser
Interactions (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser dblclick @e1 # Double-click
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
agent-browser keyboard type "text" # Type at current focus
agent-browser press Enter # Press key
agent-browser hover @e1 # Hover element
agent-browser check @e1 # Check checkbox
agent-browser select @e1 "value" # Select dropdown option
agent-browser upload @e1 file.pdf # Upload file
agent-browser scroll down 500 # Scroll page
Get Information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get value @e1 # Get input value
agent-browser get attr @e1 href # Get attribute
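The get commands are useful for checking where an interaction ended up. The helper below is a hedged sketch: `url_contains` and the `AB` override variable are hypothetical names, and it assumes `get url` prints the current URL to standard output.

```shell
# AB is an assumed override hook, not an agent-browser feature.
AB="${AB:-agent-browser}"

# Hypothetical check: succeed only if the current URL contains the given
# text. Assumes `get url` writes the URL to stdout.
url_contains() {
  case "$($AB get url)" in
    *"$1"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

After a login click, for instance, `url_contains /dashboard || echo "still on login page"` gives a quick pass/fail signal before the next snapshot.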
Wait & Screenshots
agent-browser wait @e1 # Wait for element
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Success" # Wait for text
agent-browser wait --load networkidle # Wait for network idle
agent-browser screenshot page.png # Take screenshot
agent-browser screenshot --annotate page.png # With numbered element labels
agent-browser screenshot --full # Full page screenshot
Example: Form submission
agent-browser --headed open https://example.com/form
agent-browser set viewport 1920 1080
agent-browser snapshot -i
# Output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
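Hard-coding @e1 or @e3 works only while the page structure stays the same. As a sketch, refs can be looked up by role and name instead. The helper name `ref_for` is hypothetical, and it assumes snapshot entries follow the `role "name" [ref=eN]` form shown in the output comment above. The real output format may differ.

```shell
# Hypothetical helper: read snapshot text on stdin and print the first ref
# (as @eN) whose entry matches the given role and accessible name.
# Assumes entries look like: button "Submit" [ref=e3]
ref_for() {
  awk -v pat="$1 \"$2\" [ref=" '
    {
      i = index($0, pat)              # plain substring search, no regex
      if (i == 0) next
      rest = substr($0, i + length(pat))
      j = index(rest, "]")            # ref id ends at the closing bracket
      if (j > 0) { print "@" substr(rest, 1, j - 1); exit }
    }'
}
```

Under that assumption, `agent-browser snapshot -i | ref_for button Submit` would print `@e3` for the form above, and the result can be passed straight to `agent-browser click`.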
Deep-dive documentation
| Reference | Description |
|---|---|
| references/command-reference.md | Complete command reference: all options, mouse control, semantic locators, network, tabs, frames, JS eval, cookies/storage, browser settings |
| references/snapshot-refs.md | Ref lifecycle, -C cursor flag, annotated screenshots, troubleshooting |
| references/session-management.md | Sessions, --session-name, state persistence, profiles |
| references/authentication.md | Login flows, OAuth, 2FA, state reuse, auth vault |
| references/diffing.md | Snapshot diffs, pixel-level screenshot diffs, URL comparison |
| references/security.md | Domain allowlist, action policies, confirmation, content boundaries |
| references/cloud-providers.md | Cloud browser providers (browserbase, browseruse, kernel) |
| references/streaming.md | Real-time browser streaming via WebSocket |
| references/ios-simulator.md | iOS Simulator automation with Safari (macOS only) |