agent-browser
Browser Automation with agent-browser
Core Workflow
Every browser automation follows this pattern:
- Navigate:
agent-browser open <url> - Snapshot:
agent-browser snapshot -i(get element refs like@e1,@e2) - Interact: Use refs to click, fill, select
- Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
Essential Commands
# Navigation
agent-browser open <url> # Navigate
agent-browser close # Close browser
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -i -C # Include cursor-interactive elements
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click
agent-browser fill @e2 "text" # Clear and type
agent-browser select @e1 "option" # Select dropdown
agent-browser check @e1 # Checkbox
agent-browser press Enter # Key press
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
# Capture
agent-browser screenshot # Screenshot
agent-browser screenshot --full # Full page
Ref Lifecycle (Important)
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
Deep-Dive Documentation
| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
| references/common-patterns.md | Form submission, auth, data extraction, parallel sessions, iOS simulator |
| references/semantic-locators.md | Alternatives to refs |
| references/javascript-evaluation.md | eval rules, --stdin/-b explanation |
Ready-to-Use Templates
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse state |
| templates/capture-workflow.sh | Content extraction with screenshots |
More from hjewkes/agent-skills
self-improve
Use when a session produced reusable insights, when the user says "learn from this", "remember this", or "improve yourself", or after completing a complex task where patterns were discovered
64md-render
Use when asked to render, preview, or view a markdown file in the browser. Triggers on "render markdown", "preview this", "show me this document", "open in browser".
21code-review
Unified code review system — dispatches the right review agents for the situation. Use when reviewing code for quality, bugs, compliance, or before merging.
17skills-management
Use when creating, finding, installing, reviewing, or managing Claude Code skills — covers skill authoring, discovery, conventions, and lifecycle management
14github-pr
GitHub PR workflow — creating PRs, posting automated review comments, managing PR feedback cycles. Use when code is reviewed and ready for GitHub.
14buildkite
Buildkite CI/CD integration. Use when the user needs to check build status, trigger builds, read build logs, debug failures, manage pipelines, or any Buildkite workflow. Triggers include "buildkite", "build", "pipeline", "CI", "deploy", "build log", "build failed".
14