agent-browser
When to use this skill
Use this skill whenever the user wants to:
- Automate browser interactions (click, fill, navigate, screenshot) via CLI
- Scrape web content or extract data from pages
- Build AI agent workflows that interact with websites
- Use refs-based element selection for deterministic automation
- Run browser automation in agent mode with JSON output
- Manage authenticated sessions with custom headers or CDP
How to use this skill
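This skill is organized to match the structure of the official agent-browser documentation (https://github.com/vercel-labs/agent-browser/blob/main/README.md). When working with agent-browser, start with the quick-start example below, then load the detailed documentation that matches the task.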
Quick-Start Example: Snapshot → Identify → Interact
# 1. Install
npm install -g agent-browser
# 2. Open a page and take a snapshot to get element refs
agent-browser open "https://example.com"
agent-browser snapshot
# Output includes refs like @e1, @e2, @e3 for each element
# 3. Click an element by ref
agent-browser click @e3
# 4. Fill a form field
agent-browser fill @e5 "hello@example.com"
# 5. Agent mode (JSON output for programmatic use)
agent-browser snapshot --json
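The quick-start steps can also be wrapped so that a script stops at the first failing step. The following is a minimal sketch, not part of agent-browser itself: `AB_BIN`, `ab_step`, and `run_workflow` are hypothetical names introduced here for illustration, and the sketch assumes the CLI returns a non-zero exit code when a command fails.

```shell
# Command to invoke. AB_BIN is a hypothetical override hook, useful for
# testing or for pointing at a locally installed binary.
AB_BIN="${AB_BIN:-agent-browser}"

# Run one agent-browser step and report the failing command on stderr.
# Assumption: a failed command exits with a non-zero status.
ab_step() {
  if ! "$AB_BIN" "$@"; then
    echo "agent-browser step failed: $*" >&2
    return 1
  fi
}

# Snapshot -> identify -> interact, stopping as soon as one step fails.
# $1 is the URL to open, $2 is the ref to click (for example @e3).
run_workflow() {
  ab_step open "$1" &&
  ab_step snapshot &&
  ab_step click "$2"
}
```

Calling `run_workflow "https://example.com" @e3` would then run the three steps in order. In practice the ref is only known after reading the snapshot output, so a real script would usually inspect the snapshot between the second and third step.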
Detailed Documentation
- Install agent-browser:
  - Load examples/getting-started/installation.md for installation instructions
- Quick Start:
  - Load examples/quick-start/quick-start.md for basic workflow examples
- Learn core commands:
  - Load examples/commands/basic-commands.md for basic commands (open, click, fill, etc.)
  - Load examples/commands/advanced-commands.md for advanced commands (snapshot, eval, etc.)
  - Load examples/commands/get-info/ for information retrieval commands
  - Load examples/commands/check-state/ for state checking commands
  - Load examples/commands/find-elements/ for semantic locator commands
  - Load examples/commands/wait/ for wait commands
  - Load examples/commands/mouse-control/ for mouse control commands
  - Load examples/commands/browser-settings/ for browser configuration
  - Load examples/commands/cookies-storage/ for cookies and storage management
  - Load examples/commands/network/ for network interception
  - Load examples/commands/tabs-windows/ for tab and window management
  - Load examples/commands/frames/ for iframe handling
  - Load examples/commands/dialogs/ for dialog handling
  - Load examples/commands/debug/ for debugging commands
  - Load examples/commands/navigation/ for navigation commands
  - Load examples/commands/setup/ for setup commands
- Understand selectors:
  - Load examples/selectors/refs.md for refs-based selection (@e1, @e2, etc.)
  - Load examples/selectors/traditional-selectors.md for CSS, XPath, and semantic locators
- Use agent mode:
  - Load examples/agent-mode/introduction.md for agent mode overview
  - Load examples/agent-mode/optimal-workflow.md for optimal AI workflow
  - Load examples/agent-mode/integration.md for integrating with AI agents
- Advanced features:
  - Load examples/advanced/sessions.md for session management
  - Load examples/advanced/headed-mode.md for debugging with visible browser
  - Load examples/advanced/authenticated-sessions.md for authentication via headers
  - Load examples/advanced/custom-executable.md for custom browser executable
  - Load examples/advanced/cdp-mode.md for Chrome DevTools Protocol integration
  - Load examples/advanced/streaming.md for browser viewport streaming
  - Load examples/advanced/architecture.md for architecture overview
  - Load examples/advanced/platforms.md for platform support
  - Load examples/advanced/usage-with-agents.md for AI agent integration patterns
- Configure options:
  - Load examples/options/global-options.md for global CLI options
  - Load examples/options/snapshot-options.md for snapshot-specific options
  - Load examples/options/session-options.md for session management options
- Reference API documentation when needed:
  - api/commands.md - Complete command reference
  - api/selectors.md - Selector reference
  - api/options.md - Options reference
- Use templates for quick start:
  - templates/basic-automation.md - Basic automation workflow
  - templates/ai-agent-workflow.md - AI agent workflow template
Doc mapping (one-to-one with official documentation)
- See examples and API files → https://github.com/vercel-labs/agent-browser
Examples and Templates
This skill includes detailed examples organized to match the official documentation structure. All examples are in the examples/ directory (see mapping above).
To use examples:
- Identify the topic from the user's request
- Load the appropriate example file from the mapping above
- Follow the instructions, syntax, and best practices in that file
- Adapt the code examples to your specific use case
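The first two steps (identify the topic, then load the matching file) amount to a lookup. As a rough sketch, using paths from the Detailed Documentation list: the function name `example_path_for`, the topic keywords, and the fallback to api/commands.md are all illustrative choices made here, not something the skill defines.

```shell
# Map a topic keyword to an example file or directory from this skill.
# The keywords are hypothetical; the paths are the ones listed in the
# Detailed Documentation section.
example_path_for() {
  case "$1" in
    install)         echo "examples/getting-started/installation.md" ;;
    quick-start)     echo "examples/quick-start/quick-start.md" ;;
    refs)            echo "examples/selectors/refs.md" ;;
    selectors)       echo "examples/selectors/traditional-selectors.md" ;;
    wait)            echo "examples/commands/wait/" ;;
    cookies|storage) echo "examples/commands/cookies-storage/" ;;
    network)         echo "examples/commands/network/" ;;
    sessions)        echo "examples/advanced/sessions.md" ;;
    cdp)             echo "examples/advanced/cdp-mode.md" ;;
    agent-mode)      echo "examples/agent-mode/introduction.md" ;;
    # Unknown topic: fall back to the complete command reference.
    *)               echo "api/commands.md" ;;
  esac
}
```

For instance, `example_path_for refs` prints the path of the refs guide. Only a subset of the listed files is mapped; the remaining entries would follow the same pattern.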
To use templates:
- Reference templates in the templates/ directory for common scaffolding
- Adapt templates to your specific needs and coding style
API Reference
- Commands API: api/commands.md - Complete command reference with syntax and examples
- Selectors API: api/selectors.md - Selector types and usage reference
- Options API: api/options.md - All options reference
Best Practices
- Use Refs: Prefer refs (@e1, @e2) over traditional selectors for deterministic automation
- Snapshot First: Always snapshot before interacting with elements to get refs
- Agent Mode: Use the --json flag for machine-readable output in agent mode
- Session Management: Use --session to maintain state across commands
- Interactive Snapshot: Use the -i flag to limit the snapshot to interactive elements
- Semantic Locators: Use semantic locators (role/name) when refs are not available
- Error Handling: Check command exit codes and error messages
- Wait for Navigation: Commands automatically wait for navigation to complete
- Headed Mode: Use --headed for debugging, headless for production
- CDP Integration: Use --cdp for Chrome DevTools Protocol integration
- Streaming: Use AGENT_BROWSER_STREAM_PORT for live browser preview
- Authenticated Sessions: Use --headers for authentication without login flows
- Custom Executable: Use --executable-path for serverless deployments or custom browsers
- Snapshot Options: Combine the -i, -c, -d, and -s options to optimize snapshot output
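Two of these practices, snapshotting first to obtain refs and combining --session with --json, can be sketched as small shell helpers. This is an illustration under stated assumptions rather than a definitive recipe: `list_refs`, `ab_json`, and `AB_BIN` are hypothetical names, the sketch assumes refs appear in snapshot text as `[ref=eN]`, and the placement of --session before the command and --json after it is assumed to be accepted by the CLI.

```shell
# Command to invoke. AB_BIN is a hypothetical override hook.
AB_BIN="${AB_BIN:-agent-browser}"

# Read snapshot text on stdin and print each distinct ref once, as @eN,
# in order of first appearance.
# Assumption: refs are written as [ref=eN] in the snapshot text.
list_refs() {
  grep -o 'ref=e[0-9][0-9]*' | sed 's/^ref=/@/' | awk '!seen[$0]++'
}

# Run a command inside a named session and request JSON output.
# $1 is the session name; the remaining arguments are the command.
ab_json() {
  _ab_session="$1"
  shift
  "$AB_BIN" --session "$_ab_session" "$@" --json
}
```

A typical use would be `agent-browser snapshot -i | list_refs` to see which refs are available before clicking or filling, and `ab_json checkout snapshot -i` to get a machine-readable snapshot from a session named `checkout` (the session name is an arbitrary example).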
Resources
- GitHub Repository: https://github.com/vercel-labs/agent-browser
- Official README: https://github.com/vercel-labs/agent-browser/blob/main/README.md
- Agent Mode Documentation: https://agent-browser.dev/agent-mode
- Issues: https://github.com/vercel-labs/agent-browser/issues
Keywords
agent-browser, CLI browser automation, AI agents, browser automation CLI, refs, snapshot, agent mode, semantic locators, browser automation tool, command-line browser, AI agent browser, deterministic selectors, accessibility tree, browser commands, web automation CLI, sessions, headed mode, authenticated sessions, CDP mode, streaming, Chrome DevTools Protocol, Playwright, browser automation for AI