agent-browser
Agent Browser Skill
A fast browser automation CLI designed for AI agents. It uses deterministic element references (@e1, @e2) taken from accessibility-tree snapshots for reliable element targeting.
Target: KnearMe portfolio platform (knearme-portfolio and knearme-cms)
Session Startup (REQUIRED)
Before running any agent-browser commands, make sure the daemon is running. The daemon manages browser sessions and persists state between commands.
Quick Start
# Source the helper script (sets env vars and starts daemon)
source /Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/ensure-daemon.sh
Manual Alternative
# Check if daemon is running
pgrep -f "agent-browser.*daemon" || (
export AGENT_BROWSER_EXECUTABLE_PATH="$HOME/Library/Caches/ms-playwright/chromium-1200/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
cd /Users/aaronbaker/knearme-platform/agent-browser
nohup node dist/daemon.js > /tmp/agent-browser-daemon.log 2>&1 &
)
Stop Daemon (When Done)
/Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/stop-daemon.sh
Troubleshooting Startup
Mach Port Conflicts (macOS): If Chrome is running, the headless browser may fail to launch.
Solutions:
- Close Chrome before automation, OR
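- Use CDP mode to connect to a Chrome instance started with remote debugging:
# User runs this in a terminal (a separate --user-data-dir starts a new instance instead of handing off to the open one):
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug &
# Then connect via CDP:
agent-browser connect 9222
agent-browser snapshot -i # Works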
See CDP Mode section below for full details.
THE ONE RULE
After ANY action that might change the page, run snapshot -i
# WRONG - refs are stale after navigation
agent-browser click @e6
agent-browser click @e2 # FAILS - e2 is from old page
# RIGHT - re-snapshot after page change
agent-browser click @e6
agent-browser wait 2000
agent-browser snapshot -i # Get fresh refs
agent-browser click @e2 # Works
Expect refs to reset when the page changes; always re-snapshot.
Mental Model
LOOK -> DECIDE -> ACT -> LOOK AGAIN
1. snapshot -i # See what's on the page
2. (read output) # Find the right ref
3. click @e3 # Take action
4. wait 2000 # Let page update
5. snapshot -i # See new state
Never act twice without looking in between.
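The loop above can be enforced mechanically. Below is a minimal bash sketch of a hypothetical helper. The names act_then_look and AB_CMD are illustrative and are not part of agent-browser. The helper always follows an action with wait and snapshot -i, so the next decision is made from fresh refs. It uses only commands shown in this document.

```shell
# Requires bash. Hypothetical helper, not part of agent-browser.
# AB_CMD lets you point at another binary (or a test stub); defaults to agent-browser.
AB_CMD="${AB_CMD:-agent-browser}"

# Usage: act_then_look <wait-ms> <agent-browser action args...>
# Example: act_then_look 2000 click @e3
act_then_look() {
  local wait_ms="$1"
  shift
  "$AB_CMD" "$@" || return      # ACT (stop and propagate status if the action failed)
  "$AB_CMD" wait "$wait_ms"     # WAIT for the page to settle
  "$AB_CMD" snapshot -i         # LOOK AGAIN: print fresh refs
}
```

For example, act_then_look 2000 click @e6 clicks, waits two seconds, and prints the new snapshot. Read that snapshot before choosing the next ref.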
Core Workflow
Open and Explore
agent-browser open <url>
agent-browser snapshot -i
Interact with Elements
agent-browser fill @e2 "text" # Clear + fill input
agent-browser click @e3 # Click element
agent-browser press Enter # Press key
Verify Results
agent-browser get url # Check current URL
agent-browser screenshot /tmp/view.png # Visual capture
Close When Done
agent-browser close
Quick Decision Tree
What mode do I need?
Running inside a sandbox (Claude Code)?
-> Use CDP Mode - connect to user's Chrome
-> See "CDP Mode" section below
Test with user's logged-in Chrome?
-> Use CDP Mode - connect to user's Chrome
-> See "CDP Mode" section below
Test multiple users in parallel?
-> agent-browser --session user-1 open <url>
-> See references/advanced-features.md
Debug with user watching?
-> agent-browser --headed open <url>
Standard automation?
-> agent-browser open <url>
Need to "see" the page?
-> agent-browser screenshot /tmp/x.png
-> Read /tmp/x.png
Headed Mode (Visible Browser)
Show a visible browser window for debugging or user observation.
Requirements
- Daemon must be running first: always run source ensure-daemon.sh before any agent-browser commands.
- Flag goes BEFORE the command: --headed must come before open.
Correct Usage
# Step 1: Ensure daemon is running
source /Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/ensure-daemon.sh
# Step 2: Open with --headed flag BEFORE the command
agent-browser --headed open http://localhost:3000
Common Mistakes
# WRONG: --headed after URL (flag ignored, headless mode)
agent-browser open http://localhost:3000 --headed
# WRONG: No daemon running (will error: "Browser not launched")
agent-browser --headed open http://localhost:3000
# WRONG: "launch" command doesn't exist
agent-browser launch --headed
# RIGHT: Daemon first, then --headed before open
source ensure-daemon.sh
agent-browser --headed open http://localhost:3000
Recommended: Use dev-browse.sh
The dev-browse.sh script handles daemon setup automatically:
# Starts app, ensures daemon, opens browser with --headed
/Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/dev-browse.sh portfolio --headed
# With viewport preset
/Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/dev-browse.sh portfolio --headed --mobile
CDP Mode (Connecting to Existing Chrome)
When running inside a sandbox (like Claude Code), or when you need the user's auth cookies and sessions, connect to an existing Chrome instance instead of launching a new one.
Why? The sandbox lacks permissions to launch applications. CDP mode connects to Chrome running outside the sandbox.
Setup (User runs in their terminal, not Claude Code)
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
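Before connecting, it can help to confirm that Chrome is actually listening. Chrome's remote-debugging HTTP endpoint serves /json/version on the debugging port. The bash sketch below is a hypothetical helper (wait_for_cdp is an illustrative name) that polls that endpoint with curl, which it assumes is installed. It stops when the endpoint answers or when a timeout expires.

```shell
# Requires bash and curl. Hypothetical helper, not part of agent-browser.
# Usage: wait_for_cdp <port> [timeout-seconds]
wait_for_cdp() {
  local port="$1" timeout="${2:-10}" waited=0
  # /json/version answers once Chrome's DevTools HTTP server is up.
  until curl -sf "http://127.0.0.1:${port}/json/version" > /dev/null; do
    if (( waited >= timeout )); then
      echo "CDP not reachable on port ${port} after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "CDP ready on port ${port}"
}
```

Typical use: wait_for_cdp 9222 && agent-browser connect 9222.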
Connection Methods
Method 1: Connect Once (Recommended)
Establish connection first, then subsequent commands work without flags:
agent-browser connect 9222 # Connect to CDP - do this first
agent-browser snapshot -i # No flag needed after connect
agent-browser click @e3 # Works automatically
agent-browser fill @e2 "text" # Works automatically
agent-browser close # Disconnect when done
Method 2: Flag on Every Command
Pass --cdp on each command (more verbose):
agent-browser --cdp 9222 snapshot -i
agent-browser --cdp 9222 click @e3
agent-browser --cdp 9222 fill @e2 "text"
When to Use CDP Mode
| Scenario | Use CDP? |
|---|---|
| Running in Claude Code sandbox | Yes |
| Need user's auth cookies | Yes |
| Debug alongside user | Yes |
| Test with user's extensions | Yes |
| Isolated/repeatable tests | No - use sessions |
| CI/CD automation | No - use headless |
Multiple Agents in Parallel
Run multiple agents simultaneously by connecting each to a different Chrome instance:
# Terminal 1: Start Chrome on port 9223
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9223 --user-data-dir=/tmp/chrome-9223 --headless=new
# Terminal 2: Start Chrome on port 9224
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9224 --user-data-dir=/tmp/chrome-9224 --headless=new
Then each agent connects to its own port:
# Agent 1
agent-browser connect 9223
agent-browser open <url>
agent-browser snapshot -i
# Agent 2
agent-browser connect 9224
agent-browser open <url>
agent-browser snapshot -i
See references/advanced-features.md for more CDP details.
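Starting several instances by hand gets repetitive. The following is a minimal bash sketch that assumes the macOS Chrome path used above. You can override that path with the hypothetical CHROME_BIN variable. The hypothetical launch_chrome_pool starts one headless instance per consecutive port, each with its own --user-data-dir. It prints the ports so that each agent can connect to one of them.

```shell
# Requires bash. Hypothetical helper, not part of agent-browser.
CHROME_BIN="${CHROME_BIN:-/Applications/Google Chrome.app/Contents/MacOS/Google Chrome}"

# Usage: launch_chrome_pool <base-port> <count>
launch_chrome_pool() {
  local base_port="$1" count="$2" i port
  for ((i = 0; i < count; i++)); do
    port=$((base_port + i))
    # One profile dir per port so the instances do not share state.
    "$CHROME_BIN" --remote-debugging-port="$port" \
      --user-data-dir="/tmp/chrome-$port" --headless=new \
      > "/tmp/chrome-$port.log" 2>&1 &
    echo "$port"
  done
}
```

For example, launch_chrome_pool 9223 2 starts instances on 9223 and 9224, matching the two terminals above. The instances keep running until you kill them.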
Essential Commands
| Action | Command |
|---|---|
| Open URL | agent-browser open <url> |
| Get refs | agent-browser snapshot -i |
| Click | agent-browser click @e3 |
| Fill | agent-browser fill @e2 "text" |
| Type | agent-browser type @e2 "text" |
| Press key | agent-browser press Enter |
| Wait | agent-browser wait 2000 |
| Get URL | agent-browser get url |
| Screenshot | agent-browser screenshot /tmp/shot.png |
| Close | agent-browser close |
For full command reference, see references/command-reference.md.
Reality Checks (Verified 2026-01-15)
These behaviors were observed in a local sweep; use them to avoid false assumptions.
Selector vs Ref Rules
- Use refs (@eN) only with action commands like click, fill, type, check, uncheck, hover, drag.
- Use CSS selectors for get and wait, not refs.
  - ✅ agent-browser get value "#email"
  - ❌ agent-browser get value @e3
  - ✅ agent-browser wait "#log"
  - ❌ agent-browser wait @e9
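A thin wrapper can catch the ref-versus-selector mistake before it reaches the CLI by rejecting @eN arguments to get and wait. This is a hypothetical bash sketch; ab_checked and AB_CMD are illustrative names. It encodes only the rule observed above, and all other commands pass through unchanged.

```shell
# Requires bash. Hypothetical wrapper, not part of agent-browser.
AB_CMD="${AB_CMD:-agent-browser}"

ab_checked() {
  local sub="$1" arg
  if [[ "$sub" == get || "$sub" == wait ]]; then
    # get/wait take CSS selectors (or a number for wait), never @eN refs.
    for arg in "${@:2}"; do
      if [[ "$arg" == "@e"* ]]; then
        echo "ab_checked: '$sub' takes a CSS selector, not a ref ($arg)" >&2
        return 2
      fi
    done
  fi
  "$AB_CMD" "$@"
}
```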
Known Limitations
- Avoid select <selector> <value>; it fails with "Validation error: values: Invalid input".
- Avoid find testid, find first, find last, and find nth; they fail with "Expected string, received null".
- Avoid relying on press Control+a to select text; use fill or eval instead.
- Expect tab new <url> to open about:blank; the URL loads only after a manual open <url>.
- Pass --filter when using network requests.
- Provide a URL pattern for network unroute (there is no default).
- Avoid set media dark; it currently errors.
- Avoid mouse wheel; use scroll instead.
- Prefer storage local set/get; state load did not restore localStorage in the sweep.
Fallback Recipes
Select dropdown without select:
agent-browser eval "const el=document.querySelector('#plan'); el.value='pro'; el.dispatchEvent(new Event('change', { bubbles: true }));"
Clear + set input without press Control+a:
agent-browser fill "#name" "Ada Lovelace"
Find by testid when find testid fails:
agent-browser click "[data-testid='submit-button']"
See references/capability-matrix.md for the full sweep results.
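The select fallback nests JavaScript inside shell quotes, so it breaks as soon as a value contains a quote. The hypothetical bash helpers below (js_quote, select_js, select_option) build the same eval snippet using escaped single-quoted literals. They assume single-line values, because newlines are not escaped.

```shell
# Requires bash. Hypothetical helpers, not part of agent-browser.

# Escape backslashes, then single quotes, for a JS '...' literal (single-line input).
js_quote() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
  printf "'%s'" "$escaped"
}

# Build the same snippet as the recipe above: set value, fire a bubbling change event.
select_js() {
  local selector value
  selector=$(js_quote "$1")
  value=$(js_quote "$2")
  printf "const el=document.querySelector(%s); el.value=%s; el.dispatchEvent(new Event('change', { bubbles: true }));" "$selector" "$value"
}

# Usage: select_option "#plan" "pro"
select_option() {
  agent-browser eval "$(select_js "$1" "$2")"
}
```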
Common Patterns
Login Flow (KnearMe Portfolio)
agent-browser open http://localhost:3000/login
agent-browser snapshot -i
# Read output: textbox "Email" [ref=e2], textbox "Password" [ref=e4], button "Login" [ref=e6]
agent-browser fill @e2 "$TEST_USER_EMAIL"
agent-browser fill @e4 "$TEST_USER_PASSWORD"
agent-browser click @e6
agent-browser wait 2000
agent-browser get url # Verify redirect to /dashboard
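Hard-coded refs like @e2 work only if your page matches this example. A hedged alternative is to look refs up by role and label in a fresh snapshot. The bash sketch below assumes that snapshot lines contain text such as textbox "Email" [ref=e2], as in the comment above. ref_for and login_and_check are hypothetical helpers, and the login page must already be open.

```shell
# Requires bash. Hypothetical helpers, not part of agent-browser.
AB_CMD="${AB_CMD:-agent-browser}"

# Usage: <snapshot text> | ref_for <role> <label>   -> prints @eN, or nothing
ref_for() {
  grep -F -- "$1 \"$2\"" | sed -n 's/.*\[ref=\(e[0-9][0-9]*\)].*/@\1/p' | head -n 1
}

# Usage: login_and_check <email> <password> <expected-path>
login_and_check() {
  local snap email_ref pass_ref btn_ref url
  snap=$("$AB_CMD" snapshot -i) || return          # LOOK
  email_ref=$(printf '%s\n' "$snap" | ref_for textbox Email)
  pass_ref=$(printf '%s\n' "$snap" | ref_for textbox Password)
  btn_ref=$(printf '%s\n' "$snap" | ref_for button Login)
  if [[ -z "$email_ref" || -z "$pass_ref" || -z "$btn_ref" ]]; then
    echo "login form refs not found in snapshot" >&2
    return 1
  fi
  "$AB_CMD" fill "$email_ref" "$1"                 # ACT
  "$AB_CMD" fill "$pass_ref" "$2"
  "$AB_CMD" click "$btn_ref"
  "$AB_CMD" wait 2000                              # WAIT
  url=$("$AB_CMD" get url)                         # VERIFY
  [[ "$url" == *"$3"* ]] || { echo "expected $3 in URL, got $url" >&2; return 1; }
}
```

Example: login_and_check "$TEST_USER_EMAIL" "$TEST_USER_PASSWORD" /dashboard.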
Multi-Step Form
# Step 1
agent-browser snapshot -i
agent-browser fill @e1 "value"
agent-browser click @e4 # Next
# CRITICAL: Re-snapshot for step 2
agent-browser wait 1000
agent-browser snapshot -i # NEW REFS!
agent-browser fill @e1 "new value" # e1 is now different field
Visual Verification
agent-browser screenshot /tmp/view.png
# Then use Read tool on /tmp/view.png to "see" the page
UI Testing Approach
For effective UI testing, follow the four-phase pattern:
SETUP → ACTION → VALIDATE → RECOVERY
Visual Debugging Loop
When testing UIs, leverage Claude's vision capabilities:
# 1. CAPTURE - Take screenshot at decision points
agent-browser screenshot /tmp/state-before.png
# 2. ANALYZE - Use Read tool to visually inspect
# (Claude can identify layout issues, errors, missing elements)
# 3. ACT - Perform test action
agent-browser click @e5
agent-browser wait 2000
# 4. VERIFY - Capture and analyze result
agent-browser screenshot /tmp/state-after.png
# Compare before/after, check for expected changes
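The capture, act, and verify steps can be bundled so that every check produces a named before/after pair. In this hypothetical bash sketch, shot_diff also runs cmp on the two files. Identical bytes mean the two screenshots are identical. Differing bytes only show that something differs, and even a blinking caret or spinner counts. Still open both images with the Read tool to judge what changed.

```shell
# Requires bash. Hypothetical helper, not part of agent-browser.
AB_CMD="${AB_CMD:-agent-browser}"

# Usage: shot_diff <name> <wait-ms> <action args...>
# Example: shot_diff submit 2000 click @e5
shot_diff() {
  local name="$1" wait_ms="$2"
  shift 2
  local before="/tmp/${name}-before.png" after="/tmp/${name}-after.png"
  "$AB_CMD" screenshot "$before" || return   # 1. CAPTURE
  "$AB_CMD" "$@" || return                   # 3. ACT
  "$AB_CMD" wait "$wait_ms"
  "$AB_CMD" screenshot "$after" || return    # 4. VERIFY
  if cmp -s "$before" "$after"; then
    echo "unchanged: $before $after"
  else
    echo "changed: $before $after"
  fi
}
```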
Key Testing Capabilities
| Capability | Command | Use Case |
|---|---|---|
| Visual state | screenshot + Read | Layout issues, visual bugs |
| Element state | snapshot -i | Available interactions |
| URL verification | get url | Navigation testing |
| Console errors | console / errors | JS debugging |
| Network mocking | network route | Error state testing |
| Parallel sessions | --session | Multi-tier testing |
See references/ui-testing-patterns.md for complete testing patterns.
See references/visual-debugging.md for visual debugging techniques.
Anti-Patterns
Acting without looking:
agent-browser open <url>
agent-browser click @e3 # BAD - did you check snapshot?
Using refs from examples:
# Docs say "fill @e2" but YOUR page might be different
agent-browser fill @e2 "text" # BAD - did you verify ref?
Multiple actions without re-snapshot:
agent-browser click @e3
agent-browser click @e5 # BAD - refs may be stale
Correct pattern:
agent-browser snapshot -i # LOOK
agent-browser click @e6 # ACT
agent-browser wait 2000 # WAIT
agent-browser snapshot -i # LOOK AGAIN
Bundled Resources
| Resource | Purpose |
|---|---|
| references/project-context.md | KnearMe test users, environments, key routes |
| references/advanced-features.md | CDP, sessions, device emulation, network mocking |
| references/troubleshooting.md | Error recovery patterns |
| references/command-reference.md | Full command documentation |
| references/capability-matrix.md | Verified command status and workarounds |
| references/ui-testing-patterns.md | Structured UI testing patterns (forms, auth, access control) |
| references/visual-debugging.md | Screenshot analysis and visual debugging techniques |
| references/prompting-templates.md | Self-prompting templates for effective testing |
Remember
- Always snapshot before acting - refs come from snapshot output
- Re-snapshot after page changes - refs reset on navigation
- Read the snapshot output - match label to ref
- Wait for page loads - give time for state to settle
- Screenshot + Read for visual - use Read tool to "see" the page