Agent Browser Skill

agent-browser is a fast browser automation CLI designed for AI agents. It targets elements through deterministic references (@e1, @e2) taken from accessibility tree snapshots, which keeps element targeting reliable.

Target: KnearMe portfolio platform (knearme-portfolio and knearme-cms)

Session Startup (REQUIRED)

Before running any agent-browser command, make sure the daemon is running. The daemon manages browser sessions and persists state between commands.

Quick Start

# Source the helper script (sets env vars and starts daemon)
source /Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/ensure-daemon.sh

Manual Alternative

# Check if daemon is running
pgrep -f "agent-browser.*daemon" || (
  export AGENT_BROWSER_EXECUTABLE_PATH="$HOME/Library/Caches/ms-playwright/chromium-1200/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
  cd /Users/aaronbaker/knearme-platform/agent-browser
  nohup node dist/daemon.js > /tmp/agent-browser-daemon.log 2>&1 &
)
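
When the manual path misbehaves, a quick status check can help. The sketch below reuses the pgrep pattern and the log path from the commands above; the function name daemon_status is hypothetical and not part of agent-browser.

```shell
# Hypothetical helper (not part of agent-browser): report whether the daemon
# process is running, and show recent log lines if it is not.
daemon_status() {
  log="${1:-/tmp/agent-browser-daemon.log}"   # log path from the manual start above
  if pgrep -f "agent-browser.*daemon" >/dev/null 2>&1; then
    echo "daemon: running"
    return 0
  fi
  echo "daemon: not running"
  # The log only exists if the daemon was started with the nohup line above.
  if [ -f "$log" ]; then
    echo "--- last lines of $log ---"
    tail -n 20 "$log"
  fi
  return 1
}
```

Run daemon_status before the first agent-browser command; a non-zero exit status means the daemon still needs to be started.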

Stop Daemon (When Done)

/Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/stop-daemon.sh

Troubleshooting Startup

Mach Port Conflicts (macOS): If Chrome is running, the headless browser may fail to launch.

Solutions:

Close Chrome before automation, OR

Use CDP mode to connect to running Chrome:

# User runs this in a terminal (same flags as the CDP Mode setup below):
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug &

# Then connect via CDP:
agent-browser connect 9222
agent-browser snapshot -i  # Works

See the CDP Mode section below for full details.

THE ONE RULE

After ANY action that might change the page, run snapshot -i

# WRONG - refs are stale after navigation
agent-browser click @e6
agent-browser click @e2      # FAILS - e2 is from old page

# RIGHT - re-snapshot after page change
agent-browser click @e6
agent-browser wait 2000
agent-browser snapshot -i    # Get fresh refs
agent-browser click @e2      # Works

Expect refs to reset when the page changes; always re-snapshot.
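
One way to make the rule hard to forget is to wrap it. The sketch below is a hypothetical shell function (act_and_look and SETTLE_MS are illustrative names, not part of agent-browser) that runs a single action, waits, and then re-snapshots, using only the commands shown above.

```shell
# Hypothetical wrapper: run one action, let the page settle, then re-snapshot.
# SETTLE_MS is an assumed variable name; 2000 ms mirrors the example above.
act_and_look() {
  agent-browser "$@" || return 1           # the action, e.g. click @e6
  agent-browser wait "${SETTLE_MS:-2000}"  # give the page time to update
  agent-browser snapshot -i                # fresh refs for the next decision
}
```

For example, act_and_look click @e6 performs the click and prints a fresh snapshot, so the next ref is always chosen from current output.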

Mental Model

LOOK  ->  DECIDE  ->  ACT  ->  LOOK AGAIN

1. snapshot -i           # See what's on the page
2. (read output)         # Find the right ref
3. click @e3             # Take action
4. wait 2000             # Let page update
5. snapshot -i           # See new state

Never act twice without looking in between.

Core Workflow

Open and Explore

agent-browser open <url>
agent-browser snapshot -i

Interact with Elements

agent-browser fill @e2 "text"    # Clear + fill input
agent-browser click @e3          # Click element
agent-browser press Enter        # Press key

Verify Results

agent-browser get url                    # Check current URL
agent-browser screenshot /tmp/view.png   # Visual capture
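
The URL check can be turned into a pass/fail step. This is a minimal sketch that assumes agent-browser get url prints the current URL on stdout; url_contains is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the current URL contains a fragment.
# Assumes `agent-browser get url` prints the URL on stdout.
url_contains() {
  expected="$1"
  current=$(agent-browser get url) || return 1
  case "$current" in
    *"$expected"*) echo "ok: $current" ;;
    *) echo "unexpected URL: $current (wanted *$expected*)" >&2
       return 1 ;;
  esac
}
```

A call such as url_contains /dashboard returns non-zero when the page did not navigate where expected, which is a good cue to take a screenshot and inspect it.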

Close When Done

agent-browser close

Quick Decision Tree

What mode do I need?

Running inside a sandbox (Claude Code)?
  -> Use CDP Mode - connect to user's Chrome
  -> See "CDP Mode" section below

Test with user's logged-in Chrome?
  -> Use CDP Mode - connect to user's Chrome
  -> See "CDP Mode" section below

Test multiple users in parallel?
  -> agent-browser --session user-1 open <url>
  -> See references/advanced-features.md

Debug with user watching?
  -> agent-browser --headed open <url>

Standard automation?
  -> agent-browser open <url>

Need to "see" the page?
  -> agent-browser screenshot /tmp/x.png
  -> Read /tmp/x.png

Headed Mode (Visible Browser)

Show a visible browser window for debugging or user observation.

Requirements

Daemon must be running first - always run source ensure-daemon.sh before any agent-browser commands
Flag goes BEFORE the command - --headed must come before open

Correct Usage

# Step 1: Ensure daemon is running
source /Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/ensure-daemon.sh

# Step 2: Open with --headed flag BEFORE the command
agent-browser --headed open http://localhost:3000

Common Mistakes

# WRONG: --headed after URL (flag ignored, headless mode)
agent-browser open http://localhost:3000 --headed

# WRONG: No daemon running (will error: "Browser not launched")
agent-browser --headed open http://localhost:3000

# WRONG: "launch" command doesn't exist
agent-browser launch --headed

# RIGHT: Daemon first, then --headed before open
source ensure-daemon.sh
agent-browser --headed open http://localhost:3000
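
Because a misplaced --headed is silently ignored, a thin wrapper can normalize the order. The function below is a hypothetical sketch (open_url is not an agent-browser command); it accepts the flag in any position and always emits it before open. It does not start the daemon, so that step still comes first.

```shell
# Hypothetical wrapper: accept --headed anywhere, but always pass it
# BEFORE the open command, as the CLI requires.
open_url() {
  headed=""
  url=""
  for arg in "$@"; do
    case "$arg" in
      --headed) headed="yes" ;;
      *) url="$arg" ;;
    esac
  done
  if [ -z "$url" ]; then
    echo "usage: open_url [--headed] <url>" >&2
    return 2
  fi
  if [ -n "$headed" ]; then
    agent-browser --headed open "$url"
  else
    agent-browser open "$url"
  fi
}
```

With this in place, open_url http://localhost:3000 --headed runs agent-browser --headed open http://localhost:3000.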

Recommended: Use dev-browse.sh

The dev-browse.sh script handles daemon setup automatically:

# Starts app, ensures daemon, opens browser with --headed
/Users/aaronbaker/knearme-platform/.claude/skills/agent-browser/scripts/dev-browse.sh portfolio --headed

# With viewport preset
./dev-browse.sh portfolio --headed --mobile

CDP Mode (Connecting to Existing Chrome)

When running inside a sandbox (such as Claude Code), or when you need the user's auth cookies and sessions, connect to an existing Chrome instance instead of launching a new one.

Why? The sandbox lacks permissions to launch applications. CDP mode connects to Chrome running outside the sandbox.

Setup (User runs in their terminal, not Claude Code)

# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
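
Before connecting, it can help to confirm that the debugging port is actually answering. The sketch below probes Chrome's DevTools HTTP endpoint /json/version with curl; cdp_ready is a hypothetical helper name, and it assumes curl is installed and that localhost is reachable from where the agent runs.

```shell
# Hypothetical helper: check that Chrome's DevTools HTTP endpoint answers
# on the given port before running `agent-browser connect`.
cdp_ready() {
  port="${1:-9222}"
  if curl -fsS --max-time 2 "http://127.0.0.1:${port}/json/version" >/dev/null 2>&1; then
    echo "CDP endpoint is up on port ${port}"
  else
    echo "no CDP endpoint on port ${port}; start Chrome with --remote-debugging-port=${port}" >&2
    return 1
  fi
}
```

A typical use is cdp_ready 9222 && agent-browser connect 9222, so a missing Chrome instance is reported up front rather than as a connection error.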

Connection Methods

Method 1: Connect Once (Recommended)

Establish connection first, then subsequent commands work without flags:

agent-browser connect 9222    # Connect to CDP - do this first
agent-browser snapshot -i     # No flag needed after connect
agent-browser click @e3       # Works automatically
agent-browser fill @e2 "text" # Works automatically
agent-browser close           # Disconnect when done

Method 2: Flag on Every Command

Pass --cdp on each command (more verbose):

agent-browser --cdp 9222 snapshot -i
agent-browser --cdp 9222 click @e3
agent-browser --cdp 9222 fill @e2 "text"

When to Use CDP Mode

Scenario	Use CDP?
Running in Claude Code sandbox	Yes
Need user's auth cookies	Yes
Debug alongside user	Yes
Test with user's extensions	Yes
Isolated/repeatable tests	No - use sessions
CI/CD automation	No - use headless

Multiple Agents in Parallel

Run multiple agents simultaneously by connecting each to a different Chrome instance:

# Terminal 1: Start Chrome on port 9223
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9223 --user-data-dir=/tmp/chrome-9223 --headless=new

# Terminal 2: Start Chrome on port 9224
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9224 --user-data-dir=/tmp/chrome-9224 --headless=new

Then each agent connects to its own port:

# Agent 1
agent-browser connect 9223
agent-browser open <url>
agent-browser snapshot -i

# Agent 2
agent-browser connect 9224
agent-browser open <url>
agent-browser snapshot -i

See references/advanced-features.md for more CDP details.

Essential Commands

Action	Command
Open URL	`agent-browser open <url>`
Get refs	`agent-browser snapshot -i`
Click	`agent-browser click @e3`
Fill	`agent-browser fill @e2 "text"`
Type	`agent-browser type @e2 "text"`
Press key	`agent-browser press Enter`
Wait	`agent-browser wait 2000`
Get URL	`agent-browser get url`
Screenshot	`agent-browser screenshot /tmp/shot.png`
Close	`agent-browser close`

For full command reference, see references/command-reference.md.

Reality Checks (Verified 2026-01-15)

The behaviors below were observed in a local sweep of the CLI. Rely on them rather than on assumptions about what a command should do.

Selector vs Ref Rules

Use refs (@eN) only with action commands like click, fill, type, check, uncheck, hover, drag.
Use CSS selectors for get and wait, not refs.
- ✅ agent-browser get value "#email"
- ❌ agent-browser get value @e3
- ✅ agent-browser wait "#log"
- ❌ agent-browser wait @e9
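
The rule is easy to enforce mechanically. The following sketch uses hypothetical helper names (is_ref, safe_wait); it rejects a ref passed to wait before the CLI is called, and passes durations and CSS selectors through unchanged.

```shell
# Hypothetical guard: refs look like @e1, @e2, ...
is_ref() {
  case "$1" in
    @e[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Refuse refs for `wait`, which the sweep found needs a CSS selector.
# Millisecond waits such as 2000 pass through unchanged.
safe_wait() {
  if is_ref "$1"; then
    echo "wait needs a CSS selector or a duration, not a ref: $1" >&2
    return 2
  fi
  agent-browser wait "$1"
}
```

The same is_ref check could guard get value and other get subcommands in the same way.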

Known Limitations

Avoid select <selector> <value>; it fails with "Validation error: values: Invalid input".
Avoid find testid, find first, find last, find nth; they fail with "Expected string, received null".
Avoid relying on press Control+a for selecting text; use fill or eval.
Expect tab new <url> to open about:blank; run open <url> in the new tab to load the page.
Pass --filter whenever you run network requests.
Provide a URL pattern for network unroute; it has no default.
Avoid set media dark; it errors until fixed.
Avoid mouse wheel; use scroll instead.
Prefer storage local set/get; state load did not restore localStorage in the sweep.

Fallback Recipes

Select dropdown without select:

agent-browser eval "const el=document.querySelector('#plan'); el.value='pro'; el.dispatchEvent(new Event('change', { bubbles: true }));"

Clear + set input without press Control+a:

agent-browser fill "#name" "Ada Lovelace"

Find by testid when find testid fails:

agent-browser click "[data-testid='submit-button']"
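
The dropdown recipe can be wrapped so the JavaScript does not have to be retyped. This is a sketch only: select_by_eval is a hypothetical name, and because the selector and value are pasted into single-quoted JavaScript strings, the function refuses inputs containing single quotes or backslashes.

```shell
# Hypothetical wrapper around the eval fallback for <select> elements.
# Limitation: selector and value are pasted into single-quoted JS strings,
# so inputs containing single quotes or backslashes are rejected.
select_by_eval() {
  selector="$1"
  value="$2"
  case "$selector$value" in
    *\'*|*\\*)
      echo "selector and value must not contain single quotes or backslashes" >&2
      return 2 ;;
  esac
  agent-browser eval "const el=document.querySelector('${selector}'); el.value='${value}'; el.dispatchEvent(new Event('change', { bubbles: true }));"
}
```

For example, select_by_eval '#plan' pro issues the same eval call as the recipe above.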

See references/capability-matrix.md for the full sweep results.

Common Patterns

Login Flow (KnearMe Portfolio)

agent-browser open http://localhost:3000/login
agent-browser snapshot -i
# Read output: textbox "Email" [ref=e2], textbox "Password" [ref=e4], button "Login" [ref=e6]
agent-browser fill @e2 "$TEST_USER_EMAIL"
agent-browser fill @e4 "$TEST_USER_PASSWORD"
agent-browser click @e6
agent-browser wait 2000
agent-browser get url  # Verify redirect to /dashboard
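
Refs can also be looked up by label instead of being copied by hand. The sketch below assumes snapshot elements appear in the form role "Name" [ref=eN], as in the comment above; the real output may differ in detail, and ref_for is a hypothetical helper name.

```shell
# Hypothetical helper: read snapshot text on stdin and print the ref (as @eN)
# of the first element whose role and accessible name match.
# Assumes elements appear as: role "Name" [ref=eN]
ref_for() {
  awk -v key="$1 \"$2\" [ref=" '
    { i = index($0, key) }
    i {
      rest = substr($0, i + length(key))  # text after "[ref="
      sub(/\].*/, "", rest)               # drop the closing bracket and the rest
      print "@" rest
      found = 1
      exit
    }
    END { exit (found ? 0 : 1) }
  '
}
```

Usage might look like email_ref=$(agent-browser snapshot -i | ref_for textbox Email) followed by agent-browser fill "$email_ref" "$TEST_USER_EMAIL". A non-zero exit status means the label was not found on the current page.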

Multi-Step Form

# Step 1
agent-browser snapshot -i
agent-browser fill @e1 "value"
agent-browser click @e4  # Next

# CRITICAL: Re-snapshot for step 2
agent-browser wait 1000
agent-browser snapshot -i  # NEW REFS!
agent-browser fill @e1 "new value"  # e1 is now different field

Visual Verification

agent-browser screenshot /tmp/view.png
# Then use Read tool on /tmp/view.png to "see" the page

UI Testing Approach

For effective UI testing, follow the four-phase pattern:

SETUP → ACTION → VALIDATE → RECOVERY

Visual Debugging Loop

When testing UIs, leverage Claude's vision capabilities:

# 1. CAPTURE - Take screenshot at decision points
agent-browser screenshot /tmp/state-before.png

# 2. ANALYZE - Use Read tool to visually inspect
# (Claude can identify layout issues, errors, missing elements)

# 3. ACT - Perform test action
agent-browser click @e5
agent-browser wait 2000

# 4. VERIFY - Capture and analyze result
agent-browser screenshot /tmp/state-after.png
# Compare before/after, check for expected changes
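
The capture, act, verify steps above can be bundled into one call. The sketch below is a hypothetical helper (capture_around is not part of agent-browser) that writes /tmp/<label>-before.png and /tmp/<label>-after.png around a single action.

```shell
# Hypothetical helper: screenshot, run one action, wait, screenshot again.
# Files land in /tmp as <label>-before.png and <label>-after.png.
capture_around() {
  label="$1"
  shift
  agent-browser screenshot "/tmp/${label}-before.png" || return 1
  agent-browser "$@" || return 1             # the action under test
  agent-browser wait 2000                    # same settle time as above
  agent-browser screenshot "/tmp/${label}-after.png"
}
```

For example, capture_around state click @e5 produces the two files, which can then be inspected with the Read tool. Run snapshot -i afterwards before using any further refs.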

Key Testing Capabilities

Capability	Command	Use Case
Visual state	`screenshot` + Read	Layout issues, visual bugs
Element state	`snapshot -i`	Available interactions
URL verification	`get url`	Navigation testing
Console errors	`console` / `errors`	JS debugging
Network mocking	`network route`	Error state testing
Parallel sessions	`--session`	Multi-tier testing

See references/ui-testing-patterns.md for complete testing patterns. See references/visual-debugging.md for visual debugging techniques.

Anti-Patterns

Acting without looking:

agent-browser open <url>
agent-browser click @e3  # BAD - did you check snapshot?

Using refs from examples:

# Docs say "fill @e2" but YOUR page might be different
agent-browser fill @e2 "text"  # BAD - did you verify ref?

Multiple actions without re-snapshot:

agent-browser click @e3
agent-browser click @e5  # BAD - refs may be stale

Correct pattern:

agent-browser snapshot -i      # LOOK
agent-browser click @e6        # ACT
agent-browser wait 2000        # WAIT
agent-browser snapshot -i      # LOOK AGAIN

Bundled Resources

Resource	Purpose
`references/project-context.md`	KnearMe test users, environments, key routes
`references/advanced-features.md`	CDP, sessions, device emulation, network mocking
`references/troubleshooting.md`	Error recovery patterns
`references/command-reference.md`	Full command documentation
`references/capability-matrix.md`	Verified command status and workarounds
`references/ui-testing-patterns.md`	Structured UI testing patterns (forms, auth, access control)
`references/visual-debugging.md`	Screenshot analysis and visual debugging techniques
`references/prompting-templates.md`	Self-prompting templates for effective testing

Remember

Always snapshot before acting - refs come from snapshot output
Re-snapshot after page changes - refs reset on navigation
Read the snapshot output - match label to ref
Wait for page loads - give time for state to settle
Screenshot + Read for visual - use Read tool to "see" the page