exploring-browser

Installation

SKILL.md

Browser Exploration Skill

Interactive browser automation with real-time exploration and flow recording. Backed by the robomotion-browser-mcp server.

Pairs with: reversing-network — once you have captured network traffic, switch from browser automation to HTTP if a clean API is exposed.

Overview

This skill enables pair-programming with the browser - you and Claude explore websites together, discover selectors, and build automation flows. At the end, a JSON flow sequence is generated that maps directly to Robomotion SDK TypeScript.

Workflow

Step 0 — Load tool schemas. In Claude Code, mcp__browser__* tools are deferred (only names in the catalog until you pull their schemas). Calling browser_open cold sends empty/malformed JSON over stdio, crashes robomotion-browser-mcp, and blacklists every browser tool for the rest of the session — ToolSearch will then return "no matching deferred tools" even though claude mcp list still reports ✓ Connected (that's a fresh probe, not the session's dead pipe). Before the first call, run:
ToolSearch query="select:mcp__browser__browser_open,mcp__browser__browser_navigate,mcp__browser__browser_snapshot,mcp__browser__browser_type,mcp__browser__browser_click,mcp__browser__browser_close,mcp__browser__browser_get_sequence,mcp__browser__browser_set_flow_name"
(Add browser_wait, browser_select, browser_query, browser_start_network_capture, etc. if you need them.) Once the server is dead, only a Claude Code restart brings it back — the connection can't be reattached from inside the session. Agents on other runtimes should substitute their own MCP invocation style.

Open Browser - Launch with browser_open (stealth mode enabled by default)
Navigate - Go to target URL with browser_navigate
Snapshot - Use browser_snapshot to see page structure with refs (primary exploration tool)
Act - Use refs from snapshot: browser_click @e3, browser_type @e5 (auto-recorded with XPath)
Re-snapshot - After page changes (click, navigation), take a new snapshot
Close Browser - ALWAYS call browser_close when exploration is done — this returns the flow sequence JSON and frees the browser
Ask User - Present the plan/sequence and use AskUserQuestion to ask whether to proceed with building the flow
Convert - Only after user approval, convert JSON to TypeScript for robomotion-sdk

CRITICAL: You MUST close the browser (browser_close) when you have finished exploring all the steps. Do NOT leave the browser open while building/modifying code. And you MUST ask the user with AskUserQuestion before proceeding to write or modify flow code.

Key Concepts

Snapshot-First Exploration (MANDATORY)

On EVERY new URL or page change, your FIRST action MUST be browser_snapshot. This is non-negotiable. Snapshot uses ~50-100 tokens vs ~500-2000 tokens for a screenshot.

GOOD: browser_navigate → browser_snapshot → read tree → browser_click @e3
BAD:  browser_navigate → browser_screenshot → browser_query → browser_query → ...
BAD:  browser_navigate → browser_query → browser_query → browser_query → ...

Rules:

FIRST CALL after any navigation (browser_navigate, browser_click that loads a new page, browser_reload, browser_go_back) MUST be browser_snapshot
Use the refs (@e1, @e2, ...) from the snapshot as selectors in action tools
After a page-changing action (click, form submit), take a NEW snapshot immediately
DO NOT use browser_screenshot for page discovery — it costs 10-40x more tokens. Only use screenshot as a LAST RESORT when you are stuck and cannot proceed with snapshot refs alone (e.g., visual layout issues, image verification)
DO NOT use browser_query for page discovery — snapshot already shows all interactive elements. Only use query for targeted checks on specific selectors
NEVER make multiple sequential browser_query calls to "discover" the page — that's what snapshot is for

Refs — Lightweight Element References

Snapshots assign refs like @e1, @e2 to interactive elements. These refs work as selectors in ALL action tools:

browser_click selector=@e3          → clicks the element, records its XPath
browser_type selector=@e5 text=hello → types into the element, records its XPath

Refs are ephemeral — they're invalidated on navigation, reload, or go-back. After page changes, take a new snapshot to get fresh refs.

Recordings always store the resolved XPath/CSS selector, never the ref. This means the RPA robot gets stable selectors.

Explore Mode

explore=true (default for queries): Don't record the action, just explore
explore=false (default for actions): Record the action to flow sequence

Tool Type	Default explore	Behavior
Snapshot (`browser_snapshot`)	—	Exploration only (never recorded)
Queries (`browser_query`, `browser_get_text`)	true	Exploration, not recorded
Actions (`browser_click`, `browser_type`)	false	Recorded to flow
Navigation (`browser_navigate`)	false	Recorded to flow
Screenshots	true	Not recorded by default

Selectors

Use refs (preferred), XPath, or CSS selectors:

# Refs (from browser_snapshot) — PREFERRED
@e1, @e2, @e3

# XPath examples
//button[@id='submit']
//input[@name='email']

# CSS examples
#submit
input[name='email']

Available Tools

Lifecycle

browser_open - Launch browser (headless, stealth, viewport options)
browser_close - Close browser and return flow sequence JSON

Navigation

browser_navigate - Go to URL
browser_reload - Refresh page
browser_go_back - Navigate back

Exploration

browser_snapshot - ALWAYS USE FIRST — Accessibility tree with refs (@e1, @e2, ...). Mandatory after every navigation.
browser_screenshot - LAST RESORT ONLY — Visual capture. Only use when stuck and snapshot refs aren't enough.
browser_query - Targeted element check (use sparingly — snapshot already shows everything)
browser_get_text - Get element text content
browser_get_value - Get input value
browser_get_attr - Get attribute value

Actions (Auto-recorded)

browser_click - Click element
browser_type - Type into input
browser_select - Select dropdown option
browser_wait - Wait for element
browser_scroll - Scroll to element
browser_hover - Hover over element

Context

browser_switch_iframe - Enter iframe
browser_switch_main - Exit to main page

Session

browser_get_sequence - Get current JSON sequence
browser_clear_sequence - Start fresh
browser_set_flow_name - Name the flow

Cookies

browser_get_cookies - Get cookies (optionally filter by domain)
browser_set_cookie - Set a single cookie
browser_set_cookies - Set multiple cookies
browser_clear_cookies - Clear cookies (optionally by domain)

Network Capture (API Discovery)

browser_start_network_capture - Start capturing requests/responses
browser_get_requests - Get captured requests (filter by method/URL)
browser_get_responses - Get captured responses (filter by status/URL)
browser_get_request_response - Get request-response pair by ID
browser_stop_network_capture - Stop capturing
browser_clear_network_capture - Clear captured data

Console Capture (Debugging)

browser_start_console_capture - Start capturing console logs
browser_get_console_logs - Get captured logs (filter by type)
browser_clear_console_logs - Clear captured logs
browser_stop_console_capture - Stop capturing

Session Persistence

browser_export_session - Export cookies, localStorage, URL as JSON
browser_import_session - Import previously exported session
browser_save_session - Save session to file
browser_load_session - Load session from file

Element Picker (Interactive Selection)

browser_start_element_picker - Start interactive element picker (red highlight on hover)
browser_get_picked_element - Get element selected by user via Ctrl+X
browser_stop_element_picker - Stop picker and remove highlight

Example Session

User: Automate login to example.com

Claude: I'll open a browser and explore the login page.
[browser_open stealth=true]
[browser_navigate url="https://example.com/login"]
[browser_snapshot]
// Returns:
//   - textbox "Email" [ref=e1]
//   - textbox "Password" [ref=e2] [type=password]
//   - button "Sign In" [ref=e3]

Now I'll perform the login actions using refs:
[browser_type selector=@e1 text="user@example.com"]
[browser_type selector=@e2 text="secret123"]
[browser_click selector=@e3]

// Page changed — take a new snapshot to see what happened
[browser_snapshot]

// Done exploring — CLOSE THE BROWSER to get the flow sequence
[browser_close]
// Returns flow sequence JSON with resolved XPath selectors (not refs)

Claude: Here's the plan I recorded:
1. Open Browser → Navigate to login page
2. Type email → Type password → Click Sign In

[AskUserQuestion: "Should I build the TypeScript flow from this sequence?"]

// Only after user says yes:
Here's the TypeScript flow:
// ... generated code ...

Tips

Snapshot first, always — one browser_snapshot shows every interactive element with refs.
Re-snapshot after page changes — navigation/clicks invalidate refs.
Close the browser when done — always browser_close. Never leave it open while writing code.

The conversion from recorded JSON to TypeScript belongs to creating-flow; this skill's job ends at the sequence JSON. For browser-specific SDK patterns (proxy config, selector tips), see ./docs/patterns/browser.md.

Pair-Programming Workflows

When Element Not Found - Use Element Picker!

When you can't find an element after multiple attempts, use the interactive element picker instead of asking the user to manually inspect:

1. Take a screenshot to see current state
2. Try 2-3 different selector strategies
3. Check if element is in an iframe (browser_switch_iframe)
4. If still not found, START THE ELEMENT PICKER:
   [browser_start_element_picker]
   "I can't find the element. I've enabled the element picker -
    please hover over the element you want and press Ctrl+X to select it.
    Press Escape to cancel."
5. Wait for user, then retrieve the element:
   [browser_get_picked_element]
6. Use the recommended selectors from the result

Example workflow:

Claude: I'm looking for the login button but can't find it with the usual selectors.
        Let me enable the element picker so you can show me exactly which element you mean.

[browser_start_element_picker]

Result: Element picker started. User should hover over elements (red highlight)
        and press Ctrl+X to select.

Claude: Please hover over the button you want me to click and press Ctrl+X.
        You'll see a red highlight around elements as you hover.

--- User hovers and presses Ctrl+X ---

[browser_get_picked_element]

Result: {
  "picked": true,
  "element": {
    "tagName": "button",
    "className": "btn-primary submit-form",
    "text": "Sign In",
    "recommendedSelectors": [
      { "type": "xpath", "selector": "//button[contains(text(),'Sign In')]", "confidence": 80, "reason": "Contains text" },
      { "type": "xpath", "selector": "//button[@class='btn-primary submit-form']", "confidence": 70, "reason": "Class match" }
    ]
  }
}

Claude: Got it! The button has text "Sign In". I'll use this selector:
[browser_click selector="//button[contains(text(),'Sign In')]"]

Network Analysis for API Discovery

Use network capture to understand what APIs a page uses:

1. browser_start_network_capture capture_body=true
2. Perform the action (login, form submit, etc.)
3. browser_get_requests method=POST
4. Analyze the API endpoints found
5. Suggest: "Instead of browser automation, you could use
   Core.Net.HttpRequest directly with this endpoint"

Example workflow:

Claude: Let me capture network traffic to see what API this uses...
[browser_start_network_capture capture_body=true]

Claude: Now I'll fill in the login form...
[browser_type selector="//input[@name='email']" text="test@example.com"]
[browser_type selector="//input[@name='password']" text="password"]
[browser_click selector="//button[@type='submit']"]

Claude: Let me check what API calls were made...
[browser_get_requests method=POST]

Result: Found POST to /api/auth/login with JSON body:
{
  "email": "test@example.com",
  "password": "password"
}

Claude: I found the login API endpoint. Instead of browser automation,
you could create a simpler flow using Core.Net.HttpRequest:
- POST to /api/auth/login
- Store the auth token from response
- Use token for subsequent API calls

Would you prefer the API-based approach?

Debugging with Console Logs

When actions don't work as expected:

1. browser_start_console_capture types=["error","warn"]
2. Retry the failing action
3. browser_get_console_logs type=error
4. Report findings and adjust approach

Example workflow:

Claude: The click didn't seem to work. Let me check for JavaScript errors...
[browser_start_console_capture types=["error","warn"]]
[browser_click selector="//button[@id='submit']"]
[browser_get_console_logs type=error]

Result: Error: "Cannot read property 'validate' of undefined" at form.js:42

Claude: Found a JavaScript error - the page has a bug in its validation code.
Let me try waiting for the form to fully load first...
[browser_wait selector="//form[@data-ready='true']"]

Session Persistence for Auth

When working with authenticated sites:

1. Complete login manually or via automation
2. browser_export_session - save for later
3. Close browser, do other work
4. browser_open, browser_import_session - restore auth

Example workflow:

Claude: Let me save your authenticated session for later use...
[browser_export_session]

Session exported with 5 cookies and current URL.

Claude: I'll save this to a file so you can reuse it:
[browser_save_session path="/tmp/myapp-session.json"]

--- Later ---

Claude: Let me restore your previous session...
[browser_open]
[browser_load_session path="/tmp/myapp-session.json"]

Session restored - you're logged in as user@example.com

When to Use Browser vs HTTP Request

Scenario	Recommendation
Simple API with known endpoints	Use `Core.Net.HttpRequest`
Login flow → need auth cookies	Use browser, then export session
Complex JavaScript-rendered content	Use browser automation
Form with CSRF tokens	Use browser (handles tokens automatically)
Scraping paginated data	Start with browser, switch to API if found
File downloads	Use browser for auth, API for download

Decision workflow:

Start with network capture
If clean REST API found → suggest HTTPRequest
If complex auth/CSRF → stick with browser
If both needed → browser for auth, API for data

Related skills

More from robomotionio/agent-skills

Installs

Repository

robomotionio/ag…t-skills

GitHub Stars

First Seen

Apr 16, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykFail

exploring-browser

Browser Exploration Skill

Overview

Workflow

Key Concepts

Snapshot-First Exploration (MANDATORY)

Refs — Lightweight Element References

Explore Mode

Selectors

Available Tools

Lifecycle

Navigation

Exploration

Actions (Auto-recorded)

Context

Session

Cookies

Network Capture (API Discovery)

Console Capture (Debugging)

Session Persistence

Element Picker (Interactive Selection)

Example Session

Tips

Pair-Programming Workflows

When Element Not Found - Use Element Picker!

Network Analysis for API Discovery

Debugging with Console Logs

Session Persistence for Auth

When to Use Browser vs HTTP Request

More from robomotionio/agent-skills

running-flow

creating-flow

validating-flow

testing-flow

reversing-network

searching-packages