exploring-browser
Browser Exploration Skill
Interactive browser automation with real-time exploration and flow recording. Backed by the robomotion-browser-mcp server.
Pairs with: reversing-network — once you have captured network traffic, switch from browser automation to HTTP if a clean API is exposed.
Overview
This skill enables pair-programming with the browser - you and Claude explore websites together, discover selectors, and build automation flows. At the end, a JSON flow sequence is generated that maps directly to Robomotion SDK TypeScript.
Workflow
Step 0: Load tool schemas. In Claude Code, `mcp__browser__*` tools are deferred (only their names are in the catalog until you pull their schemas). Calling `browser_open` cold sends empty/malformed JSON over stdio, crashes `robomotion-browser-mcp`, and blacklists every browser tool for the rest of the session. `ToolSearch` will then return "no matching deferred tools" even though `claude mcp list` still reports ✓ Connected (that's a fresh probe, not the session's dead pipe). Before the first call, run:
ToolSearch query="select:mcp__browser__browser_open,mcp__browser__browser_navigate,mcp__browser__browser_snapshot,mcp__browser__browser_type,mcp__browser__browser_click,mcp__browser__browser_close,mcp__browser__browser_get_sequence,mcp__browser__browser_set_flow_name"
(Add `browser_wait`, `browser_select`, `browser_query`, `browser_start_network_capture`, etc. if you need them.) Once the server is dead, only a Claude Code restart brings it back; the connection can't be reattached from inside the session. Agents on other runtimes should substitute their own MCP invocation style.
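As a minimal sketch, the Step 0 query string can be assembled from a list of tool names. The helper `buildSelectQuery` and the constants below are hypothetical names used only for illustration; the prefix and the core tool list are copied from the query above.

```typescript
// Prefix shared by the browser MCP tools (taken from the Step 0 query).
const TOOL_PREFIX = "mcp__browser__";

// Core tools listed in the Step 0 query, in the same order.
const CORE_TOOLS: string[] = [
  "browser_open",
  "browser_navigate",
  "browser_snapshot",
  "browser_type",
  "browser_click",
  "browser_close",
  "browser_get_sequence",
  "browser_set_flow_name",
];

// Hypothetical helper: builds the `select:` query string for ToolSearch.
// Duplicates are dropped, so an extra tool already in the core list is not repeated.
function buildSelectQuery(extraTools: string[] = []): string {
  const unique = Array.from(new Set([...CORE_TOOLS, ...extraTools]));
  return "select:" + unique.map((tool) => TOOL_PREFIX + tool).join(",");
}

// Core tools plus browser_wait; the repeated browser_click is ignored.
console.log(buildSelectQuery(["browser_wait", "browser_click"]));
```

Called with no arguments, the helper reproduces the query shown in Step 0.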
- Open Browser - Launch with `browser_open` (stealth mode enabled by default)
- Navigate - Go to target URL with `browser_navigate`
- Snapshot - Use `browser_snapshot` to see page structure with refs (primary exploration tool)
- Act - Use refs from snapshot: `browser_click @e3`, `browser_type @e5` (auto-recorded with XPath)
- Re-snapshot - After page changes (click, navigation), take a new snapshot
- Close Browser - ALWAYS call `browser_close` when exploration is done; this returns the flow sequence JSON and frees the browser
- Ask User - Present the plan/sequence and use `AskUserQuestion` to ask whether to proceed with building the flow
- Convert - Only after user approval, convert JSON to TypeScript for robomotion-sdk
CRITICAL: You MUST close the browser (browser_close) when you have finished exploring all the steps. Do NOT leave the browser open while building/modifying code. And you MUST ask the user with AskUserQuestion before proceeding to write or modify flow code.
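The two preconditions in the CRITICAL note can be pictured as a simple guard. This is only a sketch: `ExplorationState` and `blockersBeforeCoding` are hypothetical names for bookkeeping an agent might do itself, not part of the MCP server or the SDK.

```typescript
// Hypothetical session state tracked by the agent; not part of any real API.
interface ExplorationState {
  browserOpen: boolean;  // false once browser_close has returned
  userApproved: boolean; // true once AskUserQuestion got a "yes"
}

// Returns the reasons why flow code must not be written yet (empty means go ahead).
function blockersBeforeCoding(state: ExplorationState): string[] {
  const blockers: string[] = [];
  if (state.browserOpen) blockers.push("call browser_close first");
  if (!state.userApproved) blockers.push("ask the user with AskUserQuestion first");
  return blockers;
}

// Browser still open and no approval yet: both blockers are reported.
console.log(blockersBeforeCoding({ browserOpen: true, userApproved: false }));
```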
Key Concepts
Snapshot-First Exploration (MANDATORY)
On EVERY new URL or page change, your FIRST action MUST be browser_snapshot. This is non-negotiable. Snapshot uses ~50-100 tokens vs ~500-2000 tokens for a screenshot.
GOOD: browser_navigate → browser_snapshot → read tree → browser_click @e3
BAD: browser_navigate → browser_screenshot → browser_query → browser_query → ...
BAD: browser_navigate → browser_query → browser_query → browser_query → ...
Rules:
- FIRST CALL after any navigation (`browser_navigate`, `browser_click` that loads a new page, `browser_reload`, `browser_go_back`) MUST be `browser_snapshot`
- Use the refs (`@e1`, `@e2`, ...) from the snapshot as selectors in action tools
- After a page-changing action (click, form submit), take a NEW snapshot immediately
- DO NOT use `browser_screenshot` for page discovery; it costs 10-40x more tokens. Only use screenshot as a LAST RESORT when you are stuck and cannot proceed with snapshot refs alone (e.g., visual layout issues, image verification)
- DO NOT use `browser_query` for page discovery; snapshot already shows all interactive elements. Only use query for targeted checks on specific selectors
- NEVER make multiple sequential `browser_query` calls to "discover" the page; that's what snapshot is for
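The first rule can be checked mechanically against a list of tool calls. The sketch below is an illustration only: `findSnapshotViolations` is a hypothetical helper, and it covers just the tools that always navigate, because whether a `browser_click` loads a new page cannot be told from the tool name.

```typescript
// Tools that always change the page, per the rules above.
// browser_click is left out: a click only sometimes loads a new page.
const NAVIGATION_TOOLS = new Set(["browser_navigate", "browser_reload", "browser_go_back"]);

// Hypothetical checker: returns the indexes of navigation calls that are NOT
// immediately followed by browser_snapshot. A navigation at the very end of
// the trace is also flagged, since no snapshot has followed it.
function findSnapshotViolations(trace: string[]): number[] {
  const violations: number[] = [];
  trace.forEach((tool, i) => {
    if (NAVIGATION_TOOLS.has(tool) && trace[i + 1] !== "browser_snapshot") {
      violations.push(i);
    }
  });
  return violations;
}

const goodTrace = ["browser_open", "browser_navigate", "browser_snapshot", "browser_click"];
const badTrace = ["browser_open", "browser_navigate", "browser_query", "browser_query"];

console.log(findSnapshotViolations(goodTrace)); // no violations
console.log(findSnapshotViolations(badTrace));  // flags index 1 (browser_navigate)
```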
Refs — Lightweight Element References
Snapshots assign refs like @e1, @e2 to interactive elements. These refs work as selectors in ALL action tools:
browser_click selector=@e3 → clicks the element, records its XPath
browser_type selector=@e5 text=hello → types into the element, records its XPath
Refs are ephemeral — they're invalidated on navigation, reload, or go-back. After page changes, take a new snapshot to get fresh refs.
Recordings always store the resolved XPath/CSS selector, never the ref. This means the RPA robot gets stable selectors.
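One way to picture the ref lifecycle is a lookup table that is rebuilt on every snapshot and emptied on every navigation. The `RefTable` class below is a hypothetical model for illustration, not the server's actual data structure.

```typescript
// Hypothetical in-memory model of how refs could map to stable selectors.
class RefTable {
  // ref (e.g. "@e3") -> resolved XPath
  private refs = new Map<string, string>();

  // After a snapshot: replace all refs with the fresh set.
  loadSnapshot(entries: Array<[string, string]>): void {
    this.refs = new Map(entries);
  }

  // On navigation, reload, or go-back: every ref becomes stale.
  invalidate(): void {
    this.refs.clear();
  }

  // Resolve a ref to the XPath that would be written to the recording.
  resolve(ref: string): string {
    const xpath = this.refs.get(ref);
    if (xpath === undefined) {
      throw new Error(`Stale or unknown ref ${ref}: take a new snapshot`);
    }
    return xpath;
  }
}

const refTable = new RefTable();
refTable.loadSnapshot([["@e3", "//button[@id='submit']"]]);
console.log(refTable.resolve("@e3")); // the XPath is what gets recorded, not "@e3"
refTable.invalidate();
// refTable.resolve("@e3") would now throw, because the ref is stale.
```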
Explore Mode
- explore=true (default for queries): Don't record the action, just explore
- explore=false (default for actions): Record the action to flow sequence
| Tool Type | Default explore | Behavior |
|---|---|---|
| Snapshot (`browser_snapshot`) | n/a | Exploration only (never recorded) |
| Queries (`browser_query`, `browser_get_text`) | true | Exploration, not recorded |
| Actions (`browser_click`, `browser_type`) | false | Recorded to flow |
| Navigation (`browser_navigate`) | false | Recorded to flow |
| Screenshots | true | Not recorded by default |
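The defaults in the table can be read as a small lookup plus an override. In this sketch, `ToolKind`, `DEFAULT_EXPLORE`, and `isRecorded` are hypothetical names, and the assumption that an explicit `explore` argument overrides the default for every kind except snapshots is inferred from the table, not confirmed by the server.

```typescript
type ToolKind = "snapshot" | "query" | "action" | "navigation" | "screenshot";

// Default value of `explore` per tool kind, copied from the table above.
// Snapshot has no explore flag; it is modelled here as always exploring.
const DEFAULT_EXPLORE: Record<ToolKind, boolean> = {
  snapshot: true,
  query: true,
  action: false,
  navigation: false,
  screenshot: true,
};

// Hypothetical helper: would this call be written to the flow sequence?
// Assumption: an explicit `explore` overrides the default, except for snapshots.
function isRecorded(kind: ToolKind, explore?: boolean): boolean {
  if (kind === "snapshot") return false; // never recorded
  const effective = explore ?? DEFAULT_EXPLORE[kind];
  return !effective;
}

console.log(isRecorded("action"));       // true: actions are recorded by default
console.log(isRecorded("action", true)); // false: explore=true suppresses recording
console.log(isRecorded("query"));        // false: queries explore by default
```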
Selectors
Use refs (preferred), XPath, or CSS selectors:
# Refs (from browser_snapshot) — PREFERRED
@e1, @e2, @e3
# XPath examples
//button[@id='submit']
//input[@name='email']
# CSS examples
#submit
input[name='email']
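To make the three selector forms concrete, here is a rough classifier. `classifySelector` is a hypothetical helper, and the server's real detection logic may differ; the heuristic simply treats `@eN` as a ref, anything starting with `/` or `(` as XPath, and everything else as CSS.

```typescript
type SelectorKind = "ref" | "xpath" | "css";

// Hypothetical classifier for the three selector forms listed above.
function classifySelector(selector: string): SelectorKind {
  const s = selector.trim();
  if (/^@e\d+$/.test(s)) return "ref"; // @e1, @e2, ...
  if (s.startsWith("/") || s.startsWith("(")) return "xpath"; // //button[...], (//a)[1]
  return "css"; // #submit, input[name='email']
}

console.log(classifySelector("@e3"));                    // ref
console.log(classifySelector("//button[@id='submit']")); // xpath
console.log(classifySelector("input[name='email']"));    // css
```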
Available Tools
Lifecycle
- `browser_open` - Launch browser (headless, stealth, viewport options)
- `browser_close` - Close browser and return flow sequence JSON
Navigation
- `browser_navigate` - Go to URL
- `browser_reload` - Refresh page
- `browser_go_back` - Navigate back
Exploration
- `browser_snapshot` - ALWAYS USE FIRST. Accessibility tree with refs (@e1, @e2, ...). Mandatory after every navigation.
- `browser_screenshot` - LAST RESORT ONLY. Visual capture. Only use when stuck and snapshot refs aren't enough.
- `browser_query` - Targeted element check (use sparingly; snapshot already shows everything)
- `browser_get_text` - Get element text content
- `browser_get_value` - Get input value
- `browser_get_attr` - Get attribute value
Actions (Auto-recorded)
- `browser_click` - Click element
- `browser_type` - Type into input
- `browser_select` - Select dropdown option
- `browser_wait` - Wait for element
- `browser_scroll` - Scroll to element
- `browser_hover` - Hover over element
Context
- `browser_switch_iframe` - Enter iframe
- `browser_switch_main` - Exit to main page
Session
- `browser_get_sequence` - Get current JSON sequence
- `browser_clear_sequence` - Start fresh
- `browser_set_flow_name` - Name the flow
Cookies
- `browser_get_cookies` - Get cookies (optionally filter by domain)
- `browser_set_cookie` - Set a single cookie
- `browser_set_cookies` - Set multiple cookies
- `browser_clear_cookies` - Clear cookies (optionally by domain)
Network Capture (API Discovery)
- `browser_start_network_capture` - Start capturing requests/responses
- `browser_get_requests` - Get captured requests (filter by method/URL)
- `browser_get_responses` - Get captured responses (filter by status/URL)
- `browser_get_request_response` - Get request-response pair by ID
- `browser_stop_network_capture` - Stop capturing
- `browser_clear_network_capture` - Clear captured data
Console Capture (Debugging)
- `browser_start_console_capture` - Start capturing console logs
- `browser_get_console_logs` - Get captured logs (filter by type)
- `browser_clear_console_logs` - Clear captured logs
- `browser_stop_console_capture` - Stop capturing
Session Persistence
- `browser_export_session` - Export cookies, localStorage, URL as JSON
- `browser_import_session` - Import previously exported session
- `browser_save_session` - Save session to file
- `browser_load_session` - Load session from file
Element Picker (Interactive Selection)
- `browser_start_element_picker` - Start interactive element picker (red highlight on hover)
- `browser_get_picked_element` - Get element selected by user via Ctrl+X
- `browser_stop_element_picker` - Stop picker and remove highlight
Example Session
User: Automate login to example.com
Claude: I'll open a browser and explore the login page.
[browser_open stealth=true]
[browser_navigate url="https://example.com/login"]
[browser_snapshot]
// Returns:
// - textbox "Email" [ref=e1]
// - textbox "Password" [ref=e2] [type=password]
// - button "Sign In" [ref=e3]
Now I'll perform the login actions using refs:
[browser_type selector=@e1 text="user@example.com"]
[browser_type selector=@e2 text="secret123"]
[browser_click selector=@e3]
// Page changed — take a new snapshot to see what happened
[browser_snapshot]
// Done exploring — CLOSE THE BROWSER to get the flow sequence
[browser_close]
// Returns flow sequence JSON with resolved XPath selectors (not refs)
Claude: Here's the plan I recorded:
1. Open Browser → Navigate to login page
2. Type email → Type password → Click Sign In
[AskUserQuestion: "Should I build the TypeScript flow from this sequence?"]
// Only after user says yes:
Here's the TypeScript flow:
// ... generated code ...
Tips
- Snapshot first, always: one `browser_snapshot` shows every interactive element with refs.
- Re-snapshot after page changes: navigation/clicks invalidate refs.
- Close the browser when done: always `browser_close`. Never leave it open while writing code.
The conversion from recorded JSON to TypeScript belongs to creating-flow; this skill's job ends at the sequence JSON. For browser-specific SDK patterns (proxy config, selector tips), see ./docs/patterns/browser.md.
Pair-Programming Workflows
When Element Not Found - Use Element Picker!
When you can't find an element after multiple attempts, use the interactive element picker instead of asking the user to manually inspect:
1. Take a screenshot to see current state
2. Try 2-3 different selector strategies
3. Check if element is in an iframe (browser_switch_iframe)
4. If still not found, START THE ELEMENT PICKER:
[browser_start_element_picker]
"I can't find the element. I've enabled the element picker -
please hover over the element you want and press Ctrl+X to select it.
Press Escape to cancel."
5. Wait for user, then retrieve the element:
[browser_get_picked_element]
6. Use the recommended selectors from the result
Example workflow:
Claude: I'm looking for the login button but can't find it with the usual selectors.
Let me enable the element picker so you can show me exactly which element you mean.
[browser_start_element_picker]
Result: Element picker started. User should hover over elements (red highlight)
and press Ctrl+X to select.
Claude: Please hover over the button you want me to click and press Ctrl+X.
You'll see a red highlight around elements as you hover.
--- User hovers and presses Ctrl+X ---
[browser_get_picked_element]
Result: {
  "picked": true,
  "element": {
    "tagName": "button",
    "className": "btn-primary submit-form",
    "text": "Sign In",
    "recommendedSelectors": [
      { "type": "xpath", "selector": "//button[contains(text(),'Sign In')]", "confidence": 80, "reason": "Contains text" },
      { "type": "xpath", "selector": "//button[@class='btn-primary submit-form']", "confidence": 70, "reason": "Class match" }
    ]
  }
}
Claude: Got it! The button has text "Sign In". I'll use this selector:
[browser_click selector="//button[contains(text(),'Sign In')]"]
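Choosing among the recommended selectors can be as simple as taking the highest confidence. The sketch below types only the fields visible in the example result; `bestSelector` is a hypothetical helper, and treating `element` as absent when nothing was picked is an assumption.

```typescript
// Shape copied from the example result above; fields not shown there are omitted.
interface RecommendedSelector {
  type: string;
  selector: string;
  confidence: number;
  reason: string;
}

interface PickedElementResult {
  picked: boolean;
  // Assumption: missing when the user cancelled or has not picked yet.
  element?: {
    tagName: string;
    className: string;
    text: string;
    recommendedSelectors: RecommendedSelector[];
  };
}

// Returns the highest-confidence selector, or undefined if nothing was picked.
function bestSelector(result: PickedElementResult): string | undefined {
  if (!result.picked || !result.element) return undefined;
  // Sort a copy so the original result is left untouched.
  const sorted = [...result.element.recommendedSelectors].sort(
    (a, b) => b.confidence - a.confidence,
  );
  return sorted[0]?.selector;
}

const pickedExample: PickedElementResult = {
  picked: true,
  element: {
    tagName: "button",
    className: "btn-primary submit-form",
    text: "Sign In",
    recommendedSelectors: [
      { type: "xpath", selector: "//button[@class='btn-primary submit-form']", confidence: 70, reason: "Class match" },
      { type: "xpath", selector: "//button[contains(text(),'Sign In')]", confidence: 80, reason: "Contains text" },
    ],
  },
};

console.log(bestSelector(pickedExample)); // the confidence-80 "Contains text" selector
```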
Network Analysis for API Discovery
Use network capture to understand what APIs a page uses:
1. browser_start_network_capture capture_body=true
2. Perform the action (login, form submit, etc.)
3. browser_get_requests method=POST
4. Analyze the API endpoints found
5. Suggest: "Instead of browser automation, you could use
Core.Net.HttpRequest directly with this endpoint"
Example workflow:
Claude: Let me capture network traffic to see what API this uses...
[browser_start_network_capture capture_body=true]
Claude: Now I'll fill in the login form...
[browser_type selector="//input[@name='email']" text="test@example.com"]
[browser_type selector="//input[@name='password']" text="password"]
[browser_click selector="//button[@type='submit']"]
Claude: Let me check what API calls were made...
[browser_get_requests method=POST]
Result: Found POST to /api/auth/login with JSON body:
{
  "email": "test@example.com",
  "password": "password"
}
Claude: I found the login API endpoint. Instead of browser automation,
you could create a simpler flow using Core.Net.HttpRequest:
- POST to /api/auth/login
- Store the auth token from response
- Use token for subsequent API calls
Would you prefer the API-based approach?
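Step 4 of the workflow above ("analyze the API endpoints found") could start with a filter like the one below. The `CapturedRequest` shape is an assumption made for illustration, since this skill does not document the fields returned by `browser_get_requests`; `hasJsonBody` and `findApiCandidates` are hypothetical helpers.

```typescript
// Hypothetical shape of a captured request; the real fields may differ.
interface CapturedRequest {
  method: string;
  url: string;
  body?: string;
}

// True when the body parses as a JSON object or array, a hint that the
// endpoint is an API rather than a classic form post.
function hasJsonBody(req: CapturedRequest): boolean {
  if (!req.body) return false;
  try {
    const parsed: unknown = JSON.parse(req.body);
    return typeof parsed === "object" && parsed !== null;
  } catch (_err) {
    return false;
  }
}

// Keep only requests with the given method and a JSON body.
function findApiCandidates(requests: CapturedRequest[], method = "POST"): CapturedRequest[] {
  return requests.filter(
    (req) => req.method.toUpperCase() === method.toUpperCase() && hasJsonBody(req),
  );
}

const captured: CapturedRequest[] = [
  { method: "GET", url: "/login" },
  { method: "POST", url: "/api/auth/login", body: '{"email":"test@example.com","password":"password"}' },
  { method: "POST", url: "/track", body: "event=click" },
];

// Only /api/auth/login survives: it is a POST with a JSON body.
console.log(findApiCandidates(captured).map((req) => req.url));
```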
Debugging with Console Logs
When actions don't work as expected:
1. browser_start_console_capture types=["error","warn"]
2. Retry the failing action
3. browser_get_console_logs type=error
4. Report findings and adjust approach
Example workflow:
Claude: The click didn't seem to work. Let me check for JavaScript errors...
[browser_start_console_capture types=["error","warn"]]
[browser_click selector="//button[@id='submit']"]
[browser_get_console_logs type=error]
Result: Error: "Cannot read property 'validate' of undefined" at form.js:42
Claude: Found a JavaScript error - the page has a bug in its validation code.
Let me try waiting for the form to fully load first...
[browser_wait selector="//form[@data-ready='true']"]
Session Persistence for Auth
When working with authenticated sites:
1. Complete login manually or via automation
2. browser_export_session - save for later
3. Close browser, do other work
4. browser_open, browser_import_session - restore auth
Example workflow:
Claude: Let me save your authenticated session for later use...
[browser_export_session]
Session exported with 5 cookies and current URL.
Claude: I'll save this to a file so you can reuse it:
[browser_save_session path="/tmp/myapp-session.json"]
--- Later ---
Claude: Let me restore your previous session...
[browser_open]
[browser_load_session path="/tmp/myapp-session.json"]
Session restored - you're logged in as user@example.com
When to Use Browser vs HTTP Request
| Scenario | Recommendation |
|---|---|
| Simple API with known endpoints | Use Core.Net.HttpRequest |
| Login flow → need auth cookies | Use browser, then export session |
| Complex JavaScript-rendered content | Use browser automation |
| Form with CSRF tokens | Use browser (handles tokens automatically) |
| Scraping paginated data | Start with browser, switch to API if found |
| File downloads | Use browser for auth, API for download |
Decision workflow:
- Start with network capture
- If clean REST API found → suggest Core.Net.HttpRequest
- If complex auth/CSRF → stick with browser
- If both needed → browser for auth, API for data
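The decision workflow reduces to a small function. This is a sketch under stated assumptions: `CaptureFindings` and `chooseApproach` are hypothetical names, and falling back to the browser when neither condition holds is an assumption, since the list above does not cover that case.

```typescript
// Hypothetical summary of what network capture revealed.
interface CaptureFindings {
  cleanRestApi: boolean;      // a clean REST API was found
  complexAuthOrCsrf: boolean; // login or CSRF handling is hard to replay by hand
}

type Approach = "http" | "browser" | "hybrid";

// Encodes the decision workflow above. "hybrid" covers the "both needed"
// case: browser for auth, API for data.
function chooseApproach(findings: CaptureFindings): Approach {
  if (findings.cleanRestApi && findings.complexAuthOrCsrf) return "hybrid";
  if (findings.cleanRestApi) return "http";
  return "browser"; // assumption: no clean API means staying with the browser
}

console.log(chooseApproach({ cleanRestApi: true, complexAuthOrCsrf: false })); // http
console.log(chooseApproach({ cleanRestApi: true, complexAuthOrCsrf: true }));  // hybrid
console.log(chooseApproach({ cleanRestApi: false, complexAuthOrCsrf: true })); // browser
```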