cursor-ide-browser-skills
Cursor IDE Browser Automation
Browser automation tool for Cursor IDE using MCP (Model Context Protocol) server cursor-ide-browser and accessibility snapshots for precise element interaction.
Core Mechanism
Accessibility Snapshot First: Always get a snapshot before interacting with elements. The snapshot provides structured page information with element references (ref) needed for all interactions.
// Standard workflow
browser_navigate(url="https://example.com")
browser_snapshot() // Required: Get element references
browser_click(element="Button", ref="ref-from-snapshot")
Essential Workflow
- Navigate to target page
- Snapshot to get element references (required before any interaction)
- Convert to Markdown (⭐ Recommended) for easier searching, locating and reading
- Search with grep in md to find information or locate interactive elements
- Interact using refs from snapshot
- Wait for dynamic content if needed
- Verify with screenshots or console messages
Quick example:
browser_navigate(url="https://example.com")
browser_snapshot() // Creates .log file
mcp_snapshot-query_convert_to_markdown(file_path="snapshot.log")
grep(pattern="button|登录", path="snapshot.md") // Find elements
browser_click(element="Login", ref="ref-from-grep-results")
Key Tools
Navigation:
browser_navigate(url, position?)- Navigate to URLbrowser_navigate_back()- Go back
Page Information:
browser_snapshot()- Required before interactions - Get accessibility tree with element refsbrowser_take_screenshot(fullPage?, filename?)- Capture visualbrowser_console_messages()- Get console logsbrowser_network_requests()- Get network activity
Element Interaction:
browser_click(element, ref, doubleClick?, button?, modifiers?)- Click elementbrowser_type(element, ref, text, submit?, slowly?)- Type textbrowser_hover(element, ref)- Hoverbrowser_select_option(element, ref, values)- Select dropdownbrowser_press_key(key)- Press key (supports PageDown, PageUp, ArrowDown, ArrowUp, Space, End, Home for scrolling)
Synchronization:
browser_wait_for(text?, textGone?, time?)- Wait for text or time
Tab Management:
browser_tabs(action, index?, position?)- Manage tabs (list/new/close/select)
Element References
**element**: Human-readable description (for permission confirmation)**ref**: Technical reference from snapshot (required for interaction)- Refs are page-state specific - get a new snapshot after navigation or page changes
Snapshot Files
Snapshots are automatically saved as YAML files:
- Location:
C:\Users\{username}\.cursor\browser-logs\snapshot-{timestamp}.log - Format: YAML accessibility tree with
role,ref,name,children - Usage: Extract
refvalues for element interactions
Querying Snapshots
⭐ Recommended Workflow: Convert to Markdown + Grep
Best practice for finding information and locating interactive elements:
- Get snapshot → Creates
.logfile - Convert to Markdown → More readable format with structured content
- Use grep → Fast text search across the entire document
- Extract refs → Use found refs for interactions
// Step 1: Get page snapshot
browser_snapshot() // Creates: snapshot-2026-01-10T23-43-30-351Z.log
// Step 2: Convert to Markdown (RECOMMENDED)
mcp_snapshot-query_convert_to_markdown(
file_path="snapshot-2026-01-10T23-43-30-351Z.log",
include_ref=true
) # save to snapshot-2026-01-10T23-43-30-351Z.md
// Step 3: Search with grep (much easier than querying raw YAML)
grep(pattern="搜索|button|登录", path="snapshot.md", -i=true)
grep(pattern="^\\[.*\\]\\(ref-|^\\*\\*.*\\*\\* `ref-", path="snapshot.md") // Find all links/buttons
// Step 4: Use found refs for interaction
browser_click(element="Login button", ref="ref-found-from-grep")
Why this workflow is preferred:
- ✅ More readable: Markdown format is human-friendly
- ✅ Faster search:
grepis more efficient than parsing YAML - ✅ Better context: See surrounding content with
-Cflag - ✅ Easy element discovery: Links and buttons clearly formatted
- ✅ Preserves refs: All element references included for interaction
Alternative: Direct Query Tools
For programmatic element finding, use snapshot-query MCP tools:
Command line:
browser_snapshot() # Generate snapshot
uvx snapshot-query snapshot.log find-name "search" # Find element
MCP tools:
mcp_snapshot-query_find_by_name(file_path="snapshot.log", name="搜索")
mcp_snapshot-query_find_by_role(file_path="snapshot.log", role="button")
mcp_snapshot-query_find_by_text(file_path="snapshot.log", text="登录")
mcp_snapshot-query_find_by_regex(file_path="snapshot.log", pattern="\\d+\\s*ft", field="name")
mcp_snapshot-query_find_by_name_bm25(file_path="snapshot.log", name="search query", top_k=5)
mcp_snapshot-query_count_elements(file_path="snapshot.log")
mcp_snapshot-query_get_element_path(file_path="snapshot.log", ref="ref-xxx")
mcp_snapshot-query_extract_all_refs(file_path="snapshot.log")
Integrated workflow:
browser_snapshot() // Creates snapshot file
// Query snapshot to find element ref
const result = mcp_snapshot-query_find_by_name(file_path="snapshot.log", name="Login")
browser_click(element="Login", ref=result.ref) // Use ref from query
⭐ snapshot-query works with OCR results too:
The snapshot-query tools can process OCR results from fast-paddleocr-mcp. After OCR processing, you get a .snapshot.log file that can be queried just like browser snapshots:
// OCR generates webpage.png.snapshot.log
mcp_fast-paddleocr-mcp_ocr_image(image_path="webpage.png", language="ch")
// Query OCR results with snapshot-query
mcp_snapshot-query_find_by_text(
file_path="webpage.png.snapshot.log",
text="8 ft",
case_sensitive=false
)
// Use regex to find measurements
mcp_snapshot-query_find_by_regex(
file_path="webpage.png.snapshot.log",
pattern="\\d+\\s*ft|cm|meters?",
field="name"
)
// Semantic search for better results
mcp_snapshot-query_find_by_name_bm25(
file_path="webpage.png.snapshot.log",
name="height measurement",
top_k=5
)
// Convert to Markdown for analysis
mcp_snapshot-query_convert_to_markdown(
file_path="webpage.png.snapshot.log",
include_ref=true
)
See references/snapshot-query.md for complete snapshot-query documentation.
Common Patterns
Login flow:
browser_navigate(url="https://example.com/login")
browser_snapshot()
// Find username input ref from snapshot
browser_type(element="Username", ref="ref-username", text="user")
// Find password input ref from snapshot
browser_type(element="Password", ref="ref-password", text="pass")
// Find login button ref from snapshot
browser_click(element="Login", ref="ref-login-btn")
browser_wait_for(text="Welcome")
Search and extract (with Markdown workflow):
browser_navigate(url="https://www.baidu.com/s?wd=哈梅内伊有几个孩子")
browser_snapshot() // Creates snapshot.log
// Convert to Markdown for easier searching
mcp_snapshot-query_convert_to_markdown(
file_path="snapshot.log",
include_ref=true
)
// Search for information using grep
grep(pattern="六名|6个|子女", path="snapshot.md", -i=true, -C=3)
// Find interactive elements (links/buttons)
grep(pattern="^\\[.*\\]\\(ref-|^\\*\\*.*\\*\\* `ref-", path="snapshot.md")
// Click on found link using ref
browser_click(element="Article link", ref="ref-45py92vjdrs")
browser_wait_for(text="Results")
browser_take_screenshot(filename="results.png")
Debug page issues:
browser_snapshot()
browser_console_messages() // Check for errors
browser_network_requests() // Check failed requests
Scrolling web pages:
browser_press_key("PageDown") // Scroll down one page
browser_press_key("PageUp") // Scroll up one page
browser_press_key("ArrowDown") // Scroll down line by line
browser_press_key("ArrowUp") // Scroll up line by line
browser_press_key("Space") // Scroll down one screen
browser_press_key("End") // Scroll to bottom
browser_press_key("Home") // Scroll to top
browser_wait_for(time=1) // Wait after scrolling for content to load
OCR processing with fast-paddleocr-mcp:
// Take screenshot of webpage
browser_take_screenshot(filename="webpage.png", fullPage=false)
// Process with OCR (generates .md and .snapshot.log files)
mcp_fast-paddleocr-mcp_ocr_image(
image_path="webpage.png",
language="ch" // Use "ch" for Chinese+English, "en" for English only
)
// Query OCR results with snapshot-query
mcp_snapshot-query_find_by_text(
file_path="webpage.png.snapshot.log",
text="tallest",
case_sensitive=false
)
// Use BM25 semantic search for better results
mcp_snapshot-query_find_by_name_bm25(
file_path="webpage.png.snapshot.log",
name="height tallest person",
top_k=5
)
// Convert OCR snapshot to Markdown for easier analysis
mcp_snapshot-query_convert_to_markdown(
file_path="webpage.png.snapshot.log",
include_ref=true
)
Cross-verification workflow:
// Navigate to multiple sources for verification
browser_navigate(url="https://source1.com/article")
browser_snapshot()
// Extract information from source 1
browser_navigate(url="https://source2.com/article")
browser_snapshot()
// Extract information from source 2
// Compare and verify information consistency
// Prefer authoritative sources (Wikipedia, official records, etc.)
Important Notes
- Always snapshot before interaction - Refs are required and page-specific
- ⭐ Convert to Markdown first - Use
convert_to_markdown+grepfor finding information and elements (much easier than querying raw YAML) - Wait for dynamic content - Use
browser_wait_for()for async operations - Refs expire - Get new snapshot after navigation or page changes
- Multi-tab support - Use
viewIdparameter orbrowser_tabs()to manage tabs - Position control - Use
position="side"when user mentions side panel - OCR limitations - OCR may merge adjacent text (e.g., "otherreliablesourcesccordingtoG"). Key information is usually extracted correctly, but verify important details
- Cross-verification - For critical information, verify across multiple authoritative sources (Wikipedia, official records, etc.)
- Tool combination - Combine browser automation + OCR + snapshot-query for comprehensive web content analysis
Best Practices & Lessons Learned
Workflow Optimization
- Standard workflow: Navigate → Snapshot → Convert to Markdown → Search → Interact
- OCR workflow: Screenshot → OCR → Query with snapshot-query → Extract information
- Verification workflow: Multiple sources → Extract → Compare → Verify consistency
Tool Integration
- Browser + OCR: Use
browser_take_screenshot()+fast-paddleocr-mcpto extract text from visual content - OCR + snapshot-query: OCR generates
.snapshot.logfiles that can be queried with all snapshot-query tools - Markdown + grep: Convert snapshots/OCR results to Markdown for easier searching
Key Insights
- snapshot-query is universal: Works with both browser snapshots and OCR results
- Markdown conversion is recommended: Much easier to search and read than raw YAML
- BM25 semantic search: Use
find_by_name_bm25()for better relevance when exact matches are unclear - Cross-verification: Always verify critical information from multiple authoritative sources
- OCR accuracy: Works well for key information but may merge adjacent text - verify important details
Detailed Reference
- Complete tool reference: See references/tools.md for all tools with full parameters
- Examples and patterns: See references/examples.md for detailed workflows
- Snapshot file format: See references/snapshot-format.md for YAML structure details
- Snapshot querying: See references/snapshot-query.md for querying snapshot files