browserpilot-executor
BrowserPilot Executor API
Overview
BrowserPilot Executor provides comprehensive browser automation capabilities through HTTP APIs. You can control browser navigation, interact with page elements, extract data, and analyze page structure.
API Base URL: http://127.0.0.1:8080/api/v1/executor
Authentication: Use X-BrowserWing-Key: <api-key> header or Authorization: Bearer <token>
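As a rough illustration of the two authentication options, the sketch below builds a request object with one of the headers attached, without sending it. The helper name build_request and the sample key are hypothetical; only the base URL and the header names come from this document.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/api/v1/executor"


def build_request(method, path, body=None, api_key=None, token=None):
    """Build (without sending) an Executor API request with one auth header."""
    headers = {}
    if api_key is not None:
        # API-key style authentication
        headers["X-BrowserWing-Key"] = api_key
    elif token is not None:
        # Bearer-token style authentication
        headers["Authorization"] = f"Bearer {token}"
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# "my-api-key" is a placeholder, not a real credential.
req = build_request("POST", "/navigate", {"url": "https://example.com"}, api_key="my-api-key")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send it; that step is left out here so the sketch runs without a live server.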
Core Capabilities
- Page Navigation: Navigate to URLs, go back/forward, reload
- Element Interaction: Click, type, select, hover on page elements
- Data Extraction: Extract text, attributes, values from elements
- Semantic Analysis: Get semantic tree to understand page structure
- Advanced Operations: Screenshot, JavaScript execution, keyboard input
- Batch Processing: Execute multiple operations in sequence
API Endpoints
1. Discover Available Commands
IMPORTANT: Always call this endpoint first to see all available commands and their parameters.
curl -X GET 'http://127.0.0.1:8080/api/v1/executor/help'
Response: Returns complete list of all commands with parameters, examples, and usage guidelines.
Query specific command:
curl -X GET 'http://127.0.0.1:8080/api/v1/executor/help?command=extract'
2. Get Semantic Tree
CRITICAL: Always call this after navigation to understand page structure and get element indices.
curl -X GET 'http://127.0.0.1:8080/api/v1/executor/semantic-tree'
Response Example:
{
"success": true,
"tree_text": "Clickable Element [1]: Login Button\nInput Element [1]: Email\nInput Element [2]: Password"
}
Use Cases:
- Understand what interactive elements are on the page
- Get element indices for reliable identification
- See element labels and roles
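To turn tree_text into identifiers that can be passed to /click or /type, a client might parse it line by line. The sketch below assumes every line follows the "Kind [N]: Label" shape shown in the response example; the real output may contain other line shapes, which this parser simply skips. The function name parse_tree_text is hypothetical.

```python
import re

# Matches lines such as "Clickable Element [1]: Login Button".
LINE_RE = re.compile(r"^(?P<kind>.+?) \[(?P<index>\d+)\]: (?P<label>.*)$")


def parse_tree_text(tree_text):
    """Return one dict per recognised line of a semantic-tree response."""
    elements = []
    for line in tree_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # unknown line shape: ignore rather than guess
        kind, index = match["kind"], int(match["index"])
        elements.append({
            "kind": kind,
            "index": index,
            "label": match["label"],
            # Identifier in the form accepted by the interaction endpoints
            "identifier": f"{kind} [{index}]",
        })
    return elements


sample = "Clickable Element [1]: Login Button\nInput Element [1]: Email\nInput Element [2]: Password"
for element in parse_tree_text(sample):
    print(element["identifier"], "->", element["label"])
```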
3. Common Operations
Navigate to URL
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/navigate' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com"}'
Click Element
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/click' \
-H 'Content-Type: application/json' \
-d '{"identifier": "[1]"}'
Identifier formats: [1], #button-id, .class-name, Login (text), Clickable Element [1]
Type Text
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/type' \
-H 'Content-Type: application/json' \
-d '{"identifier": "Input Element [1]", "text": "user@example.com"}'
Extract Data
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/extract' \
-H 'Content-Type: application/json' \
-d '{
"selector": ".product-item",
"fields": ["text", "href"],
"multiple": true
}'
Wait for Element
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/wait' \
-H 'Content-Type: application/json' \
-d '{"identifier": ".loading", "state": "hidden", "timeout": 10}'
Batch Operations
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/batch' \
-H 'Content-Type: application/json' \
-d '{
"operations": [
{"type": "navigate", "params": {"url": "https://example.com"}, "stop_on_error": true},
{"type": "click", "params": {"identifier": "[1]"}, "stop_on_error": true},
{"type": "type", "params": {"identifier": "[1]", "text": "query"}, "stop_on_error": true}
]
}'
Instructions
Step-by-step workflow:
- Discover commands: Call GET /help to see all available operations and their parameters (do this first if unsure).
- Navigate: Use POST /navigate to open the target webpage.
- Analyze page: Call GET /semantic-tree to understand page structure and get element indices.
- Interact: Use element indices (like [1], Input Element [1]) or CSS selectors to:
  - Click elements: POST /click
  - Input text: POST /type
  - Select options: POST /select
  - Wait for elements: POST /wait
- Extract data: Use POST /extract to get information from the page.
- Present results: Format and show extracted data to the user.
Complete Example
User Request: "Search for 'laptop' on example.com and get the first 5 results"
Your Actions:
- Navigate to search page:
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/navigate' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com/search"}'
- Get page structure to find search input:
curl -X GET 'http://127.0.0.1:8080/api/v1/executor/semantic-tree'
Response shows: Input Element [1]: Search Box
- Type search query:
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/type' \
-H 'Content-Type: application/json' \
-d '{"identifier": "Input Element [1]", "text": "laptop"}'
- Press Enter to submit:
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/press-key' \
-H 'Content-Type: application/json' \
-d '{"key": "Enter"}'
- Wait for results to load:
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/wait' \
-H 'Content-Type: application/json' \
-d '{"identifier": ".search-results", "state": "visible", "timeout": 10}'
- Extract search results:
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/extract' \
-H 'Content-Type: application/json' \
-d '{
"selector": ".result-item",
"fields": ["text", "href"],
"multiple": true
}'
- Present the extracted data:
Found 15 results for 'laptop':
1. Gaming Laptop - $1299 (https://...)
2. Business Laptop - $899 (https://...)
...
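The final presentation step could be sketched as below. This document does not show the shape of the /extract response data, so the sketch assumes a list of dicts keyed by the requested fields ("text" and "href"); the function name format_results and that shape are assumptions for illustration only.

```python
def format_results(query, items, limit=5):
    """Render the first `limit` extracted items as a numbered list."""
    lines = [f"Found {len(items)} results for '{query}':"]
    for number, item in enumerate(items[:limit], start=1):
        # Missing fields fall back to an empty string instead of raising.
        text = item.get("text", "").strip()
        href = item.get("href", "")
        lines.append(f"{number}. {text} ({href})")
    return "\n".join(lines)


# Hypothetical extracted items, shaped as assumed above.
items = [
    {"text": "Gaming Laptop - $1299", "href": "https://example.com/p/1"},
    {"text": "Business Laptop - $899", "href": "https://example.com/p/2"},
]
print(format_results("laptop", items))
```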
Key Commands Reference
Navigation
- POST /navigate - Navigate to URL
- POST /go-back - Go back in history
- POST /go-forward - Go forward in history
- POST /reload - Reload current page
Element Interaction
- POST /click - Click element (supports: CSS selector, semantic index [1], text content)
- POST /type - Type text into input (supports: Input Element [1], CSS selector)
- POST /select - Select dropdown option
- POST /hover - Hover over element
- POST /wait - Wait for element state (visible, hidden, enabled)
- POST /press-key - Press keyboard key (Enter, Tab, Ctrl+S, etc.)
Data Extraction
- POST /extract - Extract data from elements (supports multiple elements, custom fields)
- POST /get-text - Get element text content
- POST /get-value - Get input element value
- GET /page-info - Get page URL and title
- GET /page-text - Get all page text
- GET /page-content - Get full HTML
Page Analysis
- GET /semantic-tree - Get semantic tree (⭐ ALWAYS call after navigation)
- GET /clickable-elements - Get all clickable elements
- GET /input-elements - Get all input elements
Advanced
- POST /screenshot - Take page screenshot (base64 encoded)
- POST /evaluate - Execute JavaScript code
- POST /batch - Execute multiple operations in sequence
- POST /scroll-to-bottom - Scroll to page bottom
- POST /resize - Resize browser window
Element Identification
You can identify elements using:
- Semantic Index (Recommended): [1], [2], Clickable Element [1], Input Element [2]
  - Most reliable method
  - Get indices from /semantic-tree endpoint
  - Example: "identifier": "[1]" or "identifier": "Input Element [1]"
- CSS Selector: #id, .class, button[type="submit"]
  - Standard CSS selectors
  - Example: "identifier": "#login-button"
- Text Content: Login, Sign Up, Submit
  - Searches buttons and links with matching text
  - Example: "identifier": "Login"
- XPath: //button[@id='login']
  - XPath expressions
  - Example: "identifier": "//button[@id='login']"
- ARIA Label: Elements with aria-label attribute
  - Automatically searched
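When logging or debugging, it can help to know which of these formats a given identifier string looks like. The following is a client-side heuristic only: it is not the server's resolution logic, which this document does not describe, and the function name classify_identifier is hypothetical.

```python
import re


def classify_identifier(identifier):
    """Guess which identifier format a string uses (heuristic, client-side)."""
    text = identifier.strip()
    # "[1]" or "Input Element [2]": optional words followed by a bracketed number
    if re.fullmatch(r"(?:[A-Za-z ]+ )?\[\d+\]", text):
        return "semantic_index"
    # XPath expressions start with "/" (or "(/" when grouped)
    if text.startswith(("/", "(/")):
        return "xpath"
    # "#id", ".class", or an attribute selector such as button[type="submit"]
    if text.startswith(("#", ".")) or re.search(r"\[[^\]]+\]", text):
        return "css"
    # Anything else is treated as visible text such as "Login"
    return "text"


for sample in ["[1]", "Input Element [2]", "#login-button",
               'button[type="submit"]', "//button[@id='login']", "Sign Up"]:
    print(f"{sample!r:28} -> {classify_identifier(sample)}")
```

ARIA labels are not distinguished here, since they look like plain text on the client side.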
Guidelines
Before starting:
- Call GET /help if you're unsure about available commands or their parameters
- Ensure browser is started (if not, it will auto-start on first operation)
During automation:
- Always call /semantic-tree after navigation to get page structure
- Prefer semantic indices (like [1]) over CSS selectors for reliability
- Use /wait for dynamic content that loads asynchronously
- Check element states before interaction (visible, enabled)
- Use /batch for multiple sequential operations to improve efficiency
Error handling:
- If an operation fails, check the element identifier and try a different format
- For timeout errors, increase the timeout value
- If an element is not found, call /semantic-tree again to refresh the page structure
- Explain errors clearly to the user with suggested solutions
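The "try a different format" advice can be sketched as a small fallback loop. The transport is injected as a send(path, body) callable that returns the decoded JSON response as a dict; that callable, and the name click_with_fallback, are assumptions for this example. The success, error and detail keys follow the Response Format section.

```python
def click_with_fallback(send, identifiers):
    """Try each identifier in order until one click succeeds.

    Returns (identifier_that_worked_or_None, list_of_failures).
    """
    failures = []
    for identifier in identifiers:
        response = send("/click", {"identifier": identifier})
        if response.get("success"):
            return identifier, failures
        # Keep the server's explanation so it can be shown to the user.
        reason = response.get("detail") or response.get("error")
        failures.append((identifier, reason))
    return None, failures


# A fake transport standing in for the real HTTP call.
def fake_send(path, body):
    if body["identifier"] == "#login-button":
        return {"success": True, "message": "clicked"}
    return {"error": "error.operationFailed", "detail": "element not found"}


print(click_with_fallback(fake_send, ["Clickable Element [9]", "#login-button", "Login"]))
```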
Data extraction:
- Use fields parameter to specify what to extract: ["text", "href", "src"]
- Set multiple: true to extract from multiple elements
- Format extracted data in a readable way for user
Complete Workflow Example
Scenario: User wants to login to a website
User: "Please log in to example.com with username 'john' and password 'secret123'"
Your Actions:
Step 1: Navigate to login page
POST http://127.0.0.1:8080/api/v1/executor/navigate
{"url": "https://example.com/login"}
Step 2: Get page structure
GET http://127.0.0.1:8080/api/v1/executor/semantic-tree
Response:
Input Element [1]: Username
Input Element [2]: Password
Clickable Element [1]: Login Button
Step 3: Enter username
POST http://127.0.0.1:8080/api/v1/executor/type
{"identifier": "Input Element [1]", "text": "john"}
Step 4: Enter password
POST http://127.0.0.1:8080/api/v1/executor/type
{"identifier": "Input Element [2]", "text": "secret123"}
Step 5: Click login button
POST http://127.0.0.1:8080/api/v1/executor/click
{"identifier": "Clickable Element [1]"}
Step 6: Wait for login success (optional)
POST http://127.0.0.1:8080/api/v1/executor/wait
{"identifier": ".welcome-message", "state": "visible", "timeout": 10}
Step 7: Inform user
"Successfully logged in to example.com!"
Batch Operation Example
Scenario: Fill out a form with multiple fields
Instead of making 5 separate API calls, use one batch operation:
curl -X POST 'http://127.0.0.1:8080/api/v1/executor/batch' \
-H 'Content-Type: application/json' \
-d '{
"operations": [
{
"type": "navigate",
"params": {"url": "https://example.com/form"},
"stop_on_error": true
},
{
"type": "type",
"params": {"identifier": "#name", "text": "John Doe"},
"stop_on_error": true
},
{
"type": "type",
"params": {"identifier": "#email", "text": "john@example.com"},
"stop_on_error": true
},
{
"type": "select",
"params": {"identifier": "#country", "value": "United States"},
"stop_on_error": true
},
{
"type": "click",
"params": {"identifier": "#submit"},
"stop_on_error": true
}
]
}'
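Writing the operations array by hand is repetitive, so a client might generate it from a list of (type, params) pairs. This is a minimal sketch that mirrors the payload shape of the request above; the helper name batch_payload is hypothetical.

```python
import json


def batch_payload(steps, stop_on_error=True):
    """Build the /batch request body from (type, params) pairs."""
    return {
        "operations": [
            {"type": op_type, "params": params, "stop_on_error": stop_on_error}
            for op_type, params in steps
        ]
    }


payload = batch_payload([
    ("navigate", {"url": "https://example.com/form"}),
    ("type", {"identifier": "#name", "text": "John Doe"}),
    ("type", {"identifier": "#email", "text": "john@example.com"}),
    ("select", {"identifier": "#country", "value": "United States"}),
    ("click", {"identifier": "#submit"}),
])
print(json.dumps(payload, indent=2))
```

The printed JSON can be used as the -d argument of the curl command above.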
Best Practices
- Discovery first: If unsure, call /help or /help?command=<name> to learn about commands
- Structure first: Always call /semantic-tree after navigation to understand the page
- Use semantic indices: They're more reliable than CSS selectors (elements might have dynamic classes)
- Wait for dynamic content: Use /wait before interacting with elements that load asynchronously
- Batch when possible: Use /batch for multiple sequential operations
- Handle errors gracefully: Provide clear explanations and suggestions when operations fail
- Verify results: After operations, check if desired outcome was achieved
Common Scenarios
Form Filling
- Navigate to form page
- Get semantic tree to find input elements
- Use /type for each field: Input Element [1], Input Element [2], etc.
- Use /select for dropdowns
- Click submit button
Data Scraping
- Navigate to target page
- Wait for content to load with /wait
- Use /extract with CSS selector and multiple: true
- Specify fields to extract: ["text", "href", "src"]
Search Operations
- Navigate to search page
- Get semantic tree to locate search input
- Type search query into input
- Press Enter or click search button
- Wait for results
- Extract results data
Login Automation
- Navigate to login page
- Get semantic tree
- Type username: Input Element [1]
- Type password: Input Element [2]
- Click login button: Clickable Element [1]
- Wait for success indicator
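The login sequence can be expressed as data and run one step at a time, stopping at the first failure so the user gets a precise error. In this sketch the element indices and the .welcome-message selector are taken from the login example earlier in this document and are hard-coded; in practice they would be read from /semantic-tree after navigating. The send(path, body) callable and both function names are assumptions.

```python
def login_steps(username, password):
    """Return the login sequence as (path, body) pairs."""
    return [
        ("/navigate", {"url": "https://example.com/login"}),
        ("/type", {"identifier": "Input Element [1]", "text": username}),
        ("/type", {"identifier": "Input Element [2]", "text": password}),
        ("/click", {"identifier": "Clickable Element [1]"}),
        ("/wait", {"identifier": ".welcome-message", "state": "visible", "timeout": 10}),
    ]


def run_steps(send, steps):
    """Run steps in order; stop and report at the first unsuccessful response."""
    for number, (path, body) in enumerate(steps, start=1):
        response = send(path, body)
        if not response.get("success"):
            reason = response.get("detail", response.get("error"))
            return False, f"step {number} ({path}) failed: {reason}"
    return True, "all steps succeeded"


# Fake transport that accepts every request, standing in for real HTTP calls.
ok, message = run_steps(lambda path, body: {"success": True}, login_steps("john", "secret123"))
print(ok, message)
```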
Important Notes
- Browser must be running (it will auto-start on first operation if needed)
- Operations are executed on the currently active browser tab
- Semantic tree updates after each navigation and click operation
- All timeouts are in seconds
- Use wait_visible: true (default) for reliable element interaction
- Replace 127.0.0.1:8080 with actual API host address
- Authentication required: use X-BrowserWing-Key header or JWT token
Troubleshooting
Element not found:
- Call /semantic-tree to see available elements
- Try different identifier format (semantic index, CSS selector, text)
- Check if page has finished loading
Timeout errors:
- Increase timeout value in request
- Check if element actually appears on page
- Use /wait with appropriate state before interaction
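One way to apply the "increase timeout" advice is to retry /wait with progressively longer timeouts. This document does not say how a timeout is distinguished from other failures, so this sketch retries on any unsuccessful response. The send(path, body) callable, the function name, and the specific timeout values are assumptions.

```python
def wait_with_longer_timeouts(send, identifier, state="visible", timeouts=(10, 20, 40)):
    """Call /wait with each timeout (in seconds) until one succeeds.

    Returns the timeout that worked, or None if every attempt failed.
    """
    for timeout in timeouts:
        body = {"identifier": identifier, "state": state, "timeout": timeout}
        response = send("/wait", body)
        if response.get("success"):
            return timeout
    return None


# Fake transport: pretends the element only appears when given 20 s or more.
def fake_send(path, body):
    if body["timeout"] >= 20:
        return {"success": True}
    return {"error": "error.operationFailed", "detail": "timed out"}


print(wait_with_longer_timeouts(fake_send, ".search-results"))
```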
Extraction returns empty:
- Verify CSS selector matches target elements
- Check if content has loaded (use /wait first)
- Try different extraction fields or type
Quick Reference
# Discover commands
GET 127.0.0.1:8080/api/v1/executor/help
# Navigate
POST 127.0.0.1:8080/api/v1/executor/navigate {"url": "..."}
# Get page structure
GET 127.0.0.1:8080/api/v1/executor/semantic-tree
# Click element
POST 127.0.0.1:8080/api/v1/executor/click {"identifier": "[1]"}
# Type text
POST 127.0.0.1:8080/api/v1/executor/type {"identifier": "[1]", "text": "..."}
# Extract data
POST 127.0.0.1:8080/api/v1/executor/extract {"selector": "...", "fields": [...], "multiple": true}
Response Format
All operations return:
{
"success": true,
"message": "Operation description",
"timestamp": "2026-01-15T10:30:00Z",
"data": {
// Operation-specific data
}
}
Error response:
{
"error": "error.operationFailed",
"detail": "Detailed error message"
}
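A client can collapse the two response shapes into one code path: return data on success, raise on error. The sketch below relies only on the keys shown above; the ExecutorError class and the unwrap function are hypothetical names.

```python
import json


class ExecutorError(Exception):
    """Raised when a response does not report success."""


def unwrap(response):
    """Return the operation-specific data, or raise with the error details."""
    if response.get("success"):
        return response.get("data", {})
    code = response.get("error", "unknown")
    detail = response.get("detail", "")
    raise ExecutorError(f"{code}: {detail}")


ok_body = json.loads('{"success": true, "message": "Operation description", "data": {"title": "Example"}}')
print(unwrap(ok_body))

error_body = json.loads('{"error": "error.operationFailed", "detail": "Detailed error message"}')
try:
    unwrap(error_body)
except ExecutorError as exc:
    print("failed:", exc)
```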