browser-automation
Browser Automation
Available Tools
- browser_act(instruction, starting_url?): Execute browser actions using natural language (click, type, scroll, select). Use starting_url to navigate to a page and act in a single call.
- browser_get_page_info(url?, text?, tables?, links?): Get page structure and DOM data (fast, no AI). Use url to navigate first; text=True for full text, tables=True for table data, links=True for all links.
- browser_manage_tabs(action, tab_index?, url?): Switch, close, or create browser tabs
- browser_save_screenshot(filename): Save current page screenshot to workspace
When to Use
Use browser automation when the task genuinely requires it:
- UI interactions: Filling forms, clicking buttons, navigating multi-step workflows
- Login-required pages: Accessing content behind authentication that APIs cannot reach
- Dynamic/JS-heavy pages: Content rendered client-side that plain HTTP requests can't capture
- Human-like browsing needed: Sites that block bots or require realistic interaction patterns
- Scraping structured data: When no API exists and the data must be extracted from rendered pages
Prefer web search or url_fetcher for general information lookup, news, or publicly accessible pages — browser automation is slower and heavier. Reserve it for tasks where simpler tools are insufficient.
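The criteria above can be read as a simple checklist. The sketch below is one way to encode it; the Task fields and the returned labels are hypothetical names chosen for illustration and are not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; field names are illustrative only."""
    needs_ui_interaction: bool = False         # forms, buttons, multi-step workflows
    behind_login: bool = False                 # content an API cannot reach
    rendered_client_side: bool = False         # JS-heavy page
    needs_human_like_browsing: bool = False    # site blocks simple clients
    scrape_rendered_data_no_api: bool = False  # data only exists in the rendered page


def needs_browser(task: Task) -> bool:
    """True if any of the 'When to Use' criteria applies."""
    return any((
        task.needs_ui_interaction,
        task.behind_login,
        task.rendered_client_side,
        task.needs_human_like_browsing,
        task.scrape_rendered_data_no_api,
    ))


def choose_approach(task: Task) -> str:
    """Fall back to the lighter tools unless the browser is genuinely required."""
    if needs_browser(task):
        return "browser_automation"
    return "web_search_or_url_fetcher"
```

A plain news lookup (all fields False) resolves to the lighter tools, while a task with behind_login=True resolves to browser automation.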
Tool Selection
- browser_act: UI interactions (click, type, scroll, form fill). Use starting_url to open a page and act in one call.
- browser_get_page_info: Fast page structure check and optional content extraction (<300ms). Use url to navigate first.
- browser_manage_tabs: Switch/close/create tabs (view tabs via get_page_info)
- browser_save_screenshot: Save milestone screenshots (search results, confirmations, key data)
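As a rough illustration of the selection rule, the mapping below pairs each kind of intent with the tool named above. The Intent enum and select_tool helper are assumptions made for this sketch; only the four tool names come from the skill.

```python
from enum import Enum


class Intent(Enum):
    """Hypothetical intent categories, one per tool."""
    INTERACT = "interact"      # click, type, scroll, form fill
    INSPECT = "inspect"        # page structure or content extraction
    TABS = "tabs"              # switch, close, or create tabs
    SCREENSHOT = "screenshot"  # save a milestone screenshot


# Tool names as listed in this skill.
TOOL_FOR_INTENT = {
    Intent.INTERACT: "browser_act",
    Intent.INSPECT: "browser_get_page_info",
    Intent.TABS: "browser_manage_tabs",
    Intent.SCREENSHOT: "browser_save_screenshot",
}


def select_tool(intent: Intent) -> str:
    """Return the tool name that matches the given intent."""
    return TOOL_FOR_INTENT[intent]
```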
browser_act Best Practice
- Combine up to 3 predictable steps: "1. Type 'laptop' in search 2. Click search button 3. Click first result"
- Use starting_url when opening a fresh page: browser_act(instruction='Search for laptops', starting_url='https://amazon.com')
- On failure: check the screenshot to see current state, then retry from that point
- For visual creation (diagrams, drawings), prefer code/text input methods over mouse interactions
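The batching and retry advice above can be sketched as follows. This is a minimal sketch under stated assumptions: batch_steps and run_batches are hypothetical helpers, and the act callable stands in for browser_act and is assumed to return True on success. The real tool's return value is not documented here.

```python
from typing import Callable, List, Optional


def batch_steps(steps: List[str], max_steps: int = 3) -> List[str]:
    """Group steps into numbered instructions of at most max_steps each."""
    instructions = []
    for start in range(0, len(steps), max_steps):
        chunk = steps[start:start + max_steps]
        instructions.append(
            " ".join(f"{number}. {step}" for number, step in enumerate(chunk, 1))
        )
    return instructions


def run_batches(
    act: Callable[..., bool],
    instructions: List[str],
    starting_url: Optional[str] = None,
) -> int:
    """Run instructions in order and return how many completed.

    starting_url is passed only on the first call (the fresh page).
    Execution stops at the first failure so the caller can check the
    screenshot and retry from that point.
    """
    for index, instruction in enumerate(instructions):
        kwargs = {"instruction": instruction}
        if index == 0 and starting_url:
            kwargs["starting_url"] = starting_url
        if not act(**kwargs):  # assumption: a falsy result means failure
            return index
    return len(instructions)
```

If run_batches returns a number smaller than len(instructions), that index is the batch to retry after inspecting the current page state.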
browser_get_page_info Best Practice
- Use url to navigate and inspect in one call: browser_get_page_info(url='https://example.com', tables=True)
- Use text=True to get full page text content (useful for reading article text)
- Use tables=True to extract structured table data from the page
- Use links=True to get all links on the page (up to 200)
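One way to keep calls minimal is to request only the extraction flags a task needs. The helper below is a hypothetical sketch that builds the keyword arguments for browser_get_page_info; the parameter names (url, text, tables, links) come from the skill, while page_info_args itself is an assumption for illustration.

```python
from typing import Dict, Optional, Union


def page_info_args(
    url: Optional[str] = None,
    text: bool = False,
    tables: bool = False,
    links: bool = False,
) -> Dict[str, Union[str, bool]]:
    """Build keyword arguments, including only the options that are set."""
    args: Dict[str, Union[str, bool]] = {}
    if url:
        args["url"] = url  # navigate first, then inspect
    for name, enabled in (("text", text), ("tables", tables), ("links", links)):
        if enabled:
            args[name] = True
    return args
```

For example, page_info_args(url='https://example.com', tables=True) yields the same arguments as the one-call example in the list above.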
UI Guidance (from tools-config)
Tool Selection:
- browser_act: UI interactions (click, type, scroll, form fill). Use starting_url to navigate and act in one call.
- browser_get_page_info: Fast DOM inspection (<300ms). Use url param to navigate first; text/tables/links params for content extraction.
- browser_manage_tabs: Switch, close, or create tabs.
- browser_save_screenshot: Save milestone screenshots to workspace for documents.
browser_act Best Practice:
- Combine up to 3 predictable steps: "1. Type 'laptop' in search 2. Click search button 3. Click first result"
- Use starting_url when opening a fresh page: browser_act(instruction='...', starting_url='https://...')
- On failure: check the screenshot to see current state, then retry from that point
browser_get_page_info Best Practice:
- Use url param to navigate and inspect in one call: browser_get_page_info(url='https://...', tables=True)
- Use text=True for full page text, tables=True for table data, links=True for all page links
Repository: aws-samples/sam…gentcore