# Browse: Token-Efficient Web Browsing
Guide AI agents through web browsing tasks using the cheapest tool that gets the job done. Every browsing action has a token cost - this skill minimizes it through progressive disclosure, smart format selection, and backend-aware strategies.
Target versions (May 2026):
- Lightpanda: 0.2.9
- @playwright/mcp: 0.0.72
- agent-browser: 0.26.0
## When to use
- Reading a web page, article, or documentation site
- Extracting structured data from a page (prices, tables, metadata)
- Filling forms, clicking buttons, or navigating multi-step flows
- Scraping content from JavaScript-rendered pages (SPAs)
- Browsing behind authentication (login flows)
- Any task where the agent needs to see or interact with a live web page
## When NOT to use
- E2E test automation, test writing, or test debugging - use testing
- Building or debugging MCP servers (including browser MCP servers) - use mcp
- Network configuration, DNS, reverse proxies - use networking
- Fetching API endpoints or REST calls - use curl/fetch directly
- Static file downloads - use curl or wget
- Web scraping specifically for RAG pipelines or training data - use ai-ml for the pipeline
## Tool Selection
Detect what's available and pick the cheapest tool that handles the task.
### Detection

- MCP browsing tools: look for `goto`, `navigate`, `markdown`, `semantic_tree`, `browser_navigate`, `browser_snapshot` in the available tool list
- CLI tools: check for `lightpanda`, `agent-browser` in PATH
- Built-in fetch: WebFetch tool (Claude Code) or platform equivalent
- Fallback: curl via shell
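A quick PATH probe for the CLI tier (a minimal sketch; MCP tools and WebFetch show up in the tool list itself, so only the shell tools need checking):

```bash
# Check which CLI backends are installed before planning the workflow
for tool in lightpanda agent-browser curl; do
  command -v "$tool" >/dev/null 2>&1 && echo "available: $tool" || echo "missing: $tool"
done
```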
### Decision matrix
| Task | No JS needed | JS needed, read-only | JS needed, interactive |
|---|---|---|---|
| Best | WebFetch / curl | Lightpanda fetch | Lightpanda MCP tools |
| Good | Lightpanda fetch | MCP markdown tool | agent-browser CLI |
| Fallback | curl | Playwright MCP | Playwright MCP |
If the page works without JavaScript, don't use a browser. If you only need to read content, don't use interactive tools. Escalate only when the cheaper option fails.
### Task modes
| Mode | Use when | First tool |
|---|---|---|
| Static fetch | Content is in HTML | WebFetch, curl, or Lightpanda fetch |
| JS read-only | Content requires rendering | Lightpanda or Playwright markdown |
| Interactive | Click, fill, or navigate statefully | MCP browser tools |
| Authenticated | User-approved account context is required | Existing authenticated browser session |
| Screenshot | Visual layout matters | Browser screenshot after DOM snapshot |
| Structured extraction | Tables, JSON-LD, prices, metadata | API, JSON-LD, DOM selectors, then browser |
Tool availability check: before starting, verify what's available. If the best tool for the task isn't present, skip straight to the next tier rather than failing mid-workflow.
## Performance
- Prefer official APIs, sitemaps, or static fetches before launching a browser.
- Extract only required page regions; avoid dumping full DOMs, screenshots, or network logs into context.
- Reuse browser sessions for multi-step flows, but clear cookies/storage between unrelated accounts or tenants.
- Use screenshots only when visual layout or rendered state matters.
## Best Practices
- Use stable selectors and semantic roles before brittle CSS paths.
- Record URL and access date for facts likely to change.
- Clear cookies/storage between unrelated accounts or tenants.
- Do not automate destructive account actions unless the user names the exact action and target.
## Workflow

### Step 1: Assess the task
Before touching any tool, answer these in order - each answer narrows the tool choice:
- Read or interact? Read-only -> skip to Step 2. Interactive -> go to Step 3/4.
- Static or dynamic? View page source or check URL patterns - if the content is in the HTML, it's static. SPA frameworks (React, Vue, Angular) need JS rendering.
- Single page or multi-step? Multi-step flows need session persistence (MCP or serve mode).
- What output format? Markdown for human reading, structured data / JSON-LD for extraction, semantic tree for element discovery, links for crawl planning.
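For the static-or-dynamic question, a cheap probe is to check what curl sees before rendering (a sketch; the patterns are common SPA markers, not an exhaustive list):

```bash
# An empty root div or a framework marker suggests the content needs JS rendering
html=$(curl -sL "$url")
if printf '%s' "$html" | grep -qiE '<div id="(root|app)">[[:space:]]*</div>|data-reactroot|ng-version'; then
  echo "likely SPA - plan for JS rendering"
else
  echo "content may be static - try WebFetch/curl first"
fi
```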
### Step 2: Try the cheapest path first
Static content (docs, articles, blogs):
```bash
# Option A: built-in fetch (lowest overhead, no setup)
# Use WebFetch tool with the URL directly

# Option B: Lightpanda CLI (better stripping, selector waits)
lightpanda fetch --dump markdown --strip-mode full <url>

# Option C: curl (always available, raw HTML only)
curl -sL <url>
```
JS-rendered content (SPAs, dashboards):
```bash
# Lightpanda CLI with wait
lightpanda fetch --dump markdown --strip-mode full --wait-until networkidle <url>

# With selector wait for specific content
lightpanda fetch --dump markdown --wait-selector ".main-content" <url>
```
Interactive tasks (forms, clicks, multi-step): use MCP tools or agent-browser CLI (Step 3).
### Step 3: Navigate and extract
With MCP browsing tools (Lightpanda MCP or Playwright MCP):
Navigate first, then extract using the cheapest format:
| Format | Tool | Tokens (typical) | Use when |
|---|---|---|---|
| Semantic tree | `semantic_tree` / `browser_snapshot` | ~200-500 | Finding elements to interact with |
| Markdown | `markdown` | ~500-2000 | Reading text content |
| Links only | `links` | ~100-300 | Finding URLs to follow |
| Structured data | `structuredData` / `structured_data` | ~100-500 | Getting metadata (OpenGraph, JSON-LD) |
| Interactive elements | `interactiveElements` / `interactive_elements` | ~200-400 | Finding clickable/fillable elements |
| Full HTML | `page-html` resource | ~5000-50000 | Last resort only |
Following links to find data: if the initial extraction doesn't contain the target content
(e.g., the page uses images or links to a separate document), extract the page's links first
using the `links` tool or markdown output, identify the relevant link, and fetch that instead.
Don't re-fetch the whole page - follow the specific link to the actual data.
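From the CLI, one way to do this is to grep the markdown dump for links instead of re-fetching the page (a sketch; assumes the backend emits standard markdown link syntax):

```bash
# List markdown links on the page, then fetch only the one you need
lightpanda fetch --dump markdown --strip-mode full "$url" | grep -oE '\[[^]]*\]\([^)]*\)'
```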
With agent-browser CLI:
```bash
agent-browser open <url>
agent-browser snapshot -i        # interactive element refs (@e1, @e2...)
agent-browser click @e3          # click by ref
agent-browser fill @e5 "query"   # fill input by ref
```
With Lightpanda CLI (non-interactive):
```bash
lightpanda fetch --dump semantic_tree <url>                      # structure
lightpanda fetch --dump markdown --strip-mode full <url>         # content
lightpanda fetch --dump markdown --wait-until networkidle <url>  # dynamic content
```
### Step 4: Interact (when needed)
For multi-step flows (login, form submission, navigation):
- Get interactive elements first - use `interactive_elements` or `semantic_tree` to find targets without loading the full DOM
- Act on specific elements - click, fill, select using element identifiers
- Re-extract after each action - page state changes; get a fresh view
- Wait for navigation - after clicks that trigger page loads, wait before extracting
MCP interaction pattern:
1. `goto(url)`
2. `interactive_elements()` - find what to click/fill
3. `click(id)` or `fill(id, value)`
4. `semantic_tree()` - verify state changed
5. Repeat 2-4 as needed
### Step 5: Process results
Single-page reads: if you extracted more than needed, pull out the relevant section before returning it. Don't dump an entire page of markdown when the user asked about one paragraph.
For documentation sites, target the content container. Most docs use predictable selectors:
```bash
# Try common content selectors in order of specificity
lightpanda fetch --dump markdown --wait-selector "article" <url>
lightpanda fetch --dump markdown --wait-selector "main" <url>
lightpanda fetch --dump markdown --wait-selector ".content" <url>
```
If the full page was already fetched, extract the relevant section from the markdown output rather than re-fetching - search for headings or known section titles.
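A minimal sketch for pulling one section out of cached markdown (assumes `##`-level headings; "Installation" is a placeholder for the heading you actually want):

```bash
# Print one section: from its heading up to the next same-level heading
awk '/^## Installation/{p=1; print; next} /^## /{p=0} p' extracted.md
```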
Multi-step workflows:
- Cache extraction results rather than re-fetching the same page
- Summarize intermediate pages (e.g., search results) instead of returning raw content
- Discard navigation/boilerplate content before putting results in context
### Step 6: Handle failures
When a tool fails, escalate to the next tier - don't retry the same tool blindly.
| Failure | Likely cause | Action |
|---|---|---|
| Empty/broken content | JS didn't render | Escalate: WebFetch -> Lightpanda -> Playwright MCP |
| 403 / blocked | Bot detection | Try `--user-agent-suffix` (Lightpanda) or Playwright (real browser UA) |
| Timeout | Heavy page / slow network | Increase wait timeout, try `--wait-selector` on a specific element |
| Connection refused | Wrong port / service down | Verify URL, check if site requires VPN or local network |
| SSL error | Cert issue or MITM | Check cert validity, do not bypass without user confirmation |
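The escalation rule for empty content as a shell sketch ("expected text" is a placeholder for a string you know should appear on the rendered page):

```bash
# Tier 1: static fetch; escalate to a JS render only if the expected content is missing
out=$(curl -sL "$url")
if ! printf '%s' "$out" | grep -q "expected text"; then
  out=$(lightpanda fetch --dump markdown --strip-mode full --wait-until networkidle "$url")
fi
# Still empty or broken? Escalate to Playwright MCP rather than retrying the same tool
```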
After login failures: re-check the form field selectors - SPAs frequently change element IDs
between deploys. Use `interactive_elements` to get fresh selectors rather than hardcoding.
Saving fetched content: for file downloads or large extractions, write results to a local file rather than keeping everything in context:
```bash
lightpanda fetch --dump markdown --strip-mode full <url> > extracted.md
```
For binary downloads (PDFs, images) after auth, extract the session cookie from the browser
context and hand it to curl. With Playwright MCP or Lightpanda MCP, use `evaluate` to read
`document.cookie`, then:
```bash
curl -L -o report.pdf -b "session=<value>; csrf=<value>" \
  -H "Referer: https://internal.example.com/reports" <pdf-url>
```
Alternative: trigger the browser's native download via `evaluate`
(`document.querySelector('a.download').click()`) and let the headless session write to its
download directory - this avoids moving the cookie out of the browser entirely. It is also the
only working path for `blob:` URLs and `data:` URIs - they are in-memory browser references
with no fetchable origin, so curl cannot resolve them; let the page itself resolve the blob via
a click, or read it with `evaluate` and `FileReader.readAsDataURL` to extract the bytes.
## Token Efficiency

### Progressive disclosure
Start with the cheapest representation. Escalate only when insufficient.
- Level 0: URL only (0 tokens) - sometimes the URL itself answers the question
- Level 1: Structured data (~100-300) - metadata, navigation links
- Level 2: Semantic tree (~200-500) - page structure, interactive elements
- Level 3: Markdown (~500-2000) - readable content
- Level 4: Full HTML (~5000-50000) - complex parsing, last resort
### Strip unnecessary content

With Lightpanda CLI, always use `--strip-mode`:

- `js` - remove script tags
- `css` - remove stylesheets
- `ui` - remove images, video, SVG
- `full` - all of the above (default for content extraction)
### Scope extraction
Don't dump the whole page when you need one section:
lightpanda fetch --dump markdown --wait-selector "#pricing-table" <url>
With MCP semantic tree, limit depth:
```
Tool: semantic_tree
Args: { maxDepth: 3 }  - top 3 levels only
```
### Extract structured data from pages
When you need specific data (prices, tables, metadata) rather than full page content:
- Try `structured_data` / `structuredData` first - many sites embed JSON-LD or OpenGraph
- If no structured data exists, use `evaluate` / `eval` to run JavaScript extraction:
```
Tool: evaluate
Args: { expression: "JSON.stringify([...document.querySelectorAll('.product')].map(p => ({name: p.querySelector('h2')?.textContent, price: p.querySelector('.price')?.textContent})))" }
```
- Parse the JSON result rather than scraping markdown with regex
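When the page is server-rendered, JSON-LD can often be pulled without a browser at all (a rough probe; sed prints from the opening script tag through the closing one and may over-capture on minified single-line HTML):

```bash
# Grab embedded JSON-LD blocks straight from the static HTML
curl -sL "$url" | sed -n '/application\/ld+json/,/<\/script>/p'
```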
### Batch multi-page work
```bash
for url in "$url1" "$url2" "$url3"; do
  lightpanda fetch --dump markdown --strip-mode full "$url"
  printf '\n---\n'
  sleep 1  # rate-limit: don't hammer the same domain
done > output.md
```
## SPA and Dynamic Content
Data extraction priority: try structured data (JSON-LD, `structuredData`) before markdown parsing, and markdown before full HTML.
- Always wait: use `--wait-until networkidle` or `--wait-selector` with Lightpanda, `waitForSelector` / `browser_wait_for` with MCP tools
- Client-side routing: if a link changes the URL without a full page load, re-extract content after each route change
- Lazy loading / infinite scroll: scroll to trigger content loading before extracting. For infinite scroll, use a loop: scroll, wait for new content, extract, repeat until you have enough data or no new content appears. Cap iterations to avoid endless scrolling (see the loop sketch after this list)
- Cookie consent / popups: dismiss overlays before extracting content - use `interactive_elements` to find the dismiss button, then `click`. If the overlay blocks extraction, clicking through it costs fewer tokens than retrying with different formats
- Pagination: for paginated results, extract each page sequentially using the "Next" link or pagination controls. Don't try to load all pages at once - extract, process, advance
- Verify content loaded: after waiting, check that the extracted content is non-empty and contains expected elements before processing. An empty markdown or a semantic tree with only `<html><body>` means the page didn't render - escalate to a heavier backend
- Lightpanda gaps: partial Web API coverage means some complex SPAs won't render correctly. Fall back to Playwright MCP if extraction returns empty or broken content
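Capped scroll-loop sketch (same tool-call notation as the MCP interaction pattern above; exact tool names vary by backend, so treat these as placeholders):

1. `evaluate("window.scrollTo(0, document.body.scrollHeight)")`
2. Wait for `networkidle` or a selector matching newly loaded items
3. `markdown()` or `semantic_tree()` - compare output size against the previous pass
4. Repeat 1-3 until the output stops growing or a fixed cap (e.g., 10 iterations) is reached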
## Authentication Flows
- Navigate to the login page
- Use `interactive_elements` to find form fields
- Fill credentials from env vars or user prompt - never hardcode
- Submit the form
- Wait for redirect to complete (watch for multi-step redirects in OAuth/SSO flows - the URL may bounce through several domains before landing)
- Verify login succeeded: extract page content and check for user-specific elements (profile name, dashboard content) before proceeding
- Continue browsing the authenticated session
OAuth/SSO redirects: some login flows redirect through identity providers (Google, Okta, Auth0). Follow each redirect, fill credentials at the IdP page, and wait for the final redirect back to the target site. Don't assume login completes on the first page.
MFA prompts: if a TOTP/MFA prompt appears after credentials, you cannot proceed automatically. Inform the user that MFA is required and ask them to complete it manually, or request the TOTP code from the user/env var to fill in.
Session expiration: if extraction suddenly returns login pages or 401s mid-flow, the session has expired. Re-authenticate before continuing. For long-running scrapes, check session validity periodically by verifying a known authenticated-only element is still visible.
Session persistence by backend:
- Lightpanda MCP / Playwright MCP: session persists within the MCP connection
- Lightpanda CLI fetch: no persistence between calls (use `serve` mode for multi-step auth)
- agent-browser: session-based with the `--session` flag
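For multi-step auth from the CLI, serve mode keeps one browser process alive (a sketch; the host/port flags are assumed - check `lightpanda serve --help` on your build):

```bash
# Start a persistent browser; clients connecting to it share session state
lightpanda serve --host 127.0.0.1 --port 9222 &
```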
For session isolation, CSRF-sensitive actions, and multi-tenant account handling, read
references/authenticated-browsing.md.
## Missing Tools
If no browsing tools are detected, recommend the user set up Lightpanda MCP - it's the fastest path to full browsing capability with minimal overhead.
Lightpanda MCP setup (one-time, ~30 seconds):
```bash
# Install the binary (see references/tool-setup.md for other architectures)
curl -L -o lightpanda https://github.com/lightpanda-io/browser/releases/download/0.2.9/lightpanda-x86_64-linux
chmod +x lightpanda && mv lightpanda ~/.local/bin/
```
Add the MCP server to your Claude Code settings (`~/.claude/settings.json` or project
`.mcp.json`) - merge with existing config, don't overwrite:
```json
{
  "mcpServers": {
    "lightpanda": {
      "command": "lightpanda",
      "args": ["mcp"],
      "env": { "LIGHTPANDA_DISABLE_TELEMETRY": "true" }
    }
  }
}
```
Restart the session after adding the MCP config. The Lightpanda tools (`goto`, `markdown`,
`semantic_tree`, etc.) will appear in the available tool list.
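If the Claude Code CLI is available, you can confirm the server registered:

```bash
claude mcp list
```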
Read references/tool-setup.md for other platforms, architectures, and alternative backends.
## Reference Files
Read references/tool-setup.md when you need installation commands for a specific platform,
MCP tool parameter details (full tool tables with token costs), engine-specific CLI flags,
or known limitations of a backend. The main SKILL.md covers workflow and strategy; the
reference file covers tool-specific depth.
Read references/extraction-patterns.md for static, JavaScript-rendered, screenshot, table,
pagination, and attribution patterns.
Read references/authenticated-browsing.md before using saved sessions, cookies, or
account-specific pages.
## Output Contract
See skills/_shared/output-contract.md for the full contract.
- Skill name: BROWSE
- Deliverable bucket: audits
- Mode: conditional. When invoked to analyze, review, audit, or improve existing repo content, emit the full contract (boxed inline header, body summary inline plus per-finding detail in the deliverable file, boxed conclusion, conclusion table) and write the deliverable to `docs/local/audits/browse/<YYYY-MM-DD>-<slug>.md`. When invoked to answer a question, teach a concept, build a new artifact, or generate content, respond freely without the contract.
- Severity scale: `P0 | P1 | P2 | P3 | info` (see shared contract; only used in audit/review mode).
## Related Skills
- testing - E2E test automation with Playwright. This skill handles ad-hoc browsing and data extraction; testing handles structured test suites and assertions.
- mcp - MCP server development. This skill uses MCP browsing tools; mcp helps build them.
- networking - Network infrastructure. This skill browses over the network; networking configures it.
- ai-ml - RAG pipelines and web data collection. When scraping content specifically for embeddings or training data, ai-ml covers the pipeline; this skill covers the extraction.
## AI Self-Check
Before returning any browsing result, verify:
- Used the cheapest tool available for the task - no Playwright when WebFetch would have worked
- Did not dump full HTML into context when markdown or structured data was sufficient
- Waited for dynamic content before extracting from SPAs (`networkidle` or `--wait-selector`)
- Stripped boilerplate (nav, ads, footers) before returning content to the user
- Scoped extraction to the relevant section, not the whole page
- Did not hardcode credentials - used env vars, secret manager, or user prompt
- Cleared or isolated cookies/storage between unrelated accounts or tenants
- Used semantic roles before CSS selectors for interaction targets
- Used screenshots only when visual layout or rendered state mattered
- Recorded URL and access date for facts likely to change
- Did not automate destructive account actions unless the user named the exact action and target
- Re-extracted page state after any click or form submission before making decisions
- Escalated to the next tool tier on failure rather than retrying the same tool
- Current source checked: dated versions, CLI flags, API names, and support windows are verified against primary docs before repeating them
- Hidden state identified: local config, credentials, caches, contexts, branches, cluster targets, or previous runs are made explicit before acting
- Verification is real: final checks exercise the actual runtime, parser, service, or integration point instead of only linting prose or happy paths
- Robots and terms considered: scraping or automation respects access rules, auth boundaries, and rate limits
- Dynamic content verified: browser-rendered pages are checked with the real tool when static HTML may be incomplete
## Rules
- Cheapest tool first. Always try the lowest-token option before escalating. WebFetch before Lightpanda, Lightpanda before Playwright, markdown before full HTML.
- Never dump full HTML into context unless no other format works. Full HTML is 10-100x more expensive than markdown or semantic tree for the same information.
- Strip before extracting. Use `--strip-mode full` with Lightpanda CLI. Prefer semantic tree or markdown over raw HTML with MCP tools.
- Wait for dynamic content. Don't extract from a half-loaded SPA. Use `networkidle`, selector waits, or script waits.
- No hardcoded credentials. Auth flows must use environment variables, secret managers, or user prompts.
- Re-extract after interaction. Page state changes after clicks and form submissions. Always get a fresh view before making decisions based on page content.
- Semantic selectors first. Use roles, accessible names, labels, and stable text before brittle CSS selectors.
- Screenshots are for visuals. Take screenshots only when layout, rendered state, or visual evidence matters.
- Respect robots.txt and rate limits. Use `--obey-robots` with Lightpanda when scraping. Add a 1-2 second delay between requests when batch-fetching multiple pages from the same domain. Don't hammer sites with rapid sequential requests.
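A quick manual robots check before batch work (a sketch; real compliance means parsing the rules for your user agent, not just skimming the file):

```bash
# Inspect crawl rules for the target domain before looping over its pages
curl -sL "https://<domain>/robots.txt" | head -n 40
```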