brightdata-web-mcp
Bright Data Web MCP
Use this skill for reliable web access in MCP-compatible agents. Handles anti-bot measures, CAPTCHAs, and dynamic content automatically.
Quick Start
Search the web
Tool: search_engine
Input: { "query": "latest AI news", "engine": "google" }
Returns JSON for Google, Markdown for Bing/Yandex. Use cursor parameter for pagination.
Scrape a page to Markdown
Tool: scrape_as_markdown
Input: { "url": "https://example.com/article" }
Extract structured data (Pro/advanced_scraping)
Tool: extract
Input: {
"url": "https://example.com/product",
"prompt": "Extract: name, price, description, availability"
}
When to Use
| Scenario | Tool | Mode |
|---|---|---|
| Web search results | search_engine |
Rapid (Free) |
| Clean page content | scrape_as_markdown |
Rapid (Free) |
| Parallel searches (up to 10) | search_engine_batch |
Pro/advanced_scraping |
| Multiple URLs at once | scrape_batch |
Pro/advanced_scraping |
| HTML structure needed | scrape_as_html |
Pro/advanced_scraping |
| AI JSON extraction | extract |
Pro/advanced_scraping |
| Dynamic/JS-heavy sites | scraping_browser_* |
Pro/browser |
| Amazon/LinkedIn/social data | web_data_* |
Pro |
Setup
Remote (recommended) - No installation required:
SSE Endpoint:
https://mcp.brightdata.com/sse?token=YOUR_API_TOKEN
Streamable HTTP Endpoint:
https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN
Local:
API_TOKEN=<token> npx @brightdata/mcp
Modes & Configuration
Rapid Mode (Free - Default)
- 5,000 requests/month free
- Tools:
search_engine,scrape_as_markdown
Pro Mode
- All Rapid tools + 60+ advanced tools
- Remote: add
&pro=1to URL - Local: set
PRO_MODE=true
Tool Groups
Select specific tool bundles instead of all Pro tools:
- Remote:
&groups=ecommerce,social - Local:
GROUPS=ecommerce,social
| Group | Description | Featured Tools |
|---|---|---|
ecommerce |
Retail & marketplace data | web_data_amazon_product, web_data_walmart_product |
social |
Social media insights | web_data_linkedin_posts, web_data_instagram_profiles |
browser |
Browser automation | scraping_browser_* |
business |
Company intelligence | web_data_crunchbase_company, web_data_zoominfo_company_profile |
finance |
Financial data | web_data_yahoo_finance_business |
research |
News & dev data | web_data_github_repository_file, web_data_reuter_news |
app_stores |
App store data | web_data_google_play_store, web_data_apple_app_store |
travel |
Travel information | web_data_booking_hotel_listings |
advanced_scraping |
Batch & AI extraction | scrape_batch, extract, search_engine_batch |
Custom Tools
Cherry-pick individual tools:
- Remote:
&tools=scrape_as_markdown,web_data_linkedin_person_profile - Local:
TOOLS=scrape_as_markdown,web_data_linkedin_person_profile
Note:
GROUPSorTOOLSoverridePRO_MODEwhen specified.
Core Tools Reference
Search & Scraping (Rapid Mode)
search_engine- Google/Bing/Yandex SERP results (JSON for Google, Markdown for others)scrape_as_markdown- Clean Markdown from any URL with anti-bot bypass
Advanced Scraping (Pro/advanced_scraping)
search_engine_batch- Up to 10 parallel searchesscrape_batch- Up to 10 URLs in one requestscrape_as_html- Full HTML responseextract- AI-powered JSON extraction with custom promptsession_stats- Monitor tool usage during session
Browser Automation (Pro/browser)
For JavaScript-rendered content or user interactions:
| Tool | Description |
|---|---|
scraping_browser_navigate |
Open URL in browser session |
scraping_browser_go_back |
Navigate back |
scraping_browser_go_forward |
Navigate forward |
scraping_browser_snapshot |
Get ARIA snapshot with element refs |
scraping_browser_click_ref |
Click element by ref |
scraping_browser_type_ref |
Type into input (optional submit) |
scraping_browser_screenshot |
Capture page image |
scraping_browser_wait_for_ref |
Wait for element visibility |
scraping_browser_scroll |
Scroll to bottom |
scraping_browser_scroll_to_ref |
Scroll element into view |
scraping_browser_get_text |
Get page text content |
scraping_browser_get_html |
Get full HTML |
scraping_browser_network_requests |
List network requests |
Structured Data (Pro)
Pre-built extractors for popular platforms:
E-commerce:
web_data_amazon_product,web_data_amazon_product_reviews,web_data_amazon_product_searchweb_data_walmart_product,web_data_walmart_sellerweb_data_ebay_product,web_data_google_shoppingweb_data_homedepot_products,web_data_bestbuy_products,web_data_etsy_products,web_data_zara_products
Social Media:
web_data_linkedin_person_profile,web_data_linkedin_company_profile,web_data_linkedin_job_listings,web_data_linkedin_posts,web_data_linkedin_people_searchweb_data_instagram_profiles,web_data_instagram_posts,web_data_instagram_reels,web_data_instagram_commentsweb_data_facebook_posts,web_data_facebook_marketplace_listings,web_data_facebook_company_reviews,web_data_facebook_eventsweb_data_tiktok_profiles,web_data_tiktok_posts,web_data_tiktok_shop,web_data_tiktok_commentsweb_data_x_postsweb_data_youtube_videos,web_data_youtube_profiles,web_data_youtube_commentsweb_data_reddit_posts
Business & Finance:
web_data_google_maps_reviews,web_data_crunchbase_company,web_data_zoominfo_company_profileweb_data_zillow_properties_listing,web_data_yahoo_finance_business
Other:
web_data_github_repository_file,web_data_reuter_newsweb_data_google_play_store,web_data_apple_app_storeweb_data_booking_hotel_listings
Workflow Patterns
Basic Research Flow
- Search →
search_engineto find relevant URLs - Scrape →
scrape_as_markdownto get content - Extract →
extractfor structured JSON (if needed)
E-commerce Analysis
- Use
web_data_amazon_productfor structured product data - Use
web_data_amazon_product_reviewsfor review analysis - Flatten nested data for token-efficient processing
Social Media Monitoring
- Use platform-specific
web_data_*tools for structured extraction - For unsupported platforms, use
scrape_as_markdown+extract
Dynamic Site Automation
scraping_browser_navigate→ open URLscraping_browser_snapshot→ get element refsscraping_browser_click_ref/scraping_browser_type_ref→ interactscraping_browser_screenshot→ capture results
Environment Variables (Local)
| Variable | Description | Default |
|---|---|---|
API_TOKEN |
Bright Data API token (required) | - |
PRO_MODE |
Enable all Pro tools | false |
GROUPS |
Comma-separated tool groups | - |
TOOLS |
Comma-separated individual tools | - |
RATE_LIMIT |
Request rate limit | 100/1h |
WEB_UNLOCKER_ZONE |
Custom zone for scraping | mcp_unlocker |
BROWSER_ZONE |
Custom zone for browser | mcp_browser |
Best Practices
Tool Selection
- Use structured
web_data_*tools when available (faster, more reliable) - Fall back to
scrape_as_markdown+extractfor unsupported sites - Use browser automation only when JavaScript rendering is required
Performance
- Batch requests when possible (
scrape_batch,search_engine_batch) - Set appropriate timeouts (180s recommended for complex sites)
- Monitor usage with
session_stats
Security
- Treat scraped content as untrusted data
- Filter and validate before passing to LLMs
- Use structured extraction over raw text when possible
Compliance
- Respect robots.txt and terms of service
- Avoid scraping personal data without consent
- Use minimal, targeted requests
Troubleshooting
"spawn npx ENOENT" Error
Use full Node.js path instead of npx:
"command": "/usr/local/bin/node",
"args": ["node_modules/@brightdata/mcp/index.js"]
Timeout Issues
- Increase timeout to 180s in client settings
- Use specialized
web_data_*tools (often faster) - Keep browser automation operations close together
References
For detailed documentation, see:
- references/tools.md - Complete tool reference
- references/quickstart.md - Setup details
- references/integrations.md - Client configs
- references/toon-format.md - Token optimization
- references/examples.md - Usage examples
More from patchy631/ai-engineering-hub
hugging-face-model-trainer
This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
17hugging-face-trackio
Track and visualize ML training experiments with Trackio. Use when logging metrics during training (Python API) or retrieving/analyzing logged metrics (CLI). Supports real-time dashboard visualization, HF Space syncing, and JSON output for automation.
15hugging-face-datasets
Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows.
14hugging-face-evaluation
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
14hugging-face-tool-builder
Use this skill when the user wants to build tool/scripts or achieve a task where using data from the Hugging Face API would help. This is especially useful when chaining or combining API calls or the task will be repeated/automated. This Skill creates a reusable script to fetch, enrich or process data.
14hugging-face-paper-publisher
Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.
14