# scrape
Extract data from any website using Scrapling's MCP tools. Bypasses Cloudflare, handles dynamic JS-rendered pages, supports CSS selectors to pre-filter content (saves tokens).
## Usage

```
/scrape <url>
/scrape <url> with selector .article-content
/scrape stealth <url>
/scrape bulk <url1> <url2> <url3>
```
## When to Use (Tool Selection)

| Scenario | Tool | Why |
|---|---|---|
| Simple page, no anti-bot | `scrapling_get` | Fastest; HTTP with a browser TLS fingerprint |
| JS-rendered / SPA content | `scrapling_fetch` | Uses a real Chromium browser |
| Cloudflare / anti-bot protected | `scrapling_stealthy_fetch` | Stealth mode, solves captchas |
| Multiple pages, same pattern | `scrapling_bulk_get` / `scrapling_bulk_fetch` | Parallel processing |
## Implementation

### Step 1: Parse User Intent

Determine from the user's request:

- URL(s) to scrape
- CSS selector, if specified (e.g. `.main-content`, `#article`, `table.data`)
- Stealth needed? (Cloudflare sites, login-protected, anti-bot mentions)
- Bulk? (multiple URLs)
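A minimal sketch of this parsing, assuming the command forms shown under Usage; the function name and return shape are illustrative, not part of the skill:

```python
import re

def parse_scrape_command(text: str) -> dict:
    """Hypothetical parser for the /scrape command forms shown under Usage."""
    tokens = text.replace("/scrape", "", 1).split()
    stealth = bool(tokens) and tokens[0] == "stealth"    # "/scrape stealth <url>"
    bulk_keyword = bool(tokens) and tokens[0] == "bulk"  # "/scrape bulk <url1> <url2>"
    urls = [t for t in tokens if t.startswith("http")]
    match = re.search(r"with selector\s+(\S+)", text)
    return {
        "urls": urls,
        "css_selector": match.group(1) if match else None,
        "stealth": stealth,
        "bulk": bulk_keyword or len(urls) > 1,
    }

# parse_scrape_command("/scrape https://example.com with selector .article-content")
# -> {'urls': ['https://example.com'], 'css_selector': '.article-content',
#     'stealth': False, 'bulk': False}
```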
### Step 2: Choose the Right Scrapling MCP Tool

Default path (try in order, escalate on failure):

1. Start with `scrapling_get` (fast HTTP, handles most sites)
2. If content is empty or blocked, escalate to `scrapling_fetch` (real browser)
3. If still blocked, escalate to `scrapling_stealthy_fetch` (stealth mode)

User explicitly asks for stealth: go straight to `scrapling_stealthy_fetch`.

Multiple URLs: use the `bulk_` variants for parallel processing.
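A sketch of that escalation order in Python. `call_mcp_tool` is a placeholder for however the agent invokes an MCP tool, and the empty/challenge check is a simple heuristic, not part of Scrapling:

```python
from typing import Callable, Optional

ESCALATION_ORDER = ["scrapling_get", "scrapling_fetch", "scrapling_stealthy_fetch"]

def fetch_with_escalation(url: str, css_selector: Optional[str],
                          call_mcp_tool: Callable[[str, dict], str]) -> str:
    """Try each fetcher tier in order; stop at the first usable result."""
    args = {"url": url}
    if css_selector:
        args["css_selector"] = css_selector
    content = ""
    for tool in ESCALATION_ORDER:
        content = call_mcp_tool(tool, args)
        # Treat empty output or an obvious challenge page as "blocked".
        if content.strip() and "cloudflare" not in content[:500].lower():
            return content
    return content  # best effort from the last tier
```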
### Step 3: Call the MCP Tool

Use the Scrapling MCP server tools. All tools accept:

- `url` (required): the URL to scrape
- `css_selector` (optional): CSS selector to extract specific elements. Always use this when possible to reduce token consumption.

Single page:

```
Call scrapling MCP tool: get
Arguments: { "url": "<url>", "css_selector": "<selector if provided>" }
```

Stealth:

```
Call scrapling MCP tool: stealthy_fetch
Arguments: { "url": "<url>", "css_selector": "<selector if provided>" }
```

Bulk:

```
Call scrapling MCP tool: bulk_get
Arguments: { "urls": ["<url1>", "<url2>"], "css_selector": "<selector>" }
```
### Step 4: Process Results

The MCP returns extracted content (HTML or text, depending on the selector).

- If user wants raw data: present it formatted
- If user wants a summary: summarize the extracted content
- If user wants vault storage: save to `00-Inbox/Scrape - [Title].md`:

```
# [Page Title]
**Source:** [URL]
**Scraped:** YYYY-MM-DD
**Selector:** [CSS selector used, if any]

## Content
[Extracted content]
```
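A sketch of the vault-storage path, assuming the note template above and a local Obsidian-style vault; the function name and `vault_root` parameter are illustrative:

```python
from datetime import date
from pathlib import Path

def save_to_vault(title: str, url: str, content: str,
                  selector: str = "", vault_root: str = ".") -> Path:
    """Write scraped content as a note matching the template above."""
    note = "\n".join([
        f"# {title}",
        f"**Source:** {url}",
        f"**Scraped:** {date.today().isoformat()}",
        f"**Selector:** {selector or 'none'}",
        "",
        "## Content",
        content,
    ])
    path = Path(vault_root) / "00-Inbox" / f"Scrape - {title}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(note, encoding="utf-8")
    return path
```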
### Step 5: Handle Failures
| Error | Action |
|---|---|
| Empty content | Escalate to next fetcher tier |
| Connection refused | Check URL is valid |
| Cloudflare challenge | Auto-escalate to stealthy_fetch |
| Timeout | Retry with longer timeout, suggest fetch for slow JS sites |
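The same recovery logic as a small function. The error strings it matches on are illustrative; real MCP errors may be worded differently:

```python
TIERS = ["scrapling_get", "scrapling_fetch", "scrapling_stealthy_fetch"]

def next_action(error: str, current_tool: str) -> str:
    """Map a failure to the recovery step in the table above (heuristic)."""
    err = error.lower()
    if "cloudflare" in err or "challenge" in err:
        return "retry with scrapling_stealthy_fetch"
    if "timeout" in err:
        return "retry with a longer timeout, or use scrapling_fetch for slow JS sites"
    if "refused" in err or "could not resolve" in err:
        return "check that the URL is valid and reachable"
    if "empty" in err and current_tool in TIERS[:-1]:
        return f"escalate to {TIERS[TIERS.index(current_tool) + 1]}"
    return "report the error to the user"
```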
## Smart Defaults

- News/blog articles: auto-suggest `article`, `.post-content`, `.entry-content` selectors
- Product pages: auto-suggest `.product`, `.price`, `.description` selectors
- Tables: auto-suggest the `table` selector; offer to convert to a markdown table
- Lists: auto-suggest `ul`, `ol` selectors
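These defaults can live in a small lookup table; the URL-keyword detection below is an assumption for illustration, not how the skill decides:

```python
DEFAULT_SELECTORS = {
    "article": ["article", ".post-content", ".entry-content"],
    "product": [".product", ".price", ".description"],
    "table": ["table"],
    "list": ["ul", "ol"],
}

def suggest_selectors(url: str, hint: str = "") -> list[str]:
    """Pick default selectors from a user hint or a crude URL heuristic."""
    if hint in DEFAULT_SELECTORS:
        return DEFAULT_SELECTORS[hint]
    if any(part in url for part in ("/blog/", "/news/", "/post/")):
        return DEFAULT_SELECTORS["article"]
    return []
```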
## Examples

User: `/scrape https://example.com/blog/ai-trends`
→ Use `scrapling_get`, auto-detect article content

User: `/scrape stealth https://protected-site.com/data`
→ Use `scrapling_stealthy_fetch` with Cloudflare bypass

User: `scrape this page and grab just the pricing table: https://saas.com/pricing`
→ Use `scrapling_get` with `css_selector="table"` or `".pricing"`

User: `scrape these 5 competitor pages and compare their features`
→ Use `scrapling_bulk_get`, extract feature lists, present a comparison
## Prerequisites

Scrapling must be installed with MCP support:

```
pip install "scrapling[ai]"
scrapling install
```

The scrapling MCP server must be configured in `.mcp.json`.
## Relationship to Other Tools

- WebFetch (native): basic URL fetch, no anti-bot handling, no selectors. Use for simple, known-good pages.
- Firecrawl MCP: cloud-based, requires an API key, good for crawling entire sites. Use when you need a recursive crawl.
- Scrapling: local, free, stealthy, selector-based. Default choice for single-page or small-batch scraping.
- Apify: a marketplace of specialized scrapers. Use for platform-specific extraction (LinkedIn, Twitter, etc.).