# Firecrawl & Jina Web Scraping

## Firecrawl vs WebFetch

Prefer `firecrawl scrape URL --only-main-content` over the WebFetch tool: it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (>80% benchmark coverage). WebFetch is acceptable as a fallback when Firecrawl is unavailable.

```bash
# Preferred approach:
firecrawl scrape https://docs.example.com/api --only-main-content
```
## Token-Efficient Scraping

Inspired by Anthropic's dynamic filtering: always filter before reasoning. In Anthropic's benchmarks this reduced input tokens by ~24% and improved accuracy by ~11%.

**The Principle:** Search → Filter → Scrape → Filter → Reason

**DO:** Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason

**DON'T:** Search → Scrape everything → Reason over all of it
### Step-by-Step Efficient Workflow

```bash
# Step 1: Search — get titles/URLs only (cheap)
firecrawl search "query" --limit 20

# Step 2: Evaluate results, pick 3-5 best URLs

# Step 3: Scrape only those, filter to relevant sections
firecrawl scrape URL1 --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
    --sections "API,Authentication" --max-chars 5000
```
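Step 2 (evaluating relevance before scraping anything) can be sketched in Python. This is an illustration, not part of the skill's scripts; the keyword-overlap heuristic and the `{"title", "url"}` result shape are assumptions:

```python
def rank_results(results, query_terms, top_k=5):
    """Score search hits by keyword overlap in title/URL; return the best few.

    `results` is a list of {"title": ..., "url": ...} dicts, as produced by
    a search step that fetched titles/URLs only (no page bodies yet).
    """
    terms = [t.lower() for t in query_terms]

    def score(hit):
        haystack = (hit["title"] + " " + hit["url"]).lower()
        return sum(term in haystack for term in terms)

    ranked = sorted(results, key=score, reverse=True)
    # Only the winners get scraped -- everything else never enters context.
    return [hit["url"] for hit in ranked[:top_k] if score(hit) > 0]

hits = [
    {"title": "API Authentication Guide", "url": "https://docs.example.com/api/auth"},
    {"title": "Company Blog", "url": "https://example.com/blog"},
    {"title": "API Reference", "url": "https://docs.example.com/api"},
]
print(rank_results(hits, ["api", "authentication"], top_k=2))
# → ['https://docs.example.com/api/auth', 'https://docs.example.com/api']
```

Only the returned URLs proceed to the scrape step; the irrelevant hit is discarded without costing any scrape credits or tokens.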
### Post-Processing with `filter_web_results.py`

Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:

```bash
# Extract only matching sections from scraped page
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "Pricing,Plans"

# Keep only paragraphs with keywords
firecrawl search "query" --scrape --pretty | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --keywords "pricing,cost" --max-chars 5000

# Extract specific JSON fields from API output
python3 ~/.claude/skills/exa-search/scripts/exa_search.py "query" --json | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --fields "title,url,text" --max-chars 3000

# Combine filters with stats
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "API" --keywords "endpoint" --compact --stats
```

Full path: `python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py`

Flags: `--sections`, `--keywords`, `--max-chars`, `--max-lines`, `--fields` (JSON), `--strip-links`, `--strip-images`, `--compact`, `--stats`
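Conceptually, section filtering keeps only the markdown headings you name (plus their bodies) and caps the output at a character budget. A simplified sketch of that idea, which is not the actual `filter_web_results.py` implementation:

```python
import re

def filter_sections(markdown, wanted, max_chars=5000):
    """Keep only heading sections whose title matches a wanted name,
    then cap total output length. Simplified; the real script has more flags."""
    kept, keep = [], False
    for line in markdown.splitlines():
        heading = re.match(r"#{1,6}\s+(.*)", line)
        if heading:
            title = heading.group(1).strip()
            # A new heading decides whether following lines are kept.
            keep = any(w.lower() in title.lower() for w in wanted)
        if keep:
            kept.append(line)
    return "\n".join(kept)[:max_chars]

page = """# Docs
Intro text.
## Pricing
$10/month.
## Changelog
v2 released.
"""
print(filter_sections(page, ["Pricing"]))
# → ## Pricing
# → $10/month.
```

Everything outside the matching section (the intro, the changelog) never reaches the model's context.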
## Other Token-Saving Patterns

- Use `--only-main-content` to strip navigation and footer boilerplate, reducing token consumption. Omit it only when nav/footer content is specifically needed.
- Use `firecrawl map URL --search "topic"` first to find relevant subpages before scraping.
- Use `--format links` first to get a URL list, evaluate it, then scrape selectively.
- Use `--max-chars` with `exa_contents.py` to cap extraction length.
- Use `--formats summary` (Python API script) over full text when you need the gist, not raw content.
## Claude API Native Tools (for API Agent Builders)

Anthropic's API now offers built-in dynamic-filtering tools: `web_search_20260209` / `web_fetch_20260209`.

Header: `anthropic-beta: code-execution-web-tools-2026-02-09`

These tools perform dynamic filtering via code execution. Use them when building agents directly on the Claude API. Use Firecrawl/Exa when you need autonomous agents, batch scraping, structured extraction, or domain-specific crawling, or when you are not on the Claude API.
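For orientation, here is what wiring those tools into a request might look like. The tool type names and beta header come from this document; the payload shape follows the usual Messages API pattern, and the model name is a placeholder — verify both against Anthropic's current API documentation:

```python
# Sketch of a Messages API request body using the native web tools.
# Tool types and beta header are taken from this skill doc; payload shape
# and model name are assumptions, not confirmed API details.
request = {
    "model": "claude-sonnet-4-5",  # placeholder model name
    "max_tokens": 1024,
    "tools": [
        {"type": "web_search_20260209", "name": "web_search"},
        {"type": "web_fetch_20260209", "name": "web_fetch"},
    ],
    "messages": [
        {"role": "user", "content": "Summarize the latest Firecrawl changelog."}
    ],
}
headers = {
    "anthropic-beta": "code-execution-web-tools-2026-02-09",
}
```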
## Available Tools

### 1. Official Firecrawl CLI (`firecrawl`) — Primary

Setup: `npm install -g firecrawl-cli && firecrawl login --api-key $FIRECRAWL_API_KEY`

| Command | Purpose | Quick Example |
|---|---|---|
| `scrape` | Single page → markdown | `firecrawl scrape URL --only-main-content` |
| `crawl` | Entire site with progress | `firecrawl crawl URL --wait --progress --limit 50` |
| `map` | Discover all URLs on a site | `firecrawl map URL --search "API"` |
| `search` | Web search (+ optional scrape) | `firecrawl search "query" --limit 10` |

Full CLI reference: `references/cli-reference.md`
### 2. Auto-Save Alias (`fc-save`) — Shell Alias

Requires shell alias setup (not bundled with this skill).

```bash
fc-save URL
# → Saves to ~/Desktop/Screencaps & Chats/Web-Scrapes/docs-example-com-api.md
```
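The filename above is derived from the URL. One plausible way to compute that slug (an illustration of the naming scheme, not the actual alias implementation):

```python
import re
from urllib.parse import urlparse

def slug_filename(url):
    """Turn a URL into a flat markdown filename, e.g.
    https://docs.example.com/api -> docs-example-com-api.md"""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path          # "docs.example.com/api"
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", raw)  # non-alphanumerics -> "-"
    return slug.strip("-").lower() + ".md"

print(slug_filename("https://docs.example.com/api"))
# → docs-example-com-api.md
```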
### 3. Python API Script (`firecrawl_api.py`) — Advanced Features

Command: `python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py <command>`

Requires: `FIRECRAWL_API_KEY` env var, `pip install firecrawl-py requests`

| Command | Purpose | Quick Example |
|---|---|---|
| `search` | Web search with scraping | `firecrawl_api.py search "query" -n 10` |
| `scrape` | Single URL with page actions | `firecrawl_api.py scrape URL --formats markdown summary` |
| `batch-scrape` | Multiple URLs concurrently | `firecrawl_api.py batch-scrape URL1 URL2 URL3` |
| `crawl` | Website crawling | `firecrawl_api.py crawl URL --limit 20` |
| `map` | URL discovery | `firecrawl_api.py map URL --search "query"` |
| `extract` | LLM-powered structured extraction | `firecrawl_api.py extract URL --prompt "Find pricing"` |
| `agent` | Autonomous extraction (no URLs needed) | `firecrawl_api.py agent "Find YC W24 AI startups"` |
| `parallel-agent` | Bulk agent queries (v2.8.0+) | `firecrawl_api.py parallel-agent "Q1" "Q2" "Q3"` |

Agent models: `spark-1-fast` (10 credits, simple), `spark-1-mini` (default), `spark-1-pro` (thorough)

Full Python API reference: `references/python-api-reference.md`
### 4. DeepWiki — GitHub Repo Documentation

`~/.claude/skills/firecrawl/scripts/deepwiki.sh <owner/repo> [section] [options]`

AI-generated wiki for any public GitHub repo. No API key required.

```bash
# Overview
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat

# Browse sections
~/.claude/skills/firecrawl/scripts/deepwiki.sh langchain-ai/langchain --toc

# Specific section
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat 4.1-gpt-transformer-implementation

# Full dump for RAG
~/.claude/skills/firecrawl/scripts/deepwiki.sh openai/openai-python --all --save
```

### 5. Jina Reader (`jina`) — Fallback

Use when Firecrawl fails or for Twitter/X URLs (Firecrawl blocks Twitter; Jina works).

```bash
jina https://x.com/username/status/123456
```
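The fallback pattern (try Firecrawl first, fall back to Jina) can be wrapped in a small helper. A sketch; the command list is parameterized so the control flow works regardless of which tools are installed:

```python
import shutil
import subprocess

def scrape_with_fallback(url, commands=None):
    """Run each scraper command in order; return stdout of the first success."""
    commands = commands or [
        ["firecrawl", "scrape", url, "--only-main-content"],
        ["jina", url],
    ]
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            continue  # tool not installed, try the next one
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0 and result.stdout:
            return result.stdout
    return None  # every command missing or failed
```

On a machine where `firecrawl` is missing or exits non-zero, the same call transparently falls through to `jina`.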
## Firecrawl vs Exa vs Native Claude Tools

| Need | Best Tool | Why |
|---|---|---|
| Single page → markdown | `firecrawl scrape --only-main-content` | Cleanest output |
| Search + scrape in one shot | `firecrawl search --scrape` | Combined operation |
| Crawl entire site | `firecrawl crawl --wait --progress` | Link following + progress |
| Autonomous data finding | `firecrawl_api.py agent` | No URLs needed |
| Semantic/neural search | Exa `exa_search.py` | AI-powered relevance |
| Find research papers | Exa `--category "research paper"` | Academic index |
| Quick research answer | Exa `exa_research.py` | Citations + synthesis |
| Find similar pages | Exa `exa_similar.py` | Competitive analysis |
| Claude API agent building | Native `web_search_20260209` | Built-in dynamic filtering |
| Twitter/X content | `jina URL` | Only tool that works |
| GitHub repo docs | `deepwiki.sh owner/repo` | AI-generated wiki |
| Anti-bot / Cloudflare bypass | `scrapling` stealth fetch | Local Turnstile solver |
| Element-level extraction | `scrapling` + CSS selectors | Precision targeting, adaptive tracking |
| No API key scraping | `scrapling` HTTP fetch | 100% local, no credentials |
| Site redesign resilience | `scrapling` adaptive mode | SQLite similarity matching |
## Common Workflows

### Single Page Scraping

```bash
firecrawl scrape https://example.com/page --only-main-content
# Or auto-save: fc-save URL
# Or to file: firecrawl scrape URL --only-main-content -o page.md
```

### Documentation Crawling

```bash
# Map first, then crawl relevant paths
firecrawl map https://docs.example.com --search "API"
firecrawl crawl https://docs.example.com --include-paths /api,/guides --wait --progress
```

### Research Workflow

```bash
firecrawl search "machine learning best practices 2026" --scrape --scrape-formats markdown
```

### Agent-Powered Research (No URLs Needed)

```bash
python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
  "Compare pricing tiers for Firecrawl, Apify, and ScrapingBee"
```
## Troubleshooting

```bash
# Check status and credits
firecrawl --status && firecrawl credit-usage

# Re-authenticate
firecrawl logout && firecrawl login --api-key $FIRECRAWL_API_KEY

# Check API key
echo $FIRECRAWL_API_KEY
```

- Scrape fails: try `jina URL`, or add `--wait-for 3000` for JS-heavy sites
- Async job stuck: check with `crawl-status` / `batch-status`; cancel with `crawl-cancel` / `batch-cancel`
- Disable telemetry: `export FIRECRAWL_NO_TELEMETRY=1`
## Reference Documentation

| File | Contents |
|---|---|
| `references/cli-reference.md` | Full CLI parameter reference (scrape, crawl, map, search, fc-save, jina, deepwiki) |
| `references/python-api-reference.md` | Full Python API script reference (all commands, SDK examples) |
| `references/firecrawl-api.md` | Firecrawl Search API reference |
| `references/firecrawl-agent-api.md` | Agent API (spark models, parallel agents, webhooks) |
| `references/actions-reference.md` | Page actions for dynamic content (click, write, wait, scroll) |
| `references/branding-format.md` | Brand identity extraction (colors, fonts, UI) |
## Test Suite

```bash
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --quick        # Quick validation
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py                # Full suite
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --test scrape  # Specific test
```