firecrawl

SKILL.md

Firecrawl & Jina Web Scraping

Firecrawl vs WebFetch

Prefer firecrawl scrape URL --only-main-content over the WebFetch tool—it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (>80% benchmark coverage). WebFetch is acceptable as a fallback when Firecrawl is unavailable.

# Preferred approach:
firecrawl scrape https://docs.example.com/api --only-main-content

Token-Efficient Scraping

Inspired by Anthropic's dynamic filtering—always filter before reasoning. This reduced input tokens by ~24% and improved accuracy by ~11% in their benchmarks.

The Principle: Search → Filter → Scrape → Filter → Reason

DO:

Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason

DON'T:

Search → Scrape everything → Reason over all of it

Step-by-Step Efficient Workflow

# Step 1: Search — get titles/URLs only (cheap)
firecrawl search "query" --limit 20

# Step 2: Evaluate results, pick 3-5 best URLs

# Step 3: Scrape only those, filter to relevant sections
firecrawl scrape URL1 --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
  --sections "API,Authentication" --max-chars 5000

Post-Processing with filter_web_results.py

Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:

# Extract only matching sections from scraped page
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "Pricing,Plans"

# Keep only paragraphs with keywords
firecrawl search "query" --scrape --pretty | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --keywords "pricing,cost" --max-chars 5000

# Extract specific JSON fields from API output
python3 ~/.claude/skills/exa-search/scripts/exa_search.py "query" --json | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --fields "title,url,text" --max-chars 3000

# Combine filters with stats
firecrawl scrape URL --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "API" --keywords "endpoint" --compact --stats

Full path: python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py Flags: --sections, --keywords, --max-chars, --max-lines, --fields (JSON), --strip-links, --strip-images, --compact, --stats

Other Token-Saving Patterns

  • Use --only-main-content to strip navigation and footer boilerplate, reducing token consumption. Omit only when nav/footer content is specifically needed.
  • Use firecrawl map URL --search "topic" first to find relevant subpages before scraping
  • Use --format links first to get URL list, evaluate, then scrape selectively
  • Use --max-chars with exa_contents.py to cap extraction length
  • Use --formats summary (Python API script) over full text when you need the gist, not raw content

Claude API Native Tools (for API Agent Builders)

Anthropic's API now offers built-in dynamic filtering tools:

web_search_20260209 / web_fetch_20260209
Header: anthropic-beta: code-execution-web-tools-2026-02-09

These have built-in dynamic filtering via code execution. Use them when building Claude API agents directly. Use Firecrawl/Exa when you need: autonomous agents, batch scraping, structured extraction, domain-specific crawling, or when not on the Claude API.


Available Tools

1. Official Firecrawl CLI (firecrawl) — Primary

Setup: npm install -g firecrawl-cli && firecrawl login --api-key $FIRECRAWL_API_KEY

Command Purpose Quick Example
scrape Single page → markdown firecrawl scrape URL --only-main-content
crawl Entire site with progress firecrawl crawl URL --wait --progress --limit 50
map Discover all URLs on a site firecrawl map URL --search "API"
search Web search (+ optional scrape) firecrawl search "query" --limit 10

Full CLI reference: references/cli-reference.md

2. Auto-Save Alias (fc-save) — Shell Alias

Requires shell alias setup (not bundled with this skill).

fc-save URL
# → Saves to ~/Desktop/Screencaps & Chats/Web-Scrapes/docs-example-com-api.md

3. Python API Script (firecrawl_api.py) — Advanced Features

Command: python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py <command> Requires: FIRECRAWL_API_KEY env var, pip install firecrawl-py requests

Command Purpose Quick Example
search Web search with scraping firecrawl_api.py search "query" -n 10
scrape Single URL with page actions firecrawl_api.py scrape URL --formats markdown summary
batch-scrape Multiple URLs concurrently firecrawl_api.py batch-scrape URL1 URL2 URL3
crawl Website crawling firecrawl_api.py crawl URL --limit 20
map URL discovery firecrawl_api.py map URL --search "query"
extract LLM-powered structured extraction firecrawl_api.py extract URL --prompt "Find pricing"
agent Autonomous extraction (no URLs needed) firecrawl_api.py agent "Find YC W24 AI startups"
parallel-agent Bulk agent queries (v2.8.0+) firecrawl_api.py parallel-agent "Q1" "Q2" "Q3"

Agent models: spark-1-fast (10 credits, simple), spark-1-mini (default), spark-1-pro (thorough)

Full Python API reference: references/python-api-reference.md

4. DeepWiki — GitHub Repo Documentation

~/.claude/skills/firecrawl/scripts/deepwiki.sh <owner/repo> [section] [options]

AI-generated wiki for any public GitHub repo. No API key required.

# Overview
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat

# Browse sections
~/.claude/skills/firecrawl/scripts/deepwiki.sh langchain-ai/langchain --toc

# Specific section
~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat 4.1-gpt-transformer-implementation

# Full dump for RAG
~/.claude/skills/firecrawl/scripts/deepwiki.sh openai/openai-python --all --save

5. Jina Reader (jina) — Fallback

Use when Firecrawl fails or for Twitter/X URLs (Firecrawl blocks Twitter, Jina works).

jina https://x.com/username/status/123456

Firecrawl vs Exa vs Native Claude Tools

Need Best Tool Why
Single page → markdown firecrawl scrape --only-main-content Cleanest output
Search + scrape in one shot firecrawl search --scrape Combined operation
Crawl entire site firecrawl crawl --wait --progress Link following + progress
Autonomous data finding firecrawl_api.py agent No URLs needed
Semantic/neural search Exa exa_search.py AI-powered relevance
Find research papers Exa --category "research paper" Academic index
Quick research answer Exa exa_research.py Citations + synthesis
Find similar pages Exa exa_similar.py Competitive analysis
Claude API agent building Native web_search_20260209 Built-in dynamic filtering
Twitter/X content jina URL Only tool that works
GitHub repo docs deepwiki.sh owner/repo AI-generated wiki
Anti-bot / Cloudflare bypass scrapling stealth fetch Local Turnstile solver
Element-level extraction scrapling + CSS selectors Precision targeting, adaptive tracking
No API key scraping scrapling HTTP fetch 100% local, no credentials
Site redesign resilience scrapling adaptive mode SQLite similarity matching

Common Workflows

Single Page Scraping

firecrawl scrape https://example.com/page --only-main-content
# Or auto-save: fc-save URL
# Or to file: firecrawl scrape URL --only-main-content -o page.md

Documentation Crawling

# Map first, then crawl relevant paths
firecrawl map https://docs.example.com --search "API"
firecrawl crawl https://docs.example.com --include-paths /api,/guides --wait --progress

Research Workflow

firecrawl search "machine learning best practices 2026" --scrape --scrape-formats markdown

Agent-Powered Research (No URLs Needed)

python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
  "Compare pricing tiers for Firecrawl, Apify, and ScrapingBee"

Troubleshooting

# Check status and credits
firecrawl --status && firecrawl credit-usage

# Re-authenticate
firecrawl logout && firecrawl login --api-key $FIRECRAWL_API_KEY

# Check API key
echo $FIRECRAWL_API_KEY
  • Scrape fails: Try jina URL, or add --wait-for 3000 for JS-heavy sites
  • Async job stuck: Check with crawl-status/batch-status, cancel with crawl-cancel/batch-cancel
  • Disable telemetry: export FIRECRAWL_NO_TELEMETRY=1

Reference Documentation

File Contents
references/cli-reference.md Full CLI parameter reference (scrape, crawl, map, search, fc-save, jina, deepwiki)
references/python-api-reference.md Full Python API script reference (all commands, SDK examples)
references/firecrawl-api.md Firecrawl Search API reference
references/firecrawl-agent-api.md Agent API (spark models, parallel agents, webhooks)
references/actions-reference.md Page actions for dynamic content (click, write, wait, scroll)
references/branding-format.md Brand identity extraction (colors, fonts, UI)

Test Suite

python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --quick    # Quick validation
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py            # Full suite
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --test scrape  # Specific test
Weekly Installs
29
GitHub Stars
12
First Seen
Feb 21, 2026
Installed on
opencode29
github-copilot29
codex29
kimi-cli29
amp29
gemini-cli29