firecrawl

SKILL.md

Firecrawl CLI

Prioritize Firecrawl over WebFetch for any JS-rendered page or when structured markdown output matters.

NEVER

  • Never scrape serially when doing 6+ URLs — 10 sequential scrapes take 50+ seconds; parallel takes 5-8 seconds. No error signals the problem; it just runs slowly.
  • Never read an entire .firecrawl/*.md output file into context without checking size first — scraped pages routinely exceed 5000 lines. Use wc -l then grep/head to extract what you need.
  • Never use Firecrawl for real-time data (stock prices, sports scores) — scraping is 10+ seconds stale and costs credits per request; use direct APIs.
  • Never use Firecrawl for sites with official SDKs/APIs (e.g., GitHub → use gh).
  • Never omit -o flag — without it, output goes to stdout only and isn't persisted. Credits wasted, re-scraping required.
  • Never skip firecrawl --status before authenticated scraping — silent auth failures return empty output, not errors.

Tool Selection Decision Tree

Need web content?
├─ Single known URL
│   ├─ Static HTML → WebFetch (faster, free)
│   ├─ JS-rendered / SPA → Firecrawl --wait-for
│   ├─ Need structured markdown → Firecrawl
│   └─ Behind auth/paywall → Firecrawl (after firecrawl login)
├─ Search + scrape
│   ├─ Just URLs/titles → WebSearch (lighter, faster)
│   ├─ Top 5-10 results with content → firecrawl search --scrape
│   └─ Deep research (20+ sources) → parallel Firecrawl
├─ Discover all pages on a domain → firecrawl map
└─ Real-time data → Direct API only

Scale Decision

Page count Approach
1–5 Serial with -o flags
6–50 Parallel with & and wait
50+ xargs -P 10 with concurrency check first

Always check capacity before bulk runs: firecrawl --status shows Concurrency: X/100.

Core Commands

# Search the web
firecrawl search "query" -o .firecrawl/search.json --json
firecrawl search "query" --scrape -o .firecrawl/results.json --json
firecrawl search "AI news" --tbs qdr:d -o .firecrawl/today.json --json  # Past day

# Scrape single page
firecrawl scrape https://example.com -o .firecrawl/example.md
firecrawl scrape https://example.com --only-main-content -o .firecrawl/clean.md
firecrawl scrape https://spa.com --wait-for 3000 -o .firecrawl/spa.md

# Map a site
firecrawl map https://example.com -o .firecrawl/urls.txt
firecrawl map https://example.com --search "blog" -o .firecrawl/blog-urls.txt

Parallel Bulk Scraping

# Small batch — & with wait
firecrawl scrape site1.com -o .firecrawl/1.md &
firecrawl scrape site2.com -o .firecrawl/2.md &
firecrawl scrape site3.com -o .firecrawl/3.md &
wait

# Large batch — xargs
cat urls.txt | xargs -P 10 -I {} sh -c 'firecrawl scrape "{}" -o ".firecrawl/$(echo {} | md5).md"'

# Post-scrape extraction
grep "^# " .firecrawl/*.md              # All H1 headings
grep -l "keyword" .firecrawl/*.md       # Files matching keyword
jq -r '.data.web[].title' .firecrawl/*.json  # JSON title extraction

Authentication

firecrawl --status                  # Check auth and credit status
firecrawl login --browser           # Auto-opens browser — don't ask user to run manually
export FIRECRAWL_API_KEY=your_key   # Fallback if browser auth fails

Error Quick Reference

Error First check Fix
Not authenticated firecrawl --status firecrawl login --browser
Concurrency limit firecrawl --status (shows X/100) wait for jobs, reduce -P value
Page failed to load curl -I URL (basic connectivity) Add --wait-for 5000; try --format html to inspect raw HTML
Output file empty head -20 output.md Add --only-main-content; try --include-tags article,main

Output Organization

Always write to .firecrawl/ directory (add to .gitignore):

.firecrawl/example.com.md
.firecrawl/search-ai-news.json
.firecrawl/docs-sitemap.txt

Load Reference Files When

Load references/cli-options.md when: troubleshooting 3+ unknown flags, header injection, cookie handling, sitemap modes, or custom user-agents.

Load references/output-processing.md when: building 3+ step transformation pipelines, parsing nested JSON from search results, or combining/deduplicating 10+ scraped files.

Do NOT load references for basic search/scrape/map with standard flags.

Arguments

$ARGUMENTS: Search query, URL, or scraping objective. Empty = ask what to scrape.

Weekly Installs
13
GitHub Stars
8
First Seen
Feb 14, 2026
Installed on
gemini-cli13
github-copilot13
codex13
kimi-cli13
amp13
opencode13