# Firecrawl CLI
Web scraping, search, and browser automation CLI. Returns clean markdown optimized for LLM context windows.
Run `firecrawl --help` or `firecrawl <command> --help` for full option details.
## Prerequisites

The CLI must be installed and authenticated. Check with `firecrawl --status`. If it is not ready, see `rules/install.md`.
## When to Use What

| Need | Command | When |
|---|---|---|
| Find pages on a topic | `search` | No specific URL yet |
| Get a page's content | `scrape` | Have a URL; page is static or JS-rendered |
| Find URLs within a site | `map` | Need to locate a specific subpage |
| Bulk extract a site section | `crawl` | Need many pages (e.g., all `/docs/`) |
| AI-powered data extraction | `agent` | Need structured data from complex sites |
| Interact with a page | `browser` | Content requires clicks, form fills, pagination, or login |
**Key distinction - scrape vs browser:**

- Use `scrape` first. It handles static pages and JS-rendered SPAs (with `--wait-for`).
- Use `browser` only when scrape fails because content is behind interaction: pagination buttons, modals, dropdowns, multi-step navigation, or infinite scroll.
- Never use browser for web searches - use `firecrawl search` instead.
- Never use browser on bot-protected sites (Google, Bing, Cloudflare challenges).
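The scrape-first escalation can be scripted. A minimal sketch, assuming that an empty output file is a good-enough signal that scrape failed; the function name, paths, and the 3000 ms wait are illustrative, not CLI behavior guarantees:

```shell
# Hypothetical helper: try scrape first, escalate to browser only if the
# output file comes back empty. Wait time and paths are illustrative.
try_page() {
  url="$1"; out="$2"
  firecrawl scrape "$url" --wait-for 3000 -o "$out"
  if [ ! -s "$out" ]; then
    firecrawl browser "open $url"
    firecrawl browser "scrape" -o "$out"
    firecrawl browser close
  fi
}
```

Usage: `try_page "https://example.com/app" .firecrawl/app.md`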
**Avoid redundant fetches:**

- `search --scrape` already fetches full page content. Don't re-scrape those URLs.
- Check whether you already have the data in `.firecrawl/` before fetching again.
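The cache check can be wrapped in a small helper; a sketch where the function name and the "cache hit" message are illustrative, not part of the CLI:

```shell
# Hypothetical wrapper: skip the network call when a non-empty copy
# already exists under .firecrawl/.
cached_scrape() {
  url="$1"; out="$2"
  if [ -s "$out" ]; then
    echo "cache hit: $out"
  else
    firecrawl scrape "$url" -o "$out"
  fi
}
```

Usage: `cached_scrape "https://example.com/docs" .firecrawl/docs.md`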
## Output & Organization

Always write to `.firecrawl/` with `-o`, and add `.firecrawl/` to `.gitignore`:

```shell
firecrawl search "query" -o .firecrawl/search-query.json
firecrawl scrape "<url>" -o .firecrawl/page.md
```

Never read entire output files at once. Use `grep`, `head`, or incremental reads:

```shell
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
```

A single format outputs raw content; multiple formats output JSON.
## Commands

### search

```shell
firecrawl search "query" -o .firecrawl/result.json --json
firecrawl search "query" --limit 10 -o .firecrawl/result.json --json
firecrawl search "query" --sources news -o .firecrawl/news.json --json
firecrawl search "query" --sources images -o .firecrawl/images.json --json
firecrawl search "query" --tbs qdr:d -o .firecrawl/today.json --json
firecrawl search "query" --scrape -o .firecrawl/scraped.json --json
firecrawl search "query" --categories github -o .firecrawl/github.json --json
```

Options: `--limit <n>`, `--sources <web,images,news>`, `--categories <github,research,pdf>`, `--tbs <qdr:h|d|w|m|y>`, `--location`, `--country <code>`, `--scrape`, `--scrape-formats`, `-o`
### scrape

```shell
firecrawl scrape "<url>" -o .firecrawl/page.md
firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md
firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json
firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md
firecrawl scrape "<url>" --include-tags article,main -o .firecrawl/page.md
firecrawl scrape "<url>" --exclude-tags nav,aside -o .firecrawl/page.md
firecrawl scrape "<url>" --html -o .firecrawl/page.html
```

Options: `-f <markdown,html,rawHtml,links,screenshot,json>`, `-H`, `--only-main-content`, `--wait-for <ms>`, `--include-tags`, `--exclude-tags`, `-o`
### map

```shell
firecrawl map "<url>" -o .firecrawl/urls.txt
firecrawl map "<url>" --search "keyword" -o .firecrawl/filtered.txt
firecrawl map "<url>" --limit 500 --json -o .firecrawl/urls.json
firecrawl map "<url>" --include-subdomains -o .firecrawl/all-urls.txt
```

Options: `--limit <n>`, `--search <query>`, `--sitemap <include|skip|only>`, `--include-subdomains`, `--json`, `-o`
### crawl

```shell
firecrawl crawl "<url>" --wait -o .firecrawl/crawl.json
firecrawl crawl "<url>" --limit 100 --max-depth 3 --wait -o .firecrawl/crawl.json
firecrawl crawl "<url>" --include-paths /blog,/docs --wait -o .firecrawl/crawl.json
firecrawl crawl "<url>" --exclude-paths /admin --wait -o .firecrawl/crawl.json
firecrawl crawl "<url>" --delay 1000 --max-concurrency 2 --wait -o .firecrawl/crawl.json
firecrawl crawl <job-id>
```

Options: `--wait`, `--progress`, `--limit <n>`, `--max-depth <n>`, `--include-paths`, `--exclude-paths`, `--delay <ms>`, `--max-concurrency <n>`, `--pretty`, `-o`
### agent

AI-powered extraction (takes 2-5 minutes):

```shell
firecrawl agent "prompt" --wait -o .firecrawl/agent.json
firecrawl agent "prompt" --urls "<url>" --wait -o .firecrawl/agent.json
firecrawl agent "prompt" --schema '{"type":"object","properties":{"name":{"type":"string"}}}' --wait -o .firecrawl/agent.json
firecrawl agent "prompt" --model spark-1-pro --wait -o .firecrawl/agent.json
firecrawl agent "prompt" --max-credits 100 --wait -o .firecrawl/agent.json
firecrawl agent <job-id>
```

Options: `--urls`, `--model <spark-1-mini|spark-1-pro>`, `--schema <json>`, `--schema-file`, `--max-credits <n>`, `--wait`, `--pretty`, `-o`
### browser

Runs cloud Chromium sessions in Firecrawl's remote sandboxed environment. Use it when `scrape` can't get the data because interaction is required.

```shell
firecrawl browser "open <url>"
firecrawl browser "snapshot"
firecrawl browser "click @e5"
firecrawl browser "fill @e3 'text'"
firecrawl browser "scroll down"
firecrawl browser "scrape" -o .firecrawl/browser.md
firecrawl browser close
```

Commands: `open <url>`, `snapshot`, `screenshot`, `click <@ref>`, `type <@ref> <text>`, `fill <@ref> <text>`, `scrape`, `scroll <direction>`, `wait <seconds>`, `eval <js>`

Session management: `firecrawl browser launch-session --ttl 600`, `firecrawl browser list`, `firecrawl browser close`

Options: `--ttl <seconds>`, `--ttl-inactivity <seconds>`, `--stream`, `--session <id>`, `-o`
### credit-usage

```shell
firecrawl credit-usage
firecrawl credit-usage --json --pretty -o .firecrawl/credits.json
```
## Combining with Other Tools

```shell
jq -r '.data.web[].url' .firecrawl/search.json
jq -r '.data.web[] | "\(.title): \(.url)"' .firecrawl/search.json
jq -r '.data.news[] | "[\(.date)] \(.title)"' .firecrawl/news.json
```
## Parallelization

Always run independent operations in parallel. Check `firecrawl --status` for your concurrency limit:

```shell
firecrawl scrape "<url-1>" -o .firecrawl/1.md &
firecrawl scrape "<url-2>" -o .firecrawl/2.md &
firecrawl scrape "<url-3>" -o .firecrawl/3.md &
wait
```
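The same fan-out can be wrapped in a helper that numbers the output files; a sketch where the function name is an assumption, and the number of URLs passed in should stay within your plan's concurrency limit:

```shell
# Scrape several independent URLs concurrently, writing one numbered
# output file per URL, then block until all background jobs finish.
scrape_all() {
  i=0
  for url in "$@"; do
    i=$((i + 1))
    firecrawl scrape "$url" -o ".firecrawl/page-$i.md" &
  done
  wait
}
```

Usage: `scrape_all "https://a.example/docs" "https://b.example/blog"`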