firecrawl
Firecrawl CLI
Use the firecrawl CLI for web scraping, search, and browser automation. It returns clean markdown optimized for LLM context windows and handles JavaScript rendering.
Prerequisites
The firecrawl-cli package must already be installed and authenticated. Check status:
firecrawl --status
Expected output when ready:
firecrawl cli v1.4.1
✓ Authenticated via FIRECRAWL_API_KEY
Concurrency: 0/100 jobs (parallel scrape limit)
Credits: 500,000 remaining
- Concurrency: Max parallel jobs. Stay within this limit.
- Credits: Remaining API credits. Each operation consumes credits.
If not installed or not authenticated, refer to rules/install.md.
Organization
Create a .firecrawl/ folder in the working directory to store results. Add .firecrawl/ to .gitignore. Use -o to write output to files (avoids flooding context):
firecrawl search "your query" -o .firecrawl/search-{query}.json
firecrawl scrape https://example.com -o .firecrawl/{site}-{path}.md
Always quote URLs in shell commands.
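The folder and .gitignore setup above can be scripted. A minimal sketch, idempotent so it is safe to re-run (the directory name and ignore entry follow the conventions described here):

```shell
# Create the results folder and ignore it in git (safe to run repeatedly).
mkdir -p .firecrawl
touch .gitignore
grep -qxF '.firecrawl/' .gitignore || echo '.firecrawl/' >> .gitignore
```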
Commands
Search
# Basic search
firecrawl search "your query" -o .firecrawl/search-query.json --json
# Limit results
firecrawl search "AI news" --limit 10 -o .firecrawl/search-ai-news.json --json
# Search specific sources
firecrawl search "tech startups" --sources news -o .firecrawl/search-news.json --json
firecrawl search "landscapes" --sources images -o .firecrawl/search-images.json --json
# Time-based search
firecrawl search "AI announcements" --tbs qdr:d -o .firecrawl/search-today.json --json # Past day
firecrawl search "tech news" --tbs qdr:w -o .firecrawl/search-week.json --json # Past week
# Search AND scrape content from results
firecrawl search "API docs" --scrape -o .firecrawl/search-docs.json --json
Options: --limit <n>, --sources <web,images,news>, --categories <github,research,pdf>, --tbs <qdr:h|d|w|m|y>, --location, --country <code>, --scrape, -o <path>
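Search results written with --json can be post-processed locally. A hedged Python sketch; the result shape assumed here (a list of objects with a "url" key, or an object holding such a list under "web") is an assumption about the output format, so adjust the keys to what your CLI version actually emits:

```python
import json

def list_result_urls(path):
    """Collect URLs from a firecrawl search JSON file.

    Assumes the file holds either a list of result objects or an
    object with a "web" list; each result has a "url" key.
    """
    with open(path) as f:
        data = json.load(f)
    results = data if isinstance(data, list) else data.get("web", [])
    return [r["url"] for r in results if "url" in r]
```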
Scrape
# Markdown output
firecrawl scrape https://example.com -o .firecrawl/example.md
# Main content only (removes nav, footer, ads)
firecrawl scrape https://example.com --only-main-content -o .firecrawl/example.md
# Multiple formats (JSON output)
firecrawl scrape https://example.com --format markdown,links -o .firecrawl/example.json
# Wait for JS to render
firecrawl scrape https://spa-app.com --wait-for 3000 -o .firecrawl/spa.md
# Include/exclude specific HTML tags
firecrawl scrape https://example.com --include-tags article,main -o .firecrawl/article.md
Options: -f <markdown,html,rawHtml,links,screenshot,json>, --only-main-content, --wait-for <ms>, --include-tags, --exclude-tags, -o <path>
Map
Discover all URLs on a site:
firecrawl map https://example.com -o .firecrawl/urls.txt
firecrawl map https://example.com --search "blog" -o .firecrawl/blog-urls.txt
firecrawl map https://example.com --limit 500 --json -o .firecrawl/urls.json
Options: --limit <n>, --search <query>, --include-subdomains, --json, -o <path>
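Mapped URL lists are plain text, one URL per line, so standard shell filters work on them. A sketch using a stand-in file in place of real map output:

```shell
# Stand-in for `firecrawl map https://example.com -o urls.txt` output:
printf '%s\n' \
  'https://example.com/' \
  'https://example.com/blog/post-1' \
  'https://example.com/docs/api' > demo-urls.txt
# Filter the list to one section and count what remains.
grep '/blog/' demo-urls.txt > blog-urls.txt
wc -l < blog-urls.txt
```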
Browser
Launch cloud Chromium sessions for interactive browsing. All browser commands execute in Firecrawl's remote sandboxed environment (isolated cloud VMs), not on the user's local machine. No local processes are spawned, no local ports are opened, and no local files are accessible to the browser session.
Shorthand (Recommended)
Sessions auto-launch on first use:
firecrawl browser "open https://example.com"
firecrawl browser "snapshot"
firecrawl browser "click @e5"
firecrawl browser "fill @e3 'search query'"
firecrawl browser "scrape" -o .firecrawl/browser-scrape.md
firecrawl browser close
Core commands: open <url>, snapshot, screenshot, click <@ref>, type <@ref> <text>, fill <@ref> <text>, scrape, scroll <direction>, wait <seconds>
Playwright & Script Execution (Remote Sandbox Only)
The --python, --node, and --bash flags execute code exclusively in Firecrawl's remote sandboxed cloud environment. Code runs in an isolated VM with no access to the user's local filesystem, environment variables, or network. The sandbox has Playwright and agent-browser pre-installed.
# Playwright Python (runs in remote sandbox)
firecrawl browser execute --python 'await page.goto("https://example.com")
print(await page.title())'
# Playwright JavaScript (runs in remote sandbox)
firecrawl browser execute --node 'await page.goto("https://example.com"); await page.title()'
# Shell commands (runs in remote sandbox, not locally)
firecrawl browser execute --bash "agent-browser snapshot"
In Python/Node mode, page, browser, and context objects are pre-configured. Use print() to return output from Python execution.
Session Management
firecrawl browser launch-session --ttl 600
firecrawl browser list
firecrawl browser close
Options: --ttl <seconds>, --ttl-inactivity <seconds>, --stream, --session <id>, -o <path>
Crawl
# Start and wait for completion
firecrawl crawl https://example.com --wait -o .firecrawl/crawl-result.json
# Limit scope
firecrawl crawl https://example.com --limit 100 --max-depth 3 --wait -o .firecrawl/crawl-result.json
# Crawl specific sections
firecrawl crawl https://example.com --include-paths /blog,/docs --wait -o .firecrawl/crawl-result.json
Options: --wait, --progress, --limit <n>, --max-depth <n>, --include-paths, --exclude-paths, --delay <ms>, --max-concurrency <n>, -o <path>
Agent
AI-powered autonomous web data extraction (typically takes 2-5 minutes):
firecrawl agent "Find the pricing plans for Firecrawl" --wait -o .firecrawl/agent-pricing.json
firecrawl agent "Extract product data" --urls https://example.com --wait -o .firecrawl/agent-products.json
firecrawl agent "Extract company info" --schema '{"type":"object","properties":{"name":{"type":"string"}}}' --wait -o .firecrawl/agent-info.json
Options: --urls <urls>, --model <spark-1-mini|spark-1-pro>, --schema <json>, --schema-file <path>, --max-credits <n>, --wait, -o <path>
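For anything beyond a trivial schema, generating the file for --schema-file is easier than escaping inline JSON. A Python sketch; the field names (name, founded) are illustrative, not required by the CLI:

```python
import json

# Hypothetical extraction schema; fields are examples only.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "founded": {"type": "integer"},
    },
    "required": ["name"],
}

with open("company-schema.json", "w") as f:
    json.dump(schema, f, indent=2)
```

Pass the result with --schema-file company-schema.json.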
Reading Output Files
Firecrawl output files can be large. Use incremental reads:
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
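To pull just the region around a match instead of paging through the whole file, grep -n and sed compose well. A sketch on a stand-in file (seq/sed here only simulate a large scrape result):

```shell
# Stand-in for a large scrape result:
seq 1 200 | sed 's/^/line /' > demo.md
# Find the first match, then print a 5-line window around it.
hit=$(grep -n 'line 120' demo.md | head -1 | cut -d: -f1)
sed -n "$((hit-2)),$((hit+2))p" demo.md
```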
Parallelization
Run multiple scrapes in parallel when possible:
firecrawl scrape https://site1.com -o .firecrawl/1.md &
firecrawl scrape https://site2.com -o .firecrawl/2.md &
firecrawl scrape https://site3.com -o .firecrawl/3.md &
wait
Check firecrawl --status for your concurrency limit before parallelizing.
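For longer URL lists, xargs -P gives the same parallelism with a bounded worker count, which makes it easier to stay under the concurrency limit. A sketch where echo stands in for the real scrape command:

```shell
# Bounded parallelism: at most 3 jobs at once.
# The echo is a placeholder for: firecrawl scrape "{}" -o .firecrawl/...
printf '%s\n' \
  'https://site1.com' 'https://site2.com' \
  'https://site3.com' 'https://site4.com' \
  | xargs -P 3 -I{} sh -c 'echo "scraping {}"' > parallel.log
wc -l < parallel.log
```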