# crawl4ai
High-performance web crawler with intelligent chunking. Crawls web pages and extracts content as markdown using LLM-based skeleton planning.
## Commands
### crawl_url (alias: `webCrawl`)
Crawl a web page with native workflow execution and LLM-based intelligent chunking.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | str | - | Target URL to crawl (required) |
| `action` | str | `"smart"` | Action mode: `"smart"`, `"skeleton"`, `"crawl"` |
| `fit_markdown` | bool | `true` | Clean and simplify markdown output |
| `max_depth` | int | `0` | Maximum crawling depth (0 = single page) |
| `return_skeleton` | bool | `false` | Also return document skeleton (TOC) |
| `chunk_indices` | list[int] | - | List of section indices to extract |
Action Modes:
| Mode | Description | Use Case |
|---|---|---|
| `smart` (default) | LLM generates chunk plan, then extracts relevant sections | Large docs where you need specific info |
| `skeleton` | Extract lightweight TOC without full content | Quick overview, decide what to read |
| `crawl` | Return full markdown content | Small pages, complete content needed |
Runtime Transport:
- `max_depth = 0`: Uses HTTP strategy (no browser cold-start) for lower latency.
- `max_depth > 0`: Uses browser deep-crawl strategy (BFS) for multi-page traversal.
- `file://...` with `max_depth = 0`: Uses local fast-path (no crawl4ai runtime bootstrap) for deterministic fixture/local-note benchmarking.
- Persistent worker mode reuses the HTTP crawler instance across requests to reduce repeated initialization cost.
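The transport rules above can be sketched as a small dispatch function. This is an illustration only; the strategy names (`"http"`, `"browser-bfs"`, `"local-fast-path"`) are placeholders, not the runtime's internal identifiers.

```python
def pick_transport(url: str, max_depth: int) -> str:
    """Choose a crawl strategy per the rules above (illustrative sketch)."""
    if url.startswith("file://") and max_depth == 0:
        return "local-fast-path"  # skip crawl4ai runtime bootstrap entirely
    if max_depth == 0:
        return "http"             # single page: avoid browser cold-start
    return "browser-bfs"          # multi-page traversal via browser deep-crawl

print(pick_transport("https://example.com", 0))  # http
print(pick_transport("https://example.com", 2))  # browser-bfs
print(pick_transport("file:///notes/a.md", 0))   # local-fast-path
```

The key point is that the transport is chosen per request, so mixing single-page and deep crawls in one session is fine.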
Examples:
# Smart crawl with LLM chunking (default)
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com"}`
# Skeleton only - get TOC quickly
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "action": "skeleton"}`
# Full content crawl
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "action": "crawl"}`
# Extract specific sections
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "chunk_indices": [0, 1, 2]}`
# Deep crawl (follow links up to depth N)
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "max_depth": 2}`
# Get skeleton with full content
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "return_skeleton": true}`
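The skeleton and chunk-extraction calls above are typically combined into a two-step workflow: fetch the TOC, pick section indices, then extract only those sections. The sketch below stubs the tool call; `call_tool` and the stubbed responses are hypothetical stand-ins for your MCP client and real crawl4ai output.

```python
# Hypothetical two-step workflow: skeleton first, then targeted extraction.
def call_tool(name: str, args: dict) -> dict:
    """Stand-in for an MCP client call; returns canned illustrative data."""
    if args.get("action") == "skeleton":
        return {"skeleton": [
            {"index": 0, "title": "Intro"},
            {"index": 1, "title": "Install"},
            {"index": 2, "title": "API Reference"},
        ]}
    return {"markdown": f"content for sections {args.get('chunk_indices')}"}

toc = call_tool("crawl4ai.CrawlUrl",
                {"url": "https://example.com", "action": "skeleton"})
wanted = [s["index"] for s in toc["skeleton"] if "API" in s["title"]]
result = call_tool("crawl4ai.CrawlUrl",
                   {"url": "https://example.com", "chunk_indices": wanted})
print(result["markdown"])
```

This pattern keeps the LLM's context small: it reasons over the ~500-token skeleton instead of the full page.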
## Core Concepts
| Topic | Description | Reference |
|---|---|---|
| Skeleton Planning | LLM sees TOC (~500 tokens) not full content (~10k+) | smart-chunking.md |
| Chunk Extraction | Token-aware section extraction | chunking.md |
| Deep Crawling | Multi-page crawling with BFS strategy | deep-crawl.md |
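"Token-aware section extraction" can be approximated with a greedy budget loop. This sketch is an assumption about the general technique, not crawl4ai's actual implementation, and uses a rough 4-characters-per-token heuristic.

```python
def select_chunks(sections: list[str], indices: list[int],
                  token_budget: int) -> list[str]:
    """Greedy sketch: keep requested sections until a rough token
    budget (~4 chars/token heuristic) would be exceeded."""
    out, used = [], 0
    for i in indices:
        cost = len(sections[i]) // 4
        if used + cost > token_budget:
            break
        out.append(sections[i])
        used += cost
    return out

docs = ["intro " * 100, "install " * 100, "api " * 500]
print(len(select_chunks(docs, [0, 1, 2], token_budget=400)))  # 2
```

With a 400-token budget, the first two sections fit (150 + 200 estimated tokens) and the large third section is dropped.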
## Best Practices
- Use `skeleton` mode first for large documents to understand their structure
- Use `chunk_indices` to extract specific sections instead of the full content
- Set `max_depth` above 0 with care; keep it low to bound the number of pages crawled and prevent runaway crawls
- Keep `fit_markdown=true` for cleaner output; set it to `false` for raw content
## Advanced
- Batch multiple URLs with separate calls
- Combine with knowledge tools for RAG pipelines
- Use skeleton + LLM to auto-generate chunk plans for custom extraction
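Since the tool takes one URL per call, batching means looping with per-URL error handling. A minimal sketch, assuming a `crawl` callable that wraps your tool invocation (the lambda below is a stub, not real output):

```python
# Batch crawl sketch: one tool call per URL, collecting failures
# instead of aborting the whole batch on the first error.
def crawl_many(urls, crawl):
    results, errors = {}, {}
    for url in urls:
        try:
            results[url] = crawl({"url": url, "action": "skeleton"})
        except Exception as exc:
            errors[url] = str(exc)
    return results, errors

ok, bad = crawl_many(
    ["https://a.example", "https://b.example"],
    crawl=lambda args: {"skeleton": [], "url": args["url"]},  # stub
)
print(sorted(ok))  # both URLs succeeded
```

Collecting errors per URL makes the batch resumable: feed `bad`'s keys back into `crawl_many` to retry only the failures.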