# crawl4ai
High-performance web crawler with intelligent chunking. Crawls web pages and extracts content as markdown using LLM-based skeleton planning.
## Commands

### crawl_url (alias: webCrawl)

Crawl a web page with native workflow execution and LLM-based intelligent chunking.
Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | str | - | Target URL to crawl (required) |
| `action` | str | "smart" | Action mode: "smart", "skeleton", or "crawl" |
| `fit_markdown` | bool | true | Clean and simplify markdown output |
| `max_depth` | int | 0 | Maximum crawling depth (0 = single page) |
| `return_skeleton` | bool | false | Also return the document skeleton (TOC) |
| `chunk_indices` | list[int] | - | Indices of the sections to extract |
Action Modes:

| Mode | Description | Use Case |
|---|---|---|
| `smart` (default) | LLM generates a chunk plan, then extracts the relevant sections | Large docs where you need specific info |
| `skeleton` | Extract a lightweight TOC without full content | Quick overview; deciding what to read |
| `crawl` | Return the full markdown content | Small pages where complete content is needed |
Runtime Transport:

- `max_depth = 0`: uses the HTTP strategy (no browser cold start) for lower latency.
- `max_depth > 0`: uses the browser deep-crawl strategy (BFS) for multi-page traversal.
- `file://...` with `max_depth = 0`: uses a local fast path (no crawl4ai runtime bootstrap) for deterministic fixture/local-note benchmarking.
- Persistent worker mode reuses the HTTP crawler instance across requests to reduce repeated initialization cost.
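The transport rules above can be sketched as a small dispatcher. This is a hypothetical illustration of the selection logic, not crawl4ai's actual internal code; the strategy names are made up for clarity.

```python
from urllib.parse import urlparse

def select_transport(url: str, max_depth: int = 0) -> str:
    """Pick a crawl strategy from the URL scheme and requested depth.

    Illustrative only: mirrors the documented rules, not the real dispatcher.
    """
    if urlparse(url).scheme == "file" and max_depth == 0:
        return "local-fast-path"   # no crawl4ai runtime bootstrap
    if max_depth == 0:
        return "http"              # no browser cold start
    return "browser-bfs"           # multi-page deep crawl

print(select_transport("https://example.com"))               # http
print(select_transport("https://example.com", max_depth=2))  # browser-bfs
print(select_transport("file:///notes/a.md"))                # local-fast-path
```

The practical takeaway: keep `max_depth` at 0 whenever a single page suffices, since that path skips browser startup entirely.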
Examples:

```python
# Smart crawl with LLM chunking (default)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com"})

# Skeleton only - get the TOC quickly
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "skeleton"})

# Full content crawl
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "crawl"})

# Extract specific sections
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "chunk_indices": [0, 1, 2]})

# Deep crawl (follow links up to depth N)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "max_depth": 2})

# Return the skeleton alongside the full content
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "return_skeleton": true})
```
## Core Concepts
| Topic | Description | Reference |
|---|---|---|
| Skeleton Planning | LLM sees TOC (~500 tokens) not full content (~10k+) | smart-chunking.md |
| Chunk Extraction | Token-aware section extraction | chunking.md |
| Deep Crawling | Multi-page crawling with BFS strategy | deep-crawl.md |
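To make the skeleton-planning idea concrete, here is a minimal sketch of how a lightweight TOC can be pulled from markdown so the LLM sees hundreds of tokens instead of the full page. The `extract_skeleton` helper is hypothetical, not crawl4ai's API.

```python
import re

def extract_skeleton(markdown: str) -> list[dict]:
    """Collect headings (index, level, title) as a lightweight TOC."""
    toc = []
    for i, m in enumerate(re.finditer(r"^(#{1,6})\s+(.+)$", markdown, re.M)):
        toc.append({"index": i, "level": len(m.group(1)), "title": m.group(2).strip()})
    return toc

doc = "# Intro\ntext\n## Install\npip stuff\n## Usage\nmore text\n"
print(extract_skeleton(doc))
# [{'index': 0, 'level': 1, 'title': 'Intro'},
#  {'index': 1, 'level': 2, 'title': 'Install'},
#  {'index': 2, 'level': 2, 'title': 'Usage'}]
```

The planner can then rank these entries and request only the matching section indices, which is what `chunk_indices` feeds back into the crawl.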
## Best Practices

- Use `skeleton` mode first on large documents to understand their structure.
- Use `chunk_indices` to extract specific sections instead of the full content.
- Set `max_depth` > 0 with care; the depth limit caps how many pages are crawled and prevents runaway crawling.
- Keep `fit_markdown=true` for cleaner output; set it to false for raw content.
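The `chunk_indices` practice above amounts to splitting the page into heading-delimited sections and keeping only the requested ones. A minimal sketch, assuming sections are numbered in document order (these helpers are illustrative, not crawl4ai's implementation):

```python
import re

def split_sections(markdown: str) -> list[str]:
    """Split markdown into heading-delimited sections, in document order."""
    parts = re.split(r"(?m)^(?=#{1,6}\s)", markdown)  # split before each heading
    return [p for p in parts if p.strip()]

def extract_chunks(markdown: str, chunk_indices: list[int]) -> str:
    """Return only the sections named by chunk_indices, concatenated."""
    sections = split_sections(markdown)
    return "".join(sections[i] for i in chunk_indices if i < len(sections))

doc = "# A\none\n## B\ntwo\n## C\nthree\n"
print(extract_chunks(doc, [0, 2]))  # keeps sections A and C, drops B
```

A real extractor would also be token-aware, merging or trimming sections against a token budget rather than returning them verbatim.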
## Advanced
- Batch multiple URLs with separate calls
- Combine with knowledge tools for RAG pipelines
- Use skeleton + LLM to auto-generate chunk plans for custom extraction
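Since the skill has no batch endpoint, batching means iterating with separate calls, as noted above. A hypothetical helper (the `crawl` callable stands in for the actual `@omni` invocation, which is not reproduced here):

```python
def crawl_batch(urls, crawl):
    """Crawl each URL with a separate call; one failure never aborts the batch."""
    results = {}
    for url in urls:
        try:
            results[url] = crawl(url)
        except Exception as exc:  # record the error and keep going
            results[url] = {"error": str(exc)}
    return results

# usage with a stub crawler
out = crawl_batch(["https://a.test", "https://b.test"], lambda u: {"url": u})
print(out["https://a.test"])  # {'url': 'https://a.test'}
```

Results keyed by URL make it easy to feed successful pages into a downstream RAG index while retrying only the failures.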
Repository: tao3k/omni-dev-fusion