# Crawl Skill

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.
## Authentication

This skill only supports API key authentication.

- Create an account at inspiro.top
- Generate your API key
- Add it to `~/.claude/settings.json`:

```json
{
  "env": {
    "INSPIRO_API_KEY": "inspiro-your-api-key-here"
  }
}
```
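Before making any calls, it can help to verify the key is actually available in the environment. The guard below is a hypothetical addition for your own wrapper scripts, not part of the skill itself:

```shell
# Hypothetical guard: fail fast when the API key is missing instead of
# sending an unauthenticated request and getting a confusing 401.
require_key() {
  if [ -n "${1:-}" ]; then
    echo "ok"
  else
    echo "INSPIRO_API_KEY is not set" >&2
    return 1
  fi
}

require_key "example-key"   # prints "ok"
```

Call it as `require_key "${INSPIRO_API_KEY:-}"` at the top of a script so a missing key stops the run immediately.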
## Quick Start

### Using the Script

```bash
./scripts/crawl.sh '<json>' [output_dir]
```

Examples:

```bash
# Basic crawl
./scripts/crawl.sh '{"url": "https://docs.example.com"}'

# Deeper crawl with limits
./scripts/crawl.sh '{"url": "https://docs.example.com", "max_depth": 2, "limit": 50}'

# Save to files
./scripts/crawl.sh '{"url": "https://docs.example.com", "max_depth": 2}' ./docs

# Focused crawl with path filters
./scripts/crawl.sh '{"url": "https://example.com", "max_depth": 2, "select_paths": ["/docs/.*", "/api/.*"], "exclude_paths": ["/blog/.*"]}'

# With semantic instructions (for agentic use)
./scripts/crawl.sh '{"url": "https://docs.example.com", "instructions": "Find API documentation", "chunks_per_source": 3}'
```

When `output_dir` is provided, each crawled page is saved as a separate markdown file.
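Saving one file per page means turning each page URL into a safe filename. The helper below is illustrative only — the actual naming scheme used by `crawl.sh` may differ:

```shell
# One way to derive a filesystem-safe markdown filename from a crawled
# page URL (illustrative - not the actual crawl.sh implementation).
url_to_filename() {
  # strip the scheme, then replace path and query separators with underscores
  printf '%s.md\n' "$1" | sed -e 's|^[a-z]*://||' -e 's|[/?&=]|_|g'
}

url_to_filename "https://docs.example.com/api/auth"
# → docs.example.com_api_auth.md
```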
### Basic Crawl

```bash
curl --request POST \
  --url https://api.inspiro.top/crawl \
  --header "Authorization: Bearer $INSPIRO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 1,
    "limit": 20
  }'
```
### Focused Crawl with Instructions

```bash
curl --request POST \
  --url https://api.inspiro.top/crawl \
  --header "Authorization: Bearer $INSPIRO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 2,
    "instructions": "Find API documentation and code examples",
    "chunks_per_source": 3,
    "select_paths": ["/docs/.*", "/api/.*"]
  }'
```
## API Reference

### Endpoint

```
POST https://api.inspiro.top/crawl
```

### Headers

| Header | Value |
|---|---|
| `Authorization` | `Bearer <INSPIRO_API_KEY>` |
| `Content-Type` | `application/json` |
### Request Body

| Field | Type | Default | Description |
|---|---|---|---|
| `url` | string | Required | Root URL to begin crawling |
| `max_depth` | integer | 1 | Levels deep to crawl (1-5) |
| `max_breadth` | integer | 20 | Links per page |
| `limit` | integer | 50 | Total pages cap |
| `instructions` | string | null | Natural-language guidance for focus |
| `chunks_per_source` | integer | 3 | Chunks per page (1-5, requires `instructions`) |
| `extract_depth` | string | `"basic"` | `basic` or `advanced` |
| `format` | string | `"markdown"` | `markdown` or `text` |
| `select_paths` | array | null | Regex patterns to include |
| `exclude_paths` | array | null | Regex patterns to exclude |
| `allow_external` | boolean | true | Include external domain links |
| `timeout` | float | 150 | Max wait (10-150 seconds) |
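Since only `url` is required, a wrapper script only needs to fill in the fields you want to override. The helper below is a hypothetical sketch that falls back to the documented defaults for `max_depth` and `limit`:

```shell
# Hypothetical request-body builder: only the URL is required; max_depth
# and limit fall back to the documented defaults (1 and 50) when omitted.
build_body() {
  printf '{"url": "%s", "max_depth": %s, "limit": %s}' \
    "$1" "${2:-1}" "${3:-50}"
}

build_body "https://docs.example.com"
# → {"url": "https://docs.example.com", "max_depth": 1, "limit": 50}
```

Pass the result straight to `curl --data`, e.g. `curl ... --data "$(build_body "$URL" 2 20)"`.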
## Response Format

```json
{
  "base_url": "https://docs.example.com",
  "results": [
    {
      "url": "https://docs.example.com/page",
      "raw_content": "# Page Title\n\nContent..."
    }
  ],
  "response_time": 12.5
}
```
## Depth vs Performance

| Depth | Typical Pages | Time |
|---|---|---|
| 1 | 10-50 | Seconds |
| 2 | 50-500 | Minutes |
| 3 | 500-5000 | Many minutes |

Start with `max_depth=1` and increase only if needed.
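The growth in the table can be approximated as `max_breadth` raised to `max_depth`, capped by `limit`. This is a back-of-envelope estimate for sizing a crawl before running it, not how the API actually schedules pages:

```shell
# Rough upper bound on crawl size: max_breadth^max_depth, capped by limit.
# A sizing heuristic only - not the API's actual scheduling behavior.
estimate_pages() {
  breadth=$1 depth=$2 limit=$3
  est=1 i=0
  while [ "$i" -lt "$depth" ]; do
    est=$((est * breadth))
    i=$((i + 1))
  done
  [ "$est" -gt "$limit" ] && est=$limit
  echo "$est"
}

estimate_pages 20 1 50   # one level of 20 links stays under the cap → 20
estimate_pages 20 2 50   # 400 candidate pages, capped at limit → 50
```

With the defaults (`max_breadth=20`, `limit=50`), anything beyond depth 1 already hits the cap — which is why raising `limit` and `max_depth` together is what makes crawls slow.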
## Crawl for Context vs Data Collection

**For agentic use** (feeding results into context): always use `instructions` + `chunks_per_source`. This returns only relevant chunks instead of full pages, preventing context window explosion.

**For data collection** (saving to files): omit `chunks_per_source` to get full page content.
## Examples

### For Context: Agentic Research (Recommended)

Use when feeding crawl results into an LLM context:

```bash
curl --request POST \
  --url https://api.inspiro.top/crawl \
  --header "Authorization: Bearer $INSPIRO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 2,
    "instructions": "Find API documentation and authentication guides",
    "chunks_per_source": 3
  }'
```

Returns only the most relevant chunks (max 500 chars each) per page, so results fit in context without overwhelming it.
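Given the ~500-character chunk cap, you can estimate how much text a crawl will feed into context before running it. A simple illustrative calculation:

```shell
# Back-of-envelope context budget for agentic crawls: each returned chunk
# is at most ~500 characters, so total text is roughly pages * chunks * 500.
budget_chars() {
  echo $(( $1 * $2 * 500 ))
}

budget_chars 50 3   # 50 pages with chunks_per_source=3 → 75000
```

At the default `limit=50` with 3 chunks per page, the worst case is around 75k characters — large but bounded, versus unbounded full-page content.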
### For Context: Targeted Technical Docs

```bash
curl --request POST \
  --url https://api.inspiro.top/crawl \
  --header "Authorization: Bearer $INSPIRO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com",
    "max_depth": 2,
    "instructions": "Find all documentation about authentication and security",
    "chunks_per_source": 3,
    "select_paths": ["/docs/.*", "/api/.*"]
  }'
```
### For Data Collection: Full Page Archive

Use when saving content to files for later processing:

```bash
curl --request POST \
  --url https://api.inspiro.top/crawl \
  --header "Authorization: Bearer $INSPIRO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com/blog",
    "max_depth": 2,
    "max_breadth": 50,
    "select_paths": ["/blog/.*"],
    "exclude_paths": ["/blog/tag/.*", "/blog/category/.*"]
  }'
```

Returns full page content - use the script with `output_dir` to save as markdown files.
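The filter semantics in the example above — keep a path when it matches an include pattern and no exclude pattern — are applied server-side, but can be mirrored locally to sanity-check your regexes before spending a crawl. A sketch with the blog patterns hardcoded:

```shell
# Local sketch of select_paths/exclude_paths semantics: a path is kept when
# it matches an include pattern and no exclude pattern. The API applies
# these server-side; this only illustrates the logic for the blog example.
keep_path() {
  echo "$1" | grep -Eq '^/blog/.*' || return 1                 # select_paths
  echo "$1" | grep -Eq '^/blog/(tag|category)/.*' && return 1  # exclude_paths
  return 0
}

keep_path "/blog/launch-post" && echo "kept"
keep_path "/blog/tag/news" || echo "excluded"
```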
## Map API (URL Discovery)

Use `map` instead of `crawl` when you only need URLs, not content:

```bash
curl --request POST \
  --url https://api.inspiro.top/map \
  --header "Authorization: Bearer $INSPIRO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 2,
    "instructions": "Find all API docs and guides"
  }'
```

Returns URLs only (faster than crawl):

```json
{
  "base_url": "https://docs.example.com",
  "results": [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/guides/quickstart"
  ]
}
```
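To feed the discovered URLs into a follow-up step, pull them out of the response. With jq installed, `jq -r '.results[]'` is the clean way; the grep pipeline below is a rough dependency-free fallback that assumes all URLs in the response are https and quoted, as in the example above:

```shell
# Extract result URLs from a saved map response. With jq you would use:
#   echo "$response" | jq -r '.results[]'
# The grep fallback matches every quoted https URL, then drops the first
# match (base_url). Crude, but dependency-free.
response='{"base_url":"https://docs.example.com","results":["https://docs.example.com/api/auth","https://docs.example.com/guides/quickstart"]}'

echo "$response" | grep -o 'https://[^"]*' | tail -n +2
```

Each URL can then be passed to the extract skill or a follow-up crawl.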
## Tips

- **Always use `chunks_per_source` for agentic workflows** - prevents context explosion when feeding results to LLMs
- **Omit `chunks_per_source` only for data collection** - when saving full pages to files
- **Start conservative** (`max_depth=1`, `limit=20`) and scale up
- Use path patterns to focus on relevant sections
- Use Map first to understand site structure before a full crawl
- Always set a `limit` to prevent runaway crawls