# Website Crawler
High-performance web crawler with a TypeScript/Bun frontend and a Go backend for discovering and mapping website structure.
## When to Use
Use this skill when users ask to:
- Crawl a website or "spider a site"
- Map site structure or "discover all pages"
- Find all URLs on a website
- Generate sitemap or site report
- Analyze link relationships between pages
- Audit website coverage or completeness
- Extract page metadata (titles, status codes)
**Keywords:** crawl, spider, map, discover pages, site structure, sitemap, all URLs, website audit
## Quick Start
Run the crawler from the scripts directory:

```sh
cd ~/.claude/scripts/crawler
bun src/index.ts <URL> [options]
```
## CLI Options
| Option | Short | Default | Description |
|---|---|---|---|
| `--depth` | `-D` | 2 | Maximum crawl depth |
| `--workers` | `-w` | 20 | Concurrent workers |
| `--rate` | `-r` | 2 | Rate limit (requests/second) |
| `--profile` | `-p` | - | Use preset profile (fast/deep/gentle) |
| `--output` | `-o` | auto | Output directory |
| `--sitemap` | `-s` | true | Use sitemap.xml for discovery |
| `--domain` | `-d` | auto | Allowed domain (extracted from URL) |
| `--debug` | - | false | Enable debug logging |
## Profiles
Three preset profiles for common use cases:
| Profile | Workers | Depth | Rate | Use Case |
|---|---|---|---|---|
| `fast` | 50 | 3 | 10 | Quick site mapping |
| `deep` | 20 | 10 | 3 | Thorough crawling |
| `gentle` | 5 | 5 | 1 | Respect server limits |
## Usage Examples
### Basic crawl

```sh
bun src/index.ts https://example.com
```

### Deep crawl with high concurrency

```sh
bun src/index.ts https://example.com --depth 5 --workers 30 --rate 5
```

### Using a profile

```sh
bun src/index.ts https://example.com --profile fast
```

### Gentle crawl (avoid rate limiting)

```sh
bun src/index.ts https://example.com --profile gentle
```
## Output
The crawler generates two files in the output directory:

- `results.json` - Structured crawl data with all discovered pages
- `index.html` - Dark-themed HTML report with statistics
### Results JSON Structure
```json
{
  "stats": {
    "pages_found": 150,
    "pages_crawled": 147,
    "external_links": 23,
    "errors": 3,
    "duration": 45.2
  },
  "results": [
    {
      "url": "https://example.com/page",
      "title": "Page Title",
      "status_code": 200,
      "depth": 1,
      "links": ["..."],
      "content_type": "text/html"
    }
  ]
}
```
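As an illustration, `results.json` can be post-processed with a short Bun/TypeScript script. The interfaces below mirror the structure shown above; the script itself is a sketch, not part of the crawler:

```typescript
// Sketch: summarize a crawl report by HTTP status code.
// Field names follow the results.json structure documented above.
interface CrawlResult {
  url: string;
  title: string;
  status_code: number;
  depth: number;
  links: string[];
  content_type: string;
}

interface CrawlReport {
  stats: {
    pages_found: number;
    pages_crawled: number;
    external_links: number;
    errors: number;
    duration: number;
  };
  results: CrawlResult[];
}

// Count crawled pages per HTTP status code, e.g. { 200: 145, 404: 2 }.
function summarizeByStatus(report: CrawlReport): Record<number, number> {
  const byStatus: Record<number, number> = {};
  for (const page of report.results) {
    byStatus[page.status_code] = (byStatus[page.status_code] ?? 0) + 1;
  }
  return byStatus;
}
```

With Bun, the report could be loaded via `await Bun.file("results.json").json()` and passed to `summarizeByStatus` to spot error-heavy crawls at a glance.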
## Features
- **Sitemap Discovery**: Automatically finds and parses `sitemap.xml`
- **Checkpoint/Resume**: Auto-saves progress every 30 seconds
- **Rate Limiting**: Token bucket algorithm prevents server overload
- **Concurrent Crawling**: Go worker pool for high performance
- **HTML Reports**: Dark-themed, mobile-responsive reports
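The token bucket mentioned above can be sketched in a few lines. This is an illustrative TypeScript version only; the crawler's actual limiter lives in the Go engine, and the names here are not part of its API:

```typescript
// Illustrative token-bucket rate limiter (not the crawler's real implementation).
// Tokens refill continuously at ratePerSec; each request consumes one token,
// so bursts up to `capacity` are allowed before requests are throttled.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private ratePerSec: number, private capacity: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Refill tokens based on elapsed time, then try to consume one.
  // Returns false when the caller should wait before sending a request.
  tryAcquire(): boolean {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The `--rate` option corresponds to the refill rate: at `--rate 2`, roughly two requests per second are released once any initial burst capacity is spent.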
## Troubleshooting
### Rate limiting errors

Reduce the rate limit or use the gentle profile:

```sh
bun src/index.ts <url> --rate 1
# or
bun src/index.ts <url> --profile gentle
```
### Go binary not found

The TypeScript frontend auto-compiles the Go binary. If compilation fails, build it manually:

```sh
cd ~/.claude/scripts/crawler/engine
go build -o crawler main.go
```
### Timeout on large sites

Reduce depth or increase workers:

```sh
bun src/index.ts <url> --depth 1 --workers 50
```
## Architecture

For detailed architecture, Go engine specifications, and code conventions, see `reference.md`.
## Related Files

- Command: `plugins/crawler/commands/crawler.md`
- Reference: `plugins/crawler/skills/website-crawler/reference.md`
- Scripts: `plugins/crawler/skills/website-crawler/scripts/`
- Profiles: `plugins/crawler/skills/website-crawler/scripts/config/profiles/`