@1247/web-crawler
# Web Crawler

Playwright-based headless browser for crawling JavaScript-heavy sites, taking screenshots, clicking buttons, and running UX audits. Works on SPAs and anything else that needs a real browser (auth-walled pages are not yet supported; see Gotchas).
## Quick Reference

```bash
# Single page screenshot
python3 skills/web-crawler/scripts/crawl.py https://example.com

# Full UX audit (crawls all internal pages)
python3 skills/web-crawler/scripts/crawl.py https://example.com --audit --max-pages 30

# Click a button and screenshot the result
python3 skills/web-crawler/scripts/crawl.py https://example.com --click "button.cta"

# Click all nav links sequentially
python3 skills/web-crawler/scripts/crawl.py https://example.com --click-all "nav a"

# Extract all text (good for SPAs)
python3 skills/web-crawler/scripts/crawl.py https://example.com --extract-text
```
## Output Structure

```
output/crawl-{domain}/
├── screenshots/   # PNG screenshots per page
├── pages/         # Extracted text per page
├── report.json    # Structured crawl data
└── audit.md       # Readable audit report (--audit mode)
```
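The exact schema of `report.json` isn't documented here, so the sketch below assumes a hypothetical shape — a top-level `pages` list whose entries carry a `url` and a `console_errors` list — purely to illustrate pulling findings out of the structured output:

```python
import json

# Hypothetical report.json shape (the real schema may differ):
# {"pages": [{"url": "...", "console_errors": ["..."]}, ...]}
def pages_with_errors(report: dict) -> list[str]:
    """Return URLs of crawled pages that logged console errors."""
    return [p["url"] for p in report.get("pages", []) if p.get("console_errors")]

report = json.loads(
    '{"pages": [{"url": "https://example.com/", "console_errors": []},'
    ' {"url": "https://example.com/about", "console_errors": ["TypeError"]}]}'
)
print(pages_with_errors(report))  # ['https://example.com/about']
```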
## Audit Mode

`--audit` runs a breadth-first crawl of all internal pages:

- Screenshots every page
- Extracts all links (internal + external)
- Captures console errors
- Runs accessibility checks (missing alt text, unlabeled buttons, missing h1/lang)
- Maps all clickable elements (buttons, links, interactive elements)
- Generates `audit.md` with the findings
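The breadth-first discovery described above can be sketched in a few lines. This is not the script's actual implementation — `get_links(url)` stubs out what Playwright does in a headless browser (load the page, collect every `<a href>`) — but it shows how the crawl stays on internal pages and respects `--max-pages`:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def bfs_crawl(start_url, get_links, max_pages=20):
    """Breadth-first crawl limited to same-host pages, capped at max_pages."""
    root = urlparse(start_url).netloc
    seen, queue, visited = {start_url}, deque([start_url]), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)                # screenshot / text extraction happens here
        for href in get_links(url):
            link = urljoin(url, href)      # resolve relative links against the page
            if urlparse(link).netloc == root and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Tiny stubbed site: "/" links to "/a" and an external page; "/a" links back.
site = {"https://x.test/": ["/a", "https://other.test/"], "https://x.test/a": ["/"]}
print(bfs_crawl("https://x.test/", lambda u: site.get(u, [])))
# ['https://x.test/', 'https://x.test/a'] -- the external link is never queued
```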
## Key Options

| Flag | Default | What it does |
|---|---|---|
| `--audit` | off | Full site audit mode |
| `--max-pages N` | 20 | Max pages to crawl in audit |
| `--full-page` | off | Full-page screenshots vs viewport |
| `--click SEL` | — | Click CSS selector, screenshot result |
| `--click-all SEL` | — | Click all matching elements |
| `--extract-links` | off | Extract all `<a>` hrefs |
| `--extract-text` | off | Extract visible text |
| `--console-log` | off | Capture console messages |
| `--wait SEC` | 2 | Seconds to wait after page load |
| `--viewport WxH` | 1440x900 | Browser viewport size |
| `--timeout SEC` | 30 | Navigation timeout in seconds |
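For reference, the flags and defaults in the table map onto an `argparse` parser like the sketch below. This mirrors the documented interface only — it is not `crawl.py`'s actual parser:

```python
import argparse

# Illustrative parser matching the option table; not the script's real code.
parser = argparse.ArgumentParser(prog="crawl.py")
parser.add_argument("url")
parser.add_argument("--audit", action="store_true")
parser.add_argument("--max-pages", type=int, default=20, metavar="N")
parser.add_argument("--full-page", action="store_true")
parser.add_argument("--click", metavar="SEL")
parser.add_argument("--click-all", metavar="SEL")
parser.add_argument("--extract-links", action="store_true")
parser.add_argument("--extract-text", action="store_true")
parser.add_argument("--console-log", action="store_true")
parser.add_argument("--wait", type=float, default=2, metavar="SEC")
parser.add_argument("--viewport", default="1440x900", metavar="WxH")
parser.add_argument("--timeout", type=float, default=30, metavar="SEC")

args = parser.parse_args(["https://example.com", "--audit", "--viewport", "1920x1080"])
width, height = (int(n) for n in args.viewport.split("x"))  # WxH -> (1920, 1080)
print(args.audit, args.max_pages, width, height)  # True 20 1920 1080
```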
## Workflow for UX Audit

- Run `--audit` on the target site
- Read `audit.md` for the summary
- Check `report.json` for structured data (clickables, links, console errors)
- Review the screenshots in the `screenshots/` folder
- For deeper inspection, re-run single pages with `--click` to test specific interactions
- Synthesize the findings into actionable UX recommendations
## Gotchas

- **SPAs with hash routing:** The crawler follows `<a href>` links. Hash-only routes (`#/page`) won't be auto-discovered in audit mode; use `--click-all "nav a"` instead.
- **Auth-walled sites:** Not yet supported. Future: add `--cookie` support.
- **Rate limiting:** The crawler hits pages fast. Add `--wait 3` for sensitive sites.
- **Chromium install:** The first run needs `playwright install chromium` (~110MB). Persisted in setup.sh.
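The hash-routing gotcha comes down to URL structure: a link whose resolved URL differs from the current page only by its fragment never produces a new page for the crawler to visit. The illustrative helper below (not part of `crawl.py`) shows how such links can be told apart from crawlable ones:

```python
from urllib.parse import urldefrag, urljoin, urlparse

def split_hash_routes(base_url, hrefs):
    """Separate crawlable links from hash-only SPA routes.

    A href like "#/pricing" resolves to the same document as base_url
    (only the fragment changes), so a link-following crawl never sees a
    new page -- hence the --click-all "nav a" workaround.
    """
    crawlable, hash_only = [], []
    for href in hrefs:
        resolved = urljoin(base_url, href)
        same_doc = urldefrag(resolved)[0] == urldefrag(base_url)[0]
        if same_doc and urlparse(resolved).fragment:
            hash_only.append(href)
        else:
            crawlable.append(href)
    return crawlable, hash_only

print(split_hash_routes("https://x.test/", ["/about", "#/pricing", "#/team"]))
# (['/about'], ['#/pricing', '#/team'])
```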