doc-scraper
SKILL.md
Snowflake Documentation Scraper
Scrapes docs.snowflake.com sections to Markdown with SQLite caching (7-day expiration).
Usage
First time setup (auto-installs uv and doc-scraper):
python3 .claude/skills/doc-scraper/scripts/doc_scraper.py
Subsequent runs:
doc-scraper --output-dir=./snowflake-docs
doc-scraper --output-dir=./snowflake-docs --base-path="/en/sql-reference/"
doc-scraper --output-dir=./snowflake-docs --spider-depth=2
Command Options
| Option | Default | Description |
|---|---|---|
--output-dir |
Required | Output directory for scraped docs |
--base-path |
/en/migrations/ |
URL section to scrape |
--spider-depth |
1 |
Link depth: 0=seeds, 1=+links, 2=+2nd |
--limit |
None | Cap URLs (for testing) |
--dry-run |
- | Preview without writing |
Output
output-dir/
├── SKILL.md # Auto-generated index
├── scraper_config.yaml # Editable config (auto-created)
├── .cache/ # SQLite cache (auto-managed)
└── en/migrations/*.md # Scraped pages with frontmatter
Configuration
Auto-created at {output-dir}/scraper_config.yaml:
rate_limiting:
max_concurrent_threads: 4
spider:
max_pages: 1000
allowed_paths: ["/en/"]
scraped_pages:
expiration_days: 7
Troubleshooting
| Issue | Solution |
|---|---|
| Too many pages | Lower --spider-depth or edit config |
| Missing pages | Increase --spider-depth |
| Cache corruption | Delete {output-dir}/.cache/ (rare) |
Weekly Installs
1
Repository
sfc-gh-dflippo/…dbt-demoGitHub Stars
31
First Seen
4 days ago
Security Audits
Installed on
amp1
cline1
opencode1
cursor1
kimi-cli1
codex1