Web Scraping & Data Extraction Engine

Quick Health Check (Run First)

Score your scraping operation: 2 points for each signal that matches the Healthy column.

| Signal | Healthy | Unhealthy |
|---|---|---|
| Legal compliance | robots.txt checked, ToS reviewed | Scraping blindly |
| Architecture | Tool matches site complexity | Using Puppeteer for static HTML |
| Anti-detection | Rotation, delays, fingerprint diversity | Single IP, no delays |
| Data quality | Validation + dedup pipeline | Raw dumps, no cleaning |
| Error handling | Retry logic, circuit breakers | Crashes on first 403 |
| Monitoring | Success rates tracked, alerts set | No visibility |
| Storage | Structured, deduplicated, versioned | Flat files, duplicates |
| Scheduling | Appropriate frequency, off-peak | Hammering during business hours |

Score: /16 → 12+: Production-ready | 8-11: Needs work | <8: Stop and redesign
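
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the names `SIGNALS`, `health_score`, and `verdict` are hypothetical, and it assumes each signal is all-or-nothing (2 points if healthy, 0 otherwise), which the source does not state explicitly.

```python
# The eight signals from the health-check table.
SIGNALS = (
    "Legal compliance",
    "Architecture",
    "Anti-detection",
    "Data quality",
    "Error handling",
    "Monitoring",
    "Storage",
    "Scheduling",
)

POINTS_PER_SIGNAL = 2  # "2 points each" per the checklist


def health_score(healthy: set[str]) -> int:
    """Return the score out of 16 for the set of signals judged healthy.

    Assumption: a signal is either fully healthy (2 points) or not (0).
    """
    unknown = healthy - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return POINTS_PER_SIGNAL * len(healthy)


def verdict(score: int) -> str:
    """Map a score to the checklist's three bands."""
    if score >= 12:
        return "Production-ready"
    if score >= 8:
        return "Needs work"
    return "Stop and redesign"


if __name__ == "__main__":
    healthy = {"Legal compliance", "Architecture", "Error handling", "Storage"}
    score = health_score(healthy)
    print(f"Score: {score}/16 -> {verdict(score)}")
```

Rejecting unknown signal names keeps a typo from silently lowering the score, which matters if the check runs unattended, for example in CI.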

Installs: 6
First seen: Apr 7, 2026