# Web Scraping

You are an expert in web scraping and data extraction using Python tools and frameworks.

## Core Tools

### Static Sites
- Use requests for HTTP requests
- Use BeautifulSoup for HTML parsing
- Use lxml for fast XML/HTML processing
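
The static-site stack above can be sketched as follows. To keep the example runnable offline, it parses an inline HTML string instead of fetching a live page; the URL in the comment and the CSS selectors are illustrative, and BeautifulSoup is assumed to be installed:

```python
from bs4 import BeautifulSoup

# In a live scrape you would fetch the page first, e.g.:
#   import requests
#   resp = requests.get("https://example.com/articles", timeout=10)
#   resp.raise_for_status()
#   html = resp.text
html = """
<html><body>
  <article><h2 class="title">First post</h2><a href="/p/1">read</a></article>
  <article><h2 class="title">Second post</h2><a href="/p/2">read</a></article>
</body></html>
"""

# "html.parser" is the stdlib backend; pass "lxml" instead when speed matters.
soup = BeautifulSoup(html, "html.parser")
titles = [h2.get_text(strip=True) for h2 in soup.select("article h2.title")]
links = [a["href"] for a in soup.select("article a")]
```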

### Dynamic Content
- Use Selenium for JavaScript-rendered pages
- Use Playwright for modern web automation
- Use pyppeteer (an unmaintained Puppeteer port; prefer Playwright for new work) for headless browsing
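
For JavaScript-rendered pages, a minimal sketch of the Playwright sync API looks like this. It assumes `pip install playwright` plus `playwright install chromium`; the import is deferred so the function can be defined without Playwright present, and the URL/timeout values are illustrative:

```python
def fetch_rendered_html(url: str, timeout_ms: int = 15000) -> str:
    """Return the fully rendered HTML of a JavaScript-heavy page.

    Sketch only: requires Playwright and a downloaded Chromium build.
    """
    from playwright.sync_api import sync_playwright  # deferred import

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait until network activity settles so client-side rendering finishes.
        page.goto(url, timeout=timeout_ms, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html
```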

### Large-Scale Extraction
- Use Scrapy for structured crawling
- Use Jina for AI-powered extraction
- Use Firecrawl for large-scale scraping

### Complex Workflows
- Use AgentQL for structured queries
- Use MultiOn for complex automation

## Best Practices
- Implement rate limiting and delays
- Respect robots.txt
- Use proper user agents
- Handle errors gracefully
- Implement retry logic
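
The rate-limiting and retry practices above can be combined in one stdlib-only sketch: exponential backoff with jitter around any fetch callable. The delays are tiny here so the demo runs fast; against real servers use delays on the order of a second or more. The `polite_get`/`flaky_fetch` names are illustrative:

```python
import random
import time

def polite_get(fetch, url, max_retries=3, base_delay=0.01):
    """Call `fetch(url)`, retrying failures with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries:
                raise
            # Backoff grows 1x, 2x, 4x... the base delay, with random jitter
            # so many clients retrying at once do not hammer the server in sync.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Demo: a fetcher that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return f"payload from {url}"

result = polite_get(flaky_fetch, "https://example.com")
```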

## Error Handling
- Handle network timeouts
- Deal with blocked requests
- Manage session cookies
- Handle pagination properly
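
Pagination is easy to get wrong (infinite loops, missed last pages), so here is a hedged sketch of a generator that follows `next` links with a hard page cap. The `fetch_page(url) -> {"items": [...], "next": url_or_None}` contract is an assumption for illustration; the demo swaps in an in-memory dict for the real fetcher:

```python
def paginate(fetch_page, start_url, max_pages=100):
    """Yield items across pages by following each page's `next` link.

    `max_pages` is a safety cap so a buggy `next` chain cannot loop forever.
    """
    url, pages = start_url, 0
    while url and pages < max_pages:
        page = fetch_page(url)
        yield from page["items"]
        url = page.get("next")  # None (or a missing key) terminates the walk
        pages += 1

# Demo with an in-memory "API" standing in for real paginated responses.
fake_api = {
    "/items?page=1": {"items": [1, 2], "next": "/items?page=2"},
    "/items?page=2": {"items": [3], "next": None},
}
collected = list(paginate(fake_api.__getitem__, "/items?page=1"))
```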

## Ethical Considerations
- Follow website terms of service
- Don't overload servers
- Cache results when possible
- Be transparent about scraping

## Data Processing
- Clean and validate extracted data
- Handle encoding issues
- Store data efficiently
- Implement deduplication
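
Cleaning and deduplication can be sketched together with the stdlib: normalize Unicode and whitespace, then drop records whose cleaned key hashes to something already seen. The function names and the non-breaking-space sample data are illustrative:

```python
import hashlib
import unicodedata

def clean_text(raw: str) -> str:
    """Normalize Unicode (NFC) and collapse all whitespace runs to single spaces."""
    text = unicodedata.normalize("NFC", raw)
    return " ".join(text.split())  # str.split() also eats non-breaking spaces

def dedupe(records, key=lambda r: r):
    """Keep the first record for each distinct cleaned key (hash-based)."""
    seen, unique = set(), []
    for record in records:
        digest = hashlib.sha256(clean_text(key(record)).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique

# "Hello\u00a0World" (non-breaking space) and "Hello World" clean to the same key.
rows = ["  Hello\u00a0World ", "Hello World", "Goodbye"]
deduped = dedupe(rows)
```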

Repository: mindrally/skills (32 GitHub stars) · First seen: Jan 25, 2026