web-automation
Web Automation
A guide to web automation, covering browser automation, web scraping, and workflow automation tools.
Quick Decision Guide
Need browser automation?
+-- Modern testing/scraping --> Playwright (recommended)
+-- Chrome-only, PDF/screenshots --> Puppeteer
+-- Legacy/cross-browser --> Selenium
+-- Serverless/API-based --> Browserless
Need data scraping?
+-- Large-scale crawling --> Scrapy
+-- Dynamic content (JS) --> Playwright
+-- Simple HTML --> BeautifulSoup
Need workflow automation?
+-- Visual workflows --> n8n
Playwright (Recommended)
Installation
# Node.js
npm init playwright@latest
# or Python (the second command downloads the browser binaries)
pip install playwright
playwright install
Basic Example
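import { chromium } from "playwright";

async function scrape() {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto("https://example.com");
    // ".content" is a placeholder selector; use one that exists on the target page
    await page.waitForSelector(".content");
    const title = await page.textContent("h1");
    // Collect the text and resolved URL of every link on the page
    const links = await page.$$eval("a", (els) =>
      els.map((el) => ({ text: el.textContent, href: el.href }))
    );
    return { title, links };
  } finally {
    // Close the browser even if one of the steps above throws
    await browser.close();
  }
}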
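The `links` array returned by the example usually contains duplicates, `#fragment` variants of the same page, and non-web links such as `mailto:`. A minimal post-processing sketch follows; `cleanLinks` is a hypothetical helper name, not part of Playwright, and the choice to drop fragments is an assumption that may not suit every site.

```javascript
// Hypothetical helper: tidy the { text, href } objects returned by scrape().
function cleanLinks(links) {
  const seen = new Set();
  const cleaned = [];
  for (const { text, href } of links) {
    let url;
    try {
      url = new URL(href);
    } catch {
      continue; // skip values that are not valid absolute URLs
    }
    // Keep only web links (drops mailto:, javascript:, tel:, ...)
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    url.hash = ""; // treat #fragment variants as the same page
    const key = url.toString();
    if (seen.has(key)) continue; // already collected this URL
    seen.add(key);
    cleaned.push({ text: (text || "").trim(), href: key });
  }
  return cleaned;
}
```

It could be applied as `cleanLinks((await scrape()).links)` before storing or following the results.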
Common Patterns
// Screenshot
await page.screenshot({ path: "screenshot.png", fullPage: true });
// PDF generation (Playwright supports page.pdf() in Chromium only)
await page.pdf({ path: "page.pdf", format: "A4" });
// Fill forms
await page.fill('input[name="email"]', "user@example.com");
await page.click('button[type="submit"]');
// Wait for navigation (Playwright marks waitForNavigation() as deprecated in favor of page.waitForURL())
await Promise.all([page.waitForNavigation(), page.click("a.next-page")]);
// Handle dialogs
page.on("dialog", (dialog) => dialog.accept());
Puppeteer
Installation
npm install puppeteer
Basic Example
const puppeteer = require("puppeteer");

(async () => {
  // Recent Puppeteer releases use `headless: true` for the new headless mode;
  // older releases selected it with `headless: "new"`.
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto("https://example.com");
  await page.screenshot({ path: "example.png" });
  await browser.close();
})();
Scrapy (Python)
Installation
pip install scrapy
scrapy startproject myproject
Spider Example
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

        # Follow pagination
        next_page = response.css('li.next a::attr(href)').get()
        if next_page:
            yield response.follow(next_page, self.parse)
Best Practices
Rate Limiting
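// Add a randomized 1-3 second delay between requests.
// A plain timer is used here because page.waitForTimeout() is discouraged in Playwright
// and has been removed from recent Puppeteer releases.
await new Promise((resolve) => setTimeout(resolve, 1000 + Math.random() * 2000));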
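The randomized delay can also be factored into small helpers that do not depend on any browser API. This is a minimal sketch: `jitteredDelayMs`, `sleep`, and `visitPolitely` are hypothetical names, and the 1000 ms base and 2000 ms jitter defaults simply mirror the snippet above.

```javascript
// Hypothetical helpers; not part of Playwright or Puppeteer.

// Return a delay in milliseconds in the range [baseMs, baseMs + jitterMs).
function jitteredDelayMs(baseMs = 1000, jitterMs = 2000, random = Math.random) {
  return baseMs + random() * jitterMs;
}

// Resolve after roughly `ms` milliseconds without touching the page object.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Visit URLs one at a time, pausing between requests.
// `visit` is any async function supplied by the caller, e.g. (url) => page.goto(url).
async function visitPolitely(urls, visit, delayMs = jitteredDelayMs) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await visit(urls[i]));
    // No need to wait after the last request
    if (i < urls.length - 1) await sleep(delayMs());
  }
  return results;
}
```

Processing URLs sequentially keeps at most one request in flight, which is the simplest way to stay under a site's rate limit; a concurrent crawler would need a shared limiter instead.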
User Agent Rotation
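const userAgents = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...",
];
const userAgent = userAgents[Math.floor(Math.random() * userAgents.length)];
// Puppeteer: set the user agent on the page
await page.setUserAgent(userAgent);
// Playwright has no page.setUserAgent(); pass it when creating the context instead:
// const context = await browser.newContext({ userAgent });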
Error Handling
try {
  await page.goto(url, { timeout: 30000 });
} catch (error) {
  if (error.name === "TimeoutError") {
    console.log("Page load timeout, retrying once with a longer timeout...");
    await page.goto(url, { timeout: 60000 });
  } else {
    // Do not swallow errors that are not timeouts
    throw error;
  }
}
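The single retry above generalizes to a bounded retry loop with exponential backoff. A minimal sketch, assuming a hypothetical helper named `withRetry` and illustrative defaults (3 attempts, 500 ms base delay); only errors whose `name` is `TimeoutError` are retried, and anything else is rethrown immediately.

```javascript
// Hypothetical helper; the name and defaults are illustrative assumptions.
async function withRetry(task, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      // Pass the attempt index so the caller can scale its own timeout
      return await task(attempt);
    } catch (error) {
      // Only timeouts are treated as transient; rethrow everything else
      if (error.name !== "TimeoutError") throw error;
      lastError = error;
      if (attempt < attempts - 1) {
        // Exponential backoff: baseDelayMs, then 2x, 4x, ...
        const waitMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  // Every attempt timed out: surface the last timeout to the caller
  throw lastError;
}
```

A call might look like `await withRetry((attempt) => page.goto(url, { timeout: 30000 * (attempt + 1) }))`, which lengthens the page timeout on each attempt.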
Respectful Scraping
- Check robots.txt before scraping
- Add reasonable delays between requests
- Identify your bot with a custom User-Agent
- Cache responses to avoid repeated requests
- Respect rate limits and Terms of Service
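The robots.txt check in the list above can be sketched as a pure function over the file's text. This is a deliberately simplified illustration: `parseRobots` and `isAllowed` are hypothetical names, matching is by path prefix only (no `*` or `$` wildcards), and a maintained robots.txt parser is a safer choice for real crawlers.

```javascript
// Simplified robots.txt check: prefix matching only, no `*` or `$` wildcards.
function parseRobots(text) {
  const groups = []; // [{ agents: [...], rules: [{ allow, path }] }]
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      if (value) current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === "allow" || field === "disallow") && current) {
      // An empty Disallow means "nothing is disallowed", so no rule is stored
      if (value) current.rules.push({ allow: field === "allow", path: value });
      lastWasAgent = false;
    }
  }
  return groups;
}

function isAllowed(robotsText, userAgent, path) {
  const groups = parseRobots(robotsText);
  const ua = userAgent.toLowerCase();
  // Prefer groups that name this agent; otherwise fall back to `*`
  let matching = groups.filter((g) => g.agents.some((a) => a !== "*" && ua.includes(a)));
  if (matching.length === 0) matching = groups.filter((g) => g.agents.includes("*"));
  let best = null;
  for (const rule of matching.flatMap((g) => g.rules)) {
    if (!path.startsWith(rule.path)) continue;
    // Longest matching path wins; Allow wins a tie
    if (
      !best ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  // No matching rule means the path is allowed
  return best ? best.allow : true;
}
```

In practice the text would come from fetching `/robots.txt` on the target host once and caching it, then calling `isAllowed` with the bot's own User-Agent string before each request.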
When to Use This Skill
- Automating browser interactions for testing
- Scraping data from websites
- Generating PDFs or screenshots
- Building web crawlers
- Creating workflow automations
- Monitoring website changes
Tool Comparison
| Tool | Strengths | Best For |
|---|---|---|
| Playwright | Cross-browser, modern API | E2E testing, SPA scraping |
| Puppeteer | Chrome-focused, mature | PDF generation, screenshots |
| Selenium | Wide browser support | Legacy systems, cross-browser |
| Scrapy | High performance, Python | Large-scale crawling |
| Browserless | Serverless, scalable | Cloud automation |