
Playwright Scraper Skill

A Playwright-based web scraping skill for sites with anti-bot protection. Choose the approach that matches the target website's anti-bot level.


🎯 Use Case Matrix

Target Website         Anti-Bot Level   Recommended Method      Script
Regular Sites          Low              web_fetch tool          N/A (built-in)
Dynamic Sites          Medium           Playwright Simple       scripts/playwright-simple.js
Cloudflare Protected   High             Playwright Stealth ⭐   scripts/playwright-stealth.js
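The matrix above amounts to a lookup from anti-bot level to script. Below is a minimal Node.js sketch of that lookup. The level keys and the helper names recommendMethod and commandFor are hypothetical and are not part of the skill.

```javascript
// Hypothetical lookup mirroring the Use Case Matrix; the keys are assumptions.
const METHODS = new Map([
  ["low", { method: "web_fetch tool", script: null }], // built-in tool, no script
  ["medium", { method: "Playwright Simple", script: "scripts/playwright-simple.js" }],
  ["high", { method: "Playwright Stealth", script: "scripts/playwright-stealth.js" }],
]);

function recommendMethod(antiBotLevel) {
  const entry = METHODS.get(String(antiBotLevel).toLowerCase());
  if (!entry) throw new Error(`Unknown anti-bot level: ${antiBotLevel}`);
  return entry;
}

// Returns the shell command for a level, or null when the built-in tool applies.
function commandFor(antiBotLevel, url) {
  const { script } = recommendMethod(antiBotLevel);
  return script ? `node ${script} ${JSON.stringify(url)}` : null;
}

console.log(commandFor("high", "https://example.com"));
// prints: node scripts/playwright-stealth.js "https://example.com"
```

JSON.stringify is used here only to wrap the URL in double quotes. It is not a general shell-escaping routine.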

📦 Installation

cd playwright-scraper-skill
npm install
npx playwright install chromium

🚀 Quick Start

1️⃣ Simple Sites (No Anti-Bot)

Use the built-in web_fetch tool for static sites.


2️⃣ Dynamic Sites (Requires JavaScript)

Use Playwright Simple:

node scripts/playwright-simple.js "https://example.com"

3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use Playwright Stealth:

node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

Features:

  • Hide automation markers (navigator.webdriver = false)
  • Realistic User-Agent (iPhone, Android)
  • Random delays to mimic human behavior
  • Screenshot and HTML saving support
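The random delays and device User-Agents listed above can be sketched as two small helpers. The User-Agent strings below are ordinary iPhone Safari and Android Chrome examples chosen for illustration. The stealth script's actual list is not shown in this document, and randomDelayMs, pickUserAgent, and sleep are hypothetical names.

```javascript
// Example mobile User-Agent strings (assumptions, not the script's real list).
const MOBILE_USER_AGENTS = [
  "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
  "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
];

// Uniform random integer in [minMs, maxMs]; `rng` is injectable for testing.
function randomDelayMs(minMs, maxMs, rng = Math.random) {
  if (minMs > maxMs) throw new RangeError("minMs must be <= maxMs");
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

// Picks one User-Agent string at random.
function pickUserAgent(rng = Math.random) {
  return MOBILE_USER_AGENTS[Math.floor(rng() * MOBILE_USER_AGENTS.length)];
}

// Promise-based pause, e.g. `await sleep(randomDelayMs(800, 2500))`
// between page actions.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

In Playwright, the chosen string would be passed as the userAgent option of browser.newContext().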

📖 Script Descriptions

scripts/playwright-simple.js

  • Use Case: Regular dynamic websites
  • Speed: Fast (3-5 seconds)
  • Anti-Bot: None
  • Output: JSON (title, content, URL)
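Here is a rough sketch of a simple dynamic-page scrape that produces the JSON shape above. It assumes the playwright package from the installation step. It is not the source of scripts/playwright-simple.js, and toResult, scrapeSimple, the whitespace collapsing, and the 3-second default wait are all assumptions.

```javascript
// Shapes the JSON output described above: title, content, URL.
// Collapsing whitespace in `content` is an assumption for readability.
function toResult(title, content, url) {
  return {
    title: title.trim(),
    content: content.replace(/\s+/g, " ").trim(),
    url,
  };
}

async function scrapeSimple(url, { waitMs = 3000, headless = true } = {}) {
  // Lazy require so toResult is usable without Playwright installed.
  const { chromium } = require("playwright");
  const browser = await chromium.launch({ headless });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    await page.waitForTimeout(waitMs); // give client-side JS time to render
    const title = await page.title();
    const content = await page.evaluate(() => document.body.innerText);
    return toResult(title, content, page.url());
  } finally {
    await browser.close();
  }
}

// Runs only when a URL is passed on the command line.
if (process.argv[2]) {
  scrapeSimple(process.argv[2])
    .then((result) => console.log(JSON.stringify(result, null, 2)))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

page.url() reports the final URL after any redirects, which may differ from the one requested.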

scripts/playwright-stealth.js ⭐

  • Use Case: Sites with Cloudflare or anti-bot protection
  • Speed: Medium (5-20 seconds)
  • Anti-Bot: Medium-High (hides automation, realistic UA)
  • Output: JSON + Screenshot + HTML file
  • Verified: 100% success on Discuss.com.hk
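Saving the screenshot and HTML file could look roughly like the following sketch, which assumes a Playwright Page object. saveArtifacts and its option names are hypothetical. The sketch relies only on page.screenshot() and page.content(), which are part of Playwright's Page API.

```javascript
const fs = require("fs/promises");
const path = require("path");

// Persists a full-page screenshot and the rendered HTML for a Playwright page.
// Returns the paths that were actually written.
async function saveArtifacts(page, { screenshotPath, htmlPath, saveHtml = true }) {
  const saved = {};
  if (screenshotPath) {
    await fs.mkdir(path.dirname(screenshotPath), { recursive: true });
    await page.screenshot({ path: screenshotPath, fullPage: true });
    saved.screenshot = screenshotPath;
  }
  if (saveHtml && htmlPath) {
    await fs.mkdir(path.dirname(htmlPath), { recursive: true });
    // page.content() returns the current DOM serialized as HTML.
    await fs.writeFile(htmlPath, await page.content(), "utf8");
    saved.html = htmlPath;
  }
  return saved;
}
```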

🔧 Customization

All scripts support environment variables:

# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL

# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL

# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL

# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
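The following sketch shows one way a script could read these variables. readConfig is a hypothetical name. The defaults, such as the 5000 ms wait, are assumptions because this document does not state the scripts' real defaults.

```javascript
// Reads the environment variables listed above. Defaults are assumptions.
function readConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? "", 10);
  return {
    screenshotPath: env.SCREENSHOT_PATH || null,
    // Fall back to the assumed default when WAIT_TIME is missing or invalid.
    waitTime: Number.isFinite(waitTime) && waitTime >= 0 ? waitTime : 5000,
    headless: env.HEADLESS !== "false", // only the literal "false" shows the browser
    saveHtml: env.SAVE_HTML === "true",
    userAgent: env.USER_AGENT || null, // null: let the script pick its own UA
  };
}
```

Because only the literal string false disables headless mode, an unset or misspelled HEADLESS keeps the browser hidden.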

πŸ›‘οΈ Anti-Bot Techniques Summary

✅ Effective Anti-Bot Measures

  1. Hide navigator.webdriver: essential
  2. Realistic User-Agent: use real devices (iPhone, Android)
  3. Mimic Human Behavior: random delays, scrolling
  4. Avoid Framework Signatures: Crawlee and Selenium are easily detected
  5. Use addInitScript (Playwright): inject scripts before the page's own scripts run
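Points 1 and 5 combine naturally. Playwright's BrowserContext.addInitScript runs a script in every page of the context after the document is created but before any of the page's own scripts, so detection code sees the masked value. The following minimal sketch assumes a Playwright context. maskWebdriver and applyStealth are hypothetical helpers, and redefining the property on the navigator object is only one common variant of the technique.

```javascript
// Redefines `webdriver` on the given object so it reads as false.
// It takes a parameter so the same function can be exercised in Node.
function maskWebdriver(target) {
  Object.defineProperty(target, "webdriver", {
    get: () => false,
    configurable: true,
  });
  return target;
}

// Registers the masking code as an init script string that invokes
// maskWebdriver on the page's `navigator`.
async function applyStealth(context) {
  await context.addInitScript(`(${maskWebdriver.toString()})(navigator);`);
}
```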

❌ Ineffective Anti-Bot Measures

  1. Only changing the User-Agent: not enough
  2. Using high-level frameworks (Crawlee): more easily detected
  3. Docker isolation: doesn't help with Cloudflare

πŸ” Troubleshooting

Issue: 403 Forbidden

Solution: Use playwright-stealth.js

Issue: Cloudflare Challenge Page

Solution:

  1. Increase wait time (10-15 seconds)
  2. Try headless: false (headful mode sometimes has a higher success rate)
  3. Consider using proxy IPs
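The troubleshooting steps can be expressed as a diagnosis followed by an escalating retry plan. This sketch makes several assumptions. The title patterns are a heuristic for common Cloudflare interstitial pages and may change. The wait values follow the 10-15 second suggestion above. diagnose and attemptPlan are hypothetical names.

```javascript
// Heuristic title markers for Cloudflare interstitials (assumptions; may change).
const CHALLENGE_TITLES = [/just a moment/i, /attention required/i];

// Challenge pages are checked first because they can also return 403.
function diagnose({ status, title = "" }) {
  if (CHALLENGE_TITLES.some((re) => re.test(title))) return "challenge";
  if (status === 403) return "forbidden"; // switch to playwright-stealth.js
  return "ok";
}

// Escalating retry plan: longer wait, then headful mode, then a proxy if given.
function attemptPlan(proxyServer = null) {
  const plan = [
    { headless: true, waitMs: 10000 },
    { headless: true, waitMs: 15000 },
    { headless: false, waitMs: 15000 },
  ];
  if (proxyServer) {
    // `proxy: { server }` matches Playwright's launch option shape.
    plan.push({ headless: false, waitMs: 15000, proxy: { server: proxyServer } });
  }
  return plan;
}
```

Each plan entry could be passed to chromium.launch({ headless, proxy }). proxy stays undefined when no proxy server is configured.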
