BrightData
Customization
Before executing, check for user customizations at:
~/.claude/PAI/USER/SKILLCUSTOMIZATIONS/BrightData/
If this directory exists, load and apply any PREFERENCES.md, configurations, or resources found there. These override default behavior. If the directory does not exist, proceed with skill defaults.
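A minimal shell sketch of that check (illustrative only; the path comes from this section):

```bash
# Load user overrides if the customization directory exists
CUSTOM_DIR=~/.claude/PAI/USER/SKILLCUSTOMIZATIONS/BrightData
if [ -d "$CUSTOM_DIR" ]; then
  # PREFERENCES.md and any other resources here override skill defaults
  [ -f "$CUSTOM_DIR/PREFERENCES.md" ] && cat "$CUSTOM_DIR/PREFERENCES.md"
fi
```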
🚨 MANDATORY: Voice Notification (REQUIRED BEFORE ANY ACTION)
You MUST send this notification BEFORE doing anything else when this skill is invoked.
Send voice notification:
```bash
curl -s -X POST http://localhost:31337/notify \
  -H "Content-Type: application/json" \
  -d '{"message": "Running the WORKFLOWNAME workflow in the BrightData skill to ACTION"}' \
  > /dev/null 2>&1 &
```
Output text notification:
Running the **WorkflowName** workflow in the **BrightData** skill to ACTION...
This is not optional. Execute this curl command immediately upon skill invocation.
Workflow Routing
When executing a workflow, output this notification directly:
Running the **WorkflowName** workflow in the **BrightData** skill to ACTION...
Route to the appropriate workflow based on the request.
When user requests scraping/fetching a single URL:
- Examples: "scrape this URL", "fetch this page", "get content from [URL]", "pull content from this site", "retrieve [URL]", "can't access this site", "this site is blocking me", "use Bright Data to fetch"
- READ: Workflows/FourTierScrape.md
- EXECUTE: Four-tier progressive scraping workflow (WebFetch → Curl → Browser Automation → Bright Data MCP)
When user requests crawling multiple pages from a site:
- Examples: "crawl this site", "crawl all pages under /docs", "spider this domain", "map this website", "get all pages from", "crawl [URL]", "scrape the whole site", "extract all pages"
- READ: Workflows/Crawl.md
- EXECUTE: Crawl workflow (Light Crawl for <50 pages, Full Crawl via Bright Data Crawl API for larger sites)
When to Activate This Skill
Direct Scraping Requests (Categories 1-4)
- "scrape this URL", "scrape [URL]", "scrape this page"
- "fetch this URL", "fetch [URL]", "fetch this page", "fetch content from"
- "pull content from [URL]", "pull this page", "pull from this site"
- "get content from [URL]", "retrieve [URL]", "retrieve this page"
- "do scraping on [URL]", "run scraper on [URL]"
- "basic scrape", "quick scrape", "simple fetch"
- "comprehensive scrape", "deep scrape", "full content extraction"
Access & Bot Detection Issues (Categories 5-7)
- "can't access this site", "site is blocking me", "getting blocked"
- "bot detection", "CAPTCHA", "access denied", "403 error"
- "need to bypass bot detection", "get around blocking"
- "this URL won't load", "can't fetch this page"
- "use Bright Data", "use the scraper", "use advanced scraping"
Result-Oriented Requests (Category 8)
- "get me the content from [URL]"
- "extract text from [URL]"
- "download this page content"
- "convert [URL] to markdown"
- "need the HTML from this site"
Crawling Requests (Categories 9-11)
- "crawl this site", "crawl [URL]", "spider this domain"
- "map this website", "get all pages from [URL]", "scrape the whole site"
- "crawl all pages under /docs", "extract all pages from", "site crawl"
- "get every page on this site", "full site extraction"
- "crawl depth 3", "crawl up to 50 pages"
Use Case Indicators
- User needs web content for research or analysis
- Standard methods (WebFetch) are failing
- Site has bot detection or rate limiting
- Need reliable content extraction
- Converting web pages to structured format (markdown)
- User needs multiple pages from a site, not just one
- User wants to map a site's structure or extract a section
Core Capabilities
Progressive Escalation Strategy:
- Tier 1: WebFetch - Fast, simple, built-in Claude Code tool
- Tier 2: Customized Curl - Chrome-like browser headers to bypass basic bot detection (see the sketch after this list)
- Tier 3: agent-browser - Headless browser automation via the agent-browser Rust CLI daemon, for JavaScript-heavy sites. Playwright is banned across PAI.
- Tier 4: Bright Data MCP - Professional scraping service that handles CAPTCHA and advanced bot detection
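For Tier 2, a hedged sketch of what "Chrome-like headers" means in practice (the header values are illustrative, not the workflow's canonical set):

```bash
# Tier 2 sketch: curl dressed in Chrome-like request headers to pass
# basic user-agent checks
curl -fsL "https://example.com" \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
  -H 'Accept-Language: en-US,en;q=0.9'
```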
Key Features:
- Automatic fallback between tiers
- Preserves content in markdown format
- Handles bot detection and CAPTCHA
- Works with any URL
- Efficient resource usage (only escalates when needed)
Workflow Overview
FourTierScrape.md - Complete URL content scraping with four-tier fallback strategy
- When to use: Any single URL content retrieval request
- Process: Start with WebFetch → If fails, use curl with Chrome headers → If fails, use Browser Automation → If fails, use Bright Data MCP
- Output: URL content in markdown format
Crawl.md - Multi-page crawling with link discovery and site mapping
- When to use: Crawling multiple pages from a site, mapping site structure, extracting a section
- Process: Light Crawl (MCP scrape_batch + link extraction loop, up to 50 pages, sketched below) or Full Crawl (Bright Data Crawl API for entire sites)
- Output: Site map + page contents in markdown, with crawl stats and cost summary
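A rough sketch of the Light Crawl loop, assuming pages are fetched and links extracted iteratively. Real runs use the MCP scrape_batch tool; plain curl and grep stand in here, and the starting URL is hypothetical:

```bash
# Breadth-first Light Crawl sketch, capped at 50 pages
start="https://example.com/docs/"
queue=("$start")
declare -A seen
while ((${#queue[@]})) && ((${#seen[@]} < 50)); do
  url=${queue[0]}; queue=("${queue[@]:1}")
  [[ -n ${seen[$url]:-} ]] && continue
  seen["$url"]=1
  html=$(curl -sL "$url")
  # Enqueue same-section links discovered on the fetched page
  while IFS= read -r link; do queue+=("$link"); done < <(
    grep -oE "href=\"$start[^\"]*\"" <<<"$html" | cut -d'"' -f2
  )
done
echo "Crawled ${#seen[@]} pages"
```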
Extended Context
Integration Points:
- WebFetch Tool - Built-in Claude Code tool for basic URL fetching
- Bash Tool - For executing curl commands with custom headers
- Browser Automation - agent-browser headless daemon for JavaScript rendering
- Bright Data MCP - `mcp__Brightdata__scrape_as_markdown` and `scrape_batch` for advanced scraping
- Bright Data Crawl API - HTTP POST to `api.brightdata.com/datasets/v3/trigger` for full-site crawls (sketched after this list)
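A hedged sketch of triggering a Full Crawl. The endpoint and credential come from this doc; the dataset_id parameter and payload shape are assumptions, so check Crawl.md for the authoritative call:

```bash
# Assumed request shape; YOUR_DATASET_ID is a hypothetical placeholder
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer $BRIGHTDATA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[{"url": "https://example.com"}]'
```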
When Each Tier Is Used:
- Tier 1 (WebFetch): Simple sites, public content, no bot detection
- Tier 2 (Curl): Sites with basic user-agent checking, simple bot detection
- Tier 3 (agent-browser): Sites requiring JavaScript execution, dynamic content loading (a quick detection heuristic follows this list)
- Tier 4 (Bright Data): Sites with CAPTCHA, advanced bot detection, residential proxy requirements
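One rough way to spot Tier 3 candidates early (an assumption, not something the workflows specify): if the static HTML body comes back nearly empty, the page probably renders client-side.

```bash
# Heuristic only: a tiny static body usually means JavaScript rendering
body=$(curl -sL "https://dynamic-site.com" -H 'User-Agent: Mozilla/5.0')
if [ "${#body}" -lt 2048 ]; then
  echo "Sparse static HTML: likely needs Tier 3 (agent-browser)"
fi
```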
Configuration: Tiers 1-3 require no setup - all tools are available by default in Claude Code. Tier 4 needs BRIGHTDATA_API_KEY in `~/.claude/.env` (see Gotchas).
Examples
Example 1: Simple Public Website
User: "Scrape https://example.com"
Skill Response:
- Routes to FourTierScrape.md
- Attempts Tier 1 (WebFetch)
- Success → Returns content in markdown
- Total time: <5 seconds
Example 2: Site with JavaScript Requirements
User: "Can't access this site https://dynamic-site.com"
Skill Response:
- Routes to FourTierScrape.md
- Attempts Tier 1 (WebFetch) → Fails (blocked)
- Attempts Tier 2 (Curl with Chrome headers) → Fails (JavaScript required)
- Attempts Tier 3 (agent-browser) → Success
- Returns content in markdown
- Total time: ~15-20 seconds
Example 3: Site with Advanced Bot Detection
User: "Scrape https://protected-site.com"
Skill Response:
- Routes to FourTierScrape.md
- Attempts Tier 1 (WebFetch) → Fails (blocked)
- Attempts Tier 2 (Curl) → Fails (advanced detection)
- Attempts Tier 3 (agent-browser) → Fails (CAPTCHA)
- Attempts Tier 4 (Bright Data MCP) → Success
- Returns content in markdown
- Total time: ~30-40 seconds
Example 4: Explicit Bright Data Request
User: "Use Bright Data to fetch https://difficult-site.com"
Skill Response:
- Routes to FourTierScrape.md
- User explicitly requested Bright Data
- Goes directly to Tier 4 (Bright Data MCP) → Success
- Returns content in markdown
- Total time: ~5-10 seconds
Related Documentation:
- `~/.claude/PAI/DOCUMENTATION/Skills/SkillSystem.md` - Canonical structure guide
- `~/.claude/` - Overall PAI philosophy
Last Updated: 2026-02-22
Gotchas
- 4-tier escalation: WebFetch → curl → agent-browser → Bright Data proxy. Always start at Tier 1 and escalate only when blocked. Playwright is banned across PAI.
- Bright Data proxy has usage costs. Don't use Tier 4 for sites accessible via Tier 1-3.
- CAPTCHA-solving introduces latency. Allow extra time for Tier 4 responses.
- Credentials in `~/.claude/.env`: BRIGHTDATA_API_KEY.
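A quick way to confirm the Tier 4 credential is in place (path and variable name from the bullet above):

```bash
# Warn early if the Bright Data key is missing from the env file
grep -q '^BRIGHTDATA_API_KEY=' ~/.claude/.env \
  && echo "BRIGHTDATA_API_KEY present" \
  || echo "BRIGHTDATA_API_KEY missing: Tier 4 will fail" >&2
```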
Execution Log
After completing any workflow, append a single JSONL entry:
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","skill":"BrightData","workflow":"WORKFLOW_USED","input":"8_WORD_SUMMARY","status":"ok|error","duration_s":SECONDS}' >> ~/.claude/PAI/MEMORY/SKILLS/execution.jsonl
Replace WORKFLOW_USED with the workflow executed, 8_WORD_SUMMARY with a brief input description, and SECONDS with approximate wall-clock time. Log status: "error" if the workflow failed.
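For example, a successful FourTierScrape run might append (all values illustrative):

```bash
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","skill":"BrightData","workflow":"FourTierScrape","input":"scrape example.com docs page for research","status":"ok","duration_s":12}' >> ~/.claude/PAI/MEMORY/SKILLS/execution.jsonl
```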