WebCrawler

Overview

This skill bundles a local Node CLI for website crawling with dual outputs: machine-readable JSON and companion HTML previews for review.

Use it for:

Use this skill when the user wants to:

Do not use this skill when:

the task is only summarization of text the user already provided
the site requires login, authentication, or a fragile interactive flow the user has not prepared for
the user only needs one quick fact and does not need crawl artifacts

Always write artifacts into the user's workspace, not into the skill directory.
Prefer a dedicated output folder such as ./outputs/webcrawler/ unless the user gives a path.
After each run, report the most important artifact paths back to the user.
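
The output rules above can be sketched in shell. This is a minimal illustration, not part of the bundled CLI: resolve_out_dir and list_artifacts are hypothetical helper names, and the only value taken from this document is the suggested default folder.

```shell
# Hypothetical helper: pick the output directory (the user's path if given,
# otherwise the suggested default), create it, and print it.
resolve_out_dir() {
  out_dir="${1:-./outputs/webcrawler}"
  mkdir -p "$out_dir" && printf '%s\n' "$out_dir"
}

# Hypothetical helper: list every file a run left in the output directory,
# sorted, so the paths can be reported back to the user.
list_artifacts() {
  find "$1" -type f | sort
}
```

Calling resolve_out_dir with no argument falls back to ./outputs/webcrawler inside the current workspace; passing a path uses that path instead. After a run, list_artifacts on the same directory gives the candidate paths to report.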

Use:

scripts/run-webcrawler.sh scrape "<url>" --format json -o "<workspace-output>.json"

This writes:

Use:

scripts/run-webcrawler.sh batch "<urls.txt|url...>" --out-dir "<workspace-output-dir>"

This writes:

Prefer batch whenever the user wants more than one page.

Use:

scripts/run-webcrawler.sh brand "<url>" --brand-id "<brand-id>" --out-dir "<workspace-output-dir>"

This writes:

Use:

scripts/run-webcrawler.sh workflow "<url>" --brand-id "<brand-id>" --out-dir "<workspace-output-dir>" --apply-to "<storefront-path>"

Add --typecheck, --build, or --push only when the user explicitly wants those steps.
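
One way to keep the optional steps strictly opt-in is to build the extra flags from what the user actually asked for. The sketch below assumes a hypothetical helper, optional_workflow_flags, that takes plain step names; only the three flag names come from this document.

```shell
# Hypothetical helper: print only the optional workflow flags the user
# explicitly requested. Arguments are step names: typecheck, build, push.
optional_workflow_flags() {
  flags=""
  for step in "$@"; do
    case "$step" in
      typecheck|build|push) flags="$flags --$step" ;;
      *) echo "ignoring unknown step: $step" >&2 ;;
    esac
  done
  # Drop the leading space left by the first append.
  echo "${flags# }"
}
```

With no arguments the helper prints an empty line, so nothing is added to the workflow command. The result is meant to be appended unquoted, for example $(optional_workflow_flags typecheck build), so that each flag becomes its own argument.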

For batch, accept either a text file of URLs, one per line, or one or more URLs passed directly as arguments.

In a URL file, ignore blank lines and lines starting with #.
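
The filtering rule can be checked before a batch run with a short shell sketch. clean_url_list is a hypothetical helper name, and treating whitespace-only lines as blank is an assumption; this document does not say whether the CLI applies the same rule itself.

```shell
# Hypothetical helper: print the usable URLs from a list file, skipping
# blank lines (assumed to include whitespace-only lines) and lines that
# start with "#".
clean_url_list() {
  grep -v -e '^[[:space:]]*$' -e '^#' "$1"
}
```

Running clean_url_list on a file shows exactly which URLs a batch run would be given, which makes it easy to spot a commented-out or empty entry.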

scripts/run-webcrawler.sh bootstraps dependencies with npm install if needed.
If the user only wants machine-readable output, scrape --format json is the default recommendation.
If the user wants both review and machine consumption, prefer JSON outputs because they automatically generate companion HTML previews.
brand and workflow are specialized for storefront homepages, not for arbitrary deep-site crawling.
Authenticated pages, login-only flows, and highly interactive apps may not extract correctly without additional browser automation.

When you finish a run, tell the user: