Spider CLI Extraction
Overview
Use this skill to run Spider CLI workflows with explicit runtime mode control.
Canonical source for cross-agent behavior: skills/core/spider-cli-extraction.md
Load references/cli-workflows.md when you need exact command patterns or mode-selection rules.
Workflow
- Confirm CLI availability.
  - Prefer `cargo run -p spider_cli -- ...` from the Spider repo root.
  - If `spider` is globally installed, use `spider ...` for quick checks.
- Choose the task mode.
  - Use `crawl` to collect links.
  - Use `scrape` to emit per-page JSON records and optionally include HTML.
  - Use `download` to persist page markup to disk.
- Select runtime execution mode.
  - Use `--headless` for browser-rendered mode.
  - Use `--http` to force HTTP-only mode.
  - Omit both for default HTTP behavior.
- Add scope controls.
  - Set `--limit`, `--depth`, `--budget`, and `--blacklist-url`.
  - Add `--respect-robots-txt` when policy compliance is required.
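The four workflow steps can be sketched as a small dry-run helper that assembles a command line and prints it instead of running it. This is only an illustration: `build_spider_cmd` and the `SPIDER_RUNNER` variable are hypothetical names, and placing `--limit` and `--depth` before the subcommand is an assumption modeled on where `--headless` sits in the documented commands.

```shell
# Hypothetical helper: prints a Spider CLI command line without executing it.
# Usage: build_spider_cmd URL TASK [RUNTIME] [LIMIT] [DEPTH]
build_spider_cmd() {
  url=$1
  task=$2
  runtime=${3:-default}
  limit=${4:-}
  depth=${5:-}

  # Step 1: default to the repo-root cargo invocation.
  # SPIDER_RUNNER is an assumed override, e.g. SPIDER_RUNNER=spider.
  runner="${SPIDER_RUNNER:-cargo run -p spider_cli --}"

  # Step 2: accept only the task modes named in the workflow.
  case $task in
    crawl|scrape|download) ;;
    *) echo "unknown task mode: $task" >&2; return 1 ;;
  esac

  # Step 3: map the runtime choice to a flag; default adds nothing.
  case $runtime in
    headless) mode_flag=" --headless" ;;
    http)     mode_flag=" --http" ;;
    default)  mode_flag="" ;;
    *) echo "unknown runtime mode: $runtime" >&2; return 1 ;;
  esac

  # Step 4: append scope controls only when values were given.
  # (Flag position before the subcommand is an assumption.)
  scope=""
  if [ -n "$limit" ]; then scope="$scope --limit $limit"; fi
  if [ -n "$depth" ]; then scope="$scope --depth $depth"; fi

  printf '%s --url %s%s%s %s\n' "$runner" "$url" "$mode_flag" "$scope" "$task"
}

build_spider_cmd https://example.com crawl headless 20 2
# prints (with SPIDER_RUNNER unset):
# cargo run -p spider_cli -- --url https://example.com --headless --limit 20 --depth 2 crawl
```

Printing the command first makes it easy to review the chosen modes before anything touches the network. `--budget`, `--blacklist-url`, and `--respect-robots-txt` are left out because their value formats are not described here.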
Quick Commands
# Crawl links (default HTTP mode)
cargo run -p spider_cli -- --url https://example.com crawl --output-links
# Browser mode on demand
cargo run -p spider_cli -- --url https://example.com --headless crawl --output-links
# Scrape with HTML output
cargo run -p spider_cli -- --url https://example.com scrape --output-html
Script
Use scripts/spider_cli_helper.sh for wrappers:
./scripts/spider_cli_helper.sh verify-headless
./scripts/spider_cli_helper.sh crawl https://example.com --limit 20 --depth 2
./scripts/spider_cli_helper.sh scrape https://example.com --output-html --output-links
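When chaining the wrappers, it can help to guard against a missing helper and to run the headless check before a crawl. The sketch below is a minimal illustration: `run_helper` and the `SPIDER_HELPER` override are hypothetical names, and it assumes `verify-headless` exits non-zero when the check fails, which this document does not state.

```shell
# Hypothetical wrapper around the helper script named above.
# SPIDER_HELPER is an assumed override variable, not part of the skill.
run_helper() {
  helper="${SPIDER_HELPER:-./scripts/spider_cli_helper.sh}"
  if [ ! -x "$helper" ]; then
    echo "helper not found or not executable: $helper" >&2
    return 127
  fi
  "$helper" "$@"
}

# Run the crawl only if the headless check succeeds
# (assumes verify-headless signals failure via its exit status).
if run_helper verify-headless; then
  run_helper crawl https://example.com --limit 20 --depth 2
else
  echo "headless check failed; skipping crawl" >&2
fi
```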
Repository
spider-rs/spider_skills
First Seen
Feb 20, 2026