Web Fetch Skill

Fetch and parse web content from URLs.

When to Use

✅ USE this skill when:

"Fetch content from URL"
"Download file from..."
"Extract article text from..."
"Get page title and description"
"Scrape data from webpage"

When NOT to Use

❌ DON'T use this skill when:

Interactive browser actions → use browser-tools
Authenticated sessions → use browser-tools with profile
JavaScript-heavy SPAs → use browser-tools

Commands

Fetch Content

{baseDir}/fetch.sh "https://example.com"
{baseDir}/fetch.sh "https://example.com" --markdown
{baseDir}/fetch.sh "https://example.com" --json

Extract Article

{baseDir}/extract.sh "https://example.com/article"
{baseDir}/extract.sh "https://example.com/article" --format markdown

Download File

{baseDir}/download.sh "https://example.com/file.pdf" --out /tmp/file.pdf
{baseDir}/download.sh "https://example.com/archive.zip" --out /tmp/archive.zip

Get Page Metadata

{baseDir}/metadata.sh "https://example.com"
{baseDir}/metadata.sh "https://example.com" --json

Extract Links

{baseDir}/links.sh "https://example.com"
{baseDir}/links.sh "https://example.com" --filter "blog"

Extract Images

{baseDir}/images.sh "https://example.com"
{baseDir}/images.sh "https://example.com" --download --out /tmp/images/

Options

--markdown: Output as markdown
--json: Output as JSON
--text: Plain text output
--timeout N: Timeout in seconds (default: 30)
--user-agent <string>: Custom User-Agent string
--out <path>: Output file path

Output Formats

Plain Text

Extract visible text from HTML, cleaned of scripts and styles.

Markdown

Convert HTML to markdown with proper formatting.

JSON

Structured output with title, content, metadata.

Examples

Get article content:

{baseDir}/extract.sh "https://example.com/blog/post" --markdown

Download all PDFs from page:

{baseDir}/links.sh "https://example.com" --filter ".pdf" | xargs -I {} {baseDir}/download.sh "{}"
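
The one-liner above passes no --out path, and this document does not say where download.sh writes in that case. The following is a minimal sketch of a loop that names each file explicitly. It assumes links.sh prints one absolute URL per line and that download.sh accepts --out as in the Download File examples. The helpers url_filename and download_all and the DOWNLOAD_CMD variable are hypothetical names introduced for illustration; they are not part of the skill.

```shell
# Hypothetical helper: derive a local filename from a URL.
url_filename() {
  local url="$1"
  url="${url%%#*}"    # drop any fragment
  url="${url%%\?*}"   # drop any query string
  url="${url%/}"      # drop one trailing slash
  printf '%s\n' "${url##*/}"
}

# Hypothetical helper: read URLs from stdin and download each into directory $1.
# DOWNLOAD_CMD is an assumption; set it to the real path of download.sh.
download_all() {
  local out_dir="$1" url
  mkdir -p "$out_dir"
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    # </dev/null keeps the downloader from consuming the URL list on stdin.
    "${DOWNLOAD_CMD:-download.sh}" "$url" --out "$out_dir/$(url_filename "$url")" </dev/null
  done
}

# Possible usage (paths are placeholders):
#   {baseDir}/links.sh "https://example.com" --filter ".pdf" \
#     | DOWNLOAD_CMD={baseDir}/download.sh download_all /tmp/pdfs
```

Two URLs that end in the same filename would overwrite each other in this sketch; add a counter or hash to the name if that matters.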

Get page metadata:

{baseDir}/metadata.sh "https://example.com" --json
# Output: {"title": "...", "description": "...", "og:image": "..."}

Notes

Respects robots.txt by default
Rate limiting: 1 request per second by default
Use --user-agent to set a custom user agent
For JavaScript-heavy pages, use browser-tools instead
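
The notes do not say whether the one-request-per-second limit is tracked across separate invocations of the scripts. A cautious sketch for fetching a list of URLs therefore spaces the calls itself. The fetch_each function and the FETCH_CMD and FETCH_DELAY variables are hypothetical names introduced here, and the sketch assumes fetch.sh writes its result to standard output and accepts the flags listed under Options.

```shell
# Hypothetical helper: fetch each URL read from stdin, pausing between calls.
# Extra arguments (for example --markdown or --timeout 10) are passed through.
# FETCH_CMD and FETCH_DELAY are assumptions introduced for this sketch.
fetch_each() {
  local url first=1
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    # Pause before every request except the first.
    [ "$first" -eq 1 ] || sleep "${FETCH_DELAY:-1}"
    first=0
    # </dev/null keeps the fetcher from consuming the URL list on stdin.
    "${FETCH_CMD:-fetch.sh}" "$url" "$@" </dev/null
  done
}

# Possible usage (paths are placeholders):
#   FETCH_CMD={baseDir}/fetch.sh fetch_each --markdown --timeout 10 < urls.txt
```

A failed fetch does not stop the loop in this sketch; check the exit status inside the loop if a failure should abort the batch.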
