Article Extractor

Extract clean article content from URLs, removing ads, navigation, and clutter. If one extraction tool fails, the script falls back to the next one.

Workflow

When the user provides a URL to download or extract:

Call the extraction script directly with the URL (do NOT fetch the URL first with web_fetch).
The script handles fetching, extraction, and saving automatically.
It returns a clean markdown file with frontmatter.

Usage

# Basic extraction
scripts/extract-article.sh "https://example.com/article"

# Specify output location
scripts/extract-article.sh "https://example.com/article" -o my-article.md -d ~/Documents

# Try Wayback Machine if original fails
scripts/extract-article.sh "https://example.com/article" --wayback

Make the script executable if needed: chmod +x scripts/extract-article.sh
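
The executable check can be folded into a small guard so that chmod only runs when it is needed. This is a minimal sketch; ensure_executable is a hypothetical helper name, not part of the skill's scripts.

```shell
# Hypothetical helper (not part of the skill): add the execute bit
# only when the file does not already have it.
ensure_executable() {
  [ -x "$1" ] || chmod +x "$1"
}

# Usage (path taken from the section above):
# ensure_executable scripts/extract-article.sh
```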

Key Options

-o <file> - Output filename
-d <dir> - Output directory
-w, --wayback - Try Wayback Machine if extraction fails
-t <tool> - Force tool: jina, trafilatura, readability, fallback
-q - Quiet mode

For complete options, exit codes, tool details, and examples, see references/tools-and-options.md.

Common Failures

Exit 3 (access denied): Paywall or login required. Try --wayback.
Exit 4 (no content): Heavy JavaScript. Try a different tool with -t <tool>.
Exit 2 (network): Connection issue. Check the URL.
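
The failure cases above can be turned into a simple retry wrapper. The sketch below makes several assumptions: exit code 0 means success, readability is an arbitrary pick from the documented tool list, and both the EXTRACTOR variable and the extract_with_retry function are hypothetical names introduced here for illustration. Other exit codes may exist; see references/tools-and-options.md.

```shell
# Sketch only: retry strategy keyed on the exit codes listed above.
# EXTRACTOR is a hypothetical override so the wrapper can be pointed at a stub.
EXTRACTOR="${EXTRACTOR:-scripts/extract-article.sh}"

extract_with_retry() {
  url="$1"
  status=0
  "$EXTRACTOR" "$url" || status=$?

  case "$status" in
    0) return 0 ;;                                # assumed: 0 means success
    3) "$EXTRACTOR" "$url" --wayback ;;           # access denied: try the Wayback Machine
    4) "$EXTRACTOR" "$url" -t readability ;;      # no content: force a different tool
    2) echo "Network error: check the URL" >&2
       return 2 ;;
    *) return "$status" ;;                        # anything else: pass the code through
  esac
}

# Usage:
# extract_with_retry "https://example.com/article"
```

The wrapper retries at most once per failure, so a second failure surfaces its own exit code to the caller rather than looping.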

Local Tools (Optional)

For offline extraction, install the local tools by running scripts/install-deps.sh.
