skills/jrajasekera/claude-skills/article-extractor

Article Extractor

Extract clean article content from URLs, removing ads, navigation, and clutter. A multi-tool fallback improves reliability: if one extraction tool fails, the script tries another.
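As a rough illustration of the fallback idea, here is a minimal bash sketch. It is not the script's actual implementation: the function name first_successful_extract is hypothetical, the extractors are assumed to be commands or functions that take a URL and print text on stdout, and "success" is assumed to mean a zero exit status plus non-empty output.

```bash
# Minimal sketch of a multi-tool fallback chain (hypothetical, not the
# script's real code). Each extractor takes a URL and prints text on stdout.
#
# Usage: first_successful_extract URL EXTRACTOR [EXTRACTOR...]
first_successful_extract() {
  local url="$1"
  shift
  local tool out
  for tool in "$@"; do
    # Assumed success criterion: zero exit status and non-empty output.
    if out=$("$tool" "$url" 2>/dev/null) && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    echo "extractor '$tool' failed, trying the next one" >&2
  done
  # Every extractor failed or produced nothing.
  return 1
}
```

Ordering matters in this pattern: the first extractor that succeeds wins, so faster or higher-quality tools would normally go first.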

Workflow

When the user provides a URL to download or extract:

  1. Call the extraction script directly with the URL. Do NOT fetch the URL first with web_fetch.
  2. The script handles fetching, extraction, and saving automatically.
  3. It returns a clean Markdown file with frontmatter.
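To inspect the metadata of an extracted file, you can print only its frontmatter. This sketch assumes YAML-style frontmatter delimited by --- lines at the very top of the file; the exact fields the script writes are not listed here, so the function prints whatever the block contains. The name print_frontmatter is hypothetical.

```bash
# Print the frontmatter block of a Markdown file, without the --- delimiters.
# Assumes the first line is exactly "---" and the block ends at the next
# line that is exactly "---". Prints nothing if there is no frontmatter.
print_frontmatter() {
  awk '
    NR == 1 && $0 == "---" { inside = 1; next }  # opening delimiter
    inside && $0 == "---"  { exit }              # closing delimiter: stop
    inside                 { print }             # a frontmatter line
  ' "$1"
}
```

For example, after extracting with -o my-article.md -d ~/Documents, run print_frontmatter ~/Documents/my-article.md.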

Usage

```bash
# Basic extraction
scripts/extract-article.sh "https://example.com/article"

# Specify output location
scripts/extract-article.sh "https://example.com/article" -o my-article.md -d ~/Documents

# Try Wayback Machine if original fails
scripts/extract-article.sh "https://example.com/article" --wayback
```

If needed, make the script executable: chmod +x scripts/extract-article.sh

Key Options

  • -o <file> - Output filename
  • -d <dir> - Output directory
  • -w, --wayback - Try the Wayback Machine if extraction fails
  • -t, --tool <tool> - Force a specific tool: jina, trafilatura, readability, or fallback
  • -q - Quiet mode

For complete options, exit codes, tool details, and examples, see references/tools-and-options.md.

Common Failures

  • Exit 2 (network): Connection issue - check the URL and your network connection
  • Exit 3 (access denied): Paywall or login required - try --wayback
  • Exit 4 (no content): The page likely relies heavily on JavaScript - try a different --tool
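The remedies for these failures can be automated with a small wrapper. This is a sketch, not part of the skill: it relies only on the documented --wayback and -t flags and exit codes 2, 3, and 4, while the function name extract_with_recovery, the EXTRACT_CMD variable, and the order in which tools are retried are assumptions.

```bash
# Hypothetical wrapper: run extract-article.sh and apply the documented
# remedy for each common failure. Set EXTRACT_CMD to use another path.
extract_with_recovery() {
  local url="$1"
  shift
  local cmd="${EXTRACT_CMD:-scripts/extract-article.sh}"
  local status=0 tool

  "$cmd" "$url" "$@" || status=$?

  case "$status" in
    2) # Network error: retrying with other options will not help.
       echo "network error: check the URL and your connection" >&2 ;;
    3) # Access denied (paywall or login): retry via the Wayback Machine.
       status=0
       "$cmd" "$url" --wayback "$@" || status=$? ;;
    4) # No content (often heavy JavaScript): force each tool in turn.
       for tool in jina trafilatura readability fallback; do
         if "$cmd" "$url" -t "$tool" "$@"; then
           return 0
         fi
       done ;;
  esac
  # 0 on success, otherwise the last relevant exit code.
  return "$status"
}
```

For example: extract_with_recovery "https://example.com/article" -d ~/Documents -q. Extra arguments are forwarded to every attempt, so avoid passing -t yourself, since the wrapper supplies its own when retrying.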

Local Tools (Optional)

For offline extraction, install the local tools with scripts/install-deps.sh.