fetch-url-as-markdown
URL to Markdown
Fetch any web URL and get clean, readable Markdown — main content only, no navigation/footer/ads. Local + free by default; smart fallback to Exa MCP when the page can't be extracted locally.
Workflow (the only thing the agent needs to remember)
- Try trafilatura first: `python3 ~/.claude/skills/fetch-url-as-markdown/scripts/fetch_url.py "<URL>"`
- If exit code is `1` or `2` → fall back to Exa MCP with the same URL: `mcp__exa__web_search_advanced_exa(query="<URL>", includeDomains=["<host of URL>"], numResults=1, textMaxCharacters=50000, type="auto")`. (`mcp__exa__crawling` works too if the server exposes it; the `web_search_advanced_exa` call above is the always-available variant: pin the host with `includeDomains` and use the URL itself as the query.)
- Exit code `3` means trafilatura is not installed. Install once: `python3 -m pip install --break-system-packages trafilatura`
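The fallback step pins the search to the page's own host and passes the URL as the query. A minimal sketch of building those arguments, assuming the host is derived with the standard library's `urlsplit`; the helper name `exa_fallback_args` is hypothetical and not part of the skill:

```python
from urllib.parse import urlsplit


def exa_fallback_args(url: str) -> dict:
    """Build the arguments for the Exa fallback call (hypothetical helper)."""
    host = urlsplit(url).hostname  # lowercased, port stripped; None if absent
    if not host:
        raise ValueError(f"cannot derive a host from {url!r}")
    return {
        "query": url,              # the URL itself is the query
        "includeDomains": [host],  # pin results to the page's own host
        "numResults": 1,
        "textMaxCharacters": 50000,
        "type": "auto",
    }


if __name__ == "__main__":
    print(exa_fallback_args("https://example.com/blog/post?id=7"))
```

The resulting dictionary mirrors the parameters of the `mcp__exa__web_search_advanced_exa` call; how it is actually passed to the MCP tool depends on the agent runtime.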
Exit codes (what they mean for the fallback decision)
| Code | Meaning | Action |
|---|---|---|
| 0 | Markdown printed to stdout | done |
| 1 | DownloadError — network/HTTP/timeout/anti-bot block at fetch | fall back to Exa |
| 2 | ExtractionError: empty extract, JS/Cloudflare wall, or stub body (shorter than the `--min-body` threshold) | fall back to Exa |
| 3 | trafilatura missing | install (see above), then retry |
| 4 | UnsupportedContentTypeError — URL is binary (PDF, image, archive) | don't fall back to Exa; use the right specialized skill (e.g. pdf for PDFs) |
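The table above can be turned into a small dispatcher. This is a sketch under stated assumptions rather than the skill's own code: the action labels, the `run_fetch` name, and the `cmd_prefix` parameter (added so the command can be swapped out in tests) are all hypothetical, and `sys.executable` stands in for `python3`.

```python
import subprocess
import sys
from pathlib import Path

# Script location as given in the workflow section.
SCRIPT = Path.home() / ".claude/skills/fetch-url-as-markdown/scripts/fetch_url.py"

# Exit code -> next action, following the table above (labels are hypothetical).
ACTIONS = {
    0: "done",
    1: "fallback_to_exa",
    2: "fallback_to_exa",
    3: "install_then_retry",
    4: "use_specialized_skill",
}


def run_fetch(url, cmd_prefix=None):
    """Run the fetch script; return (action, markdown or None)."""
    prefix = cmd_prefix or [sys.executable, str(SCRIPT)]
    proc = subprocess.run([*prefix, url], capture_output=True, text=True)
    action = ACTIONS.get(proc.returncode, "unknown_exit_code")
    markdown = proc.stdout if proc.returncode == 0 else None
    return action, markdown
```

Calling `run_fetch("<URL>")` returns the Markdown only when the action is `"done"`; any other action tells the caller which branch of the workflow to take next.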
Defaults baked into the script
- `output_format="markdown"`, `include_formatting=True`: keeps headings/lists/code structure where the source HTML uses real `<h1..h6>` etc.
- `include_links=True`, `include_tables=True`
- `with_metadata=True` → emits a YAML frontmatter (`title`, `author`, `date`, `url`, `hostname`)
- `favor_recall=True`, `deduplicate=True`: readable but trims duplicates
- Real-browser User-Agent + 30s timeout configured in `scripts/settings.cfg`
- Anti-stub guards (built into the script):
  - rejects `Content-Type` other than `text/html|application/xhtml+xml|text/plain|application/xml|text/xml` → exit `4`
  - sniffs raw HTML for Cloudflare / "Please enable JavaScript" / Imperva / DataDome wall markers → exit `2`
  - rejects extracted bodies under 50 chars (configurable via `--min-body N`, `0` to disable) → exit `2`
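The three anti-stub guards can be illustrated with a standard-library sketch. The function name `guard_exit_code` and the exact marker strings are assumptions made for illustration; the real script's markers and matching logic are not shown in this document and may well be stricter.

```python
# Content types the script accepts, as listed above.
ALLOWED_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/plain",
    "application/xml",
    "text/xml",
}

# Hypothetical wall markers; the script's actual list is not documented here.
WALL_MARKERS = ("please enable javascript", "cloudflare", "imperva", "datadome")


def guard_exit_code(content_type, raw_html, extracted, min_body=50):
    """Return 4, 2, or 0 according to the guards described above."""
    mime = content_type.split(";", 1)[0].strip().lower()  # drop "; charset=..."
    if mime not in ALLOWED_TYPES:
        return 4  # binary or otherwise unsupported content
    lowered = raw_html.lower()
    if any(marker in lowered for marker in WALL_MARKERS):
        return 2  # looks like a JS or anti-bot wall
    if min_body > 0 and len(extracted.strip()) < min_body:
        return 2  # stub body; min_body=0 disables this check
    return 0
```

A plain substring match like this would also flag an ordinary article that merely mentions Cloudflare, so a production check would need more specific markers.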
Useful flags
... fetch_url.py "<URL>" --no-links # strip hyperlinks
... fetch_url.py "<URL>" --no-tables # strip tables
... fetch_url.py "<URL>" --no-metadata # omit YAML header
... fetch_url.py "<URL>" --comments # include user comments (off by default — usually noise)
... fetch_url.py "<URL>" --images # include image refs (experimental)
... fetch_url.py "<URL>" --precision # terser output, drops borderline content
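One plausible way these flags could map onto extraction options is sketched below. This is an assumption about the script's internals, not a copy of it: the keyword names follow the parameters of `trafilatura.extract`, and treating `--precision` as switching from `favor_recall` to `favor_precision` is a guess.

```python
import argparse


def build_parser():
    """Hypothetical argument parser mirroring the flags listed above."""
    p = argparse.ArgumentParser(prog="fetch_url.py")
    p.add_argument("url")
    p.add_argument("--no-links", action="store_true")
    p.add_argument("--no-tables", action="store_true")
    p.add_argument("--no-metadata", action="store_true")
    p.add_argument("--comments", action="store_true")
    p.add_argument("--images", action="store_true")
    p.add_argument("--precision", action="store_true")
    p.add_argument("--min-body", type=int, default=50, metavar="N")
    return p


def extract_kwargs(ns):
    """Translate parsed flags into extraction keyword arguments (assumed mapping)."""
    return {
        "output_format": "markdown",
        "include_formatting": True,
        "include_links": not ns.no_links,
        "include_tables": not ns.no_tables,
        "with_metadata": not ns.no_metadata,
        "include_comments": ns.comments,
        "include_images": ns.images,
        "favor_precision": ns.precision,   # assumption: --precision flips the bias
        "favor_recall": not ns.precision,
        "deduplicate": True,
    }


if __name__ == "__main__":
    ns = build_parser().parse_args(["https://example.com", "--no-links"])
    print(extract_kwargs(ns))
```

With no flags given, the mapping reproduces the defaults listed in the previous section.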
When to choose what
| Situation | Tool |
|---|---|
| Article, blog post, docs, README, wiki | trafilatura (default) — local, free |
| JS-heavy SPA, login-walled, Cloudflare | Exa fallback (the script will signal exit 2) |
| Bulk / many URLs | trafilatura — no quota, no API key |
| Already failed twice on a domain | Exa directly |
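The last row of the table (go to Exa directly after two failures on a domain) implies some per-domain bookkeeping. A minimal sketch, assuming failures are counted in memory per hostname; the `DomainRouter` class and its method names are hypothetical:

```python
from collections import Counter
from urllib.parse import urlsplit


class DomainRouter:
    """Track local extraction failures per host and pick a tool (hypothetical)."""

    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self.failures = Counter()

    def record_failure(self, url):
        # Call this when the script exits with 1 or 2 for the given URL.
        self.failures[urlsplit(url).hostname] += 1

    def choose_tool(self, url):
        host = urlsplit(url).hostname
        if self.failures[host] >= self.max_failures:
            return "exa"          # skip the local attempt for this domain
        return "trafilatura"      # default: local and free
```

The counter lives only for the lifetime of the object; persisting it across sessions would be a separate design decision.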