Defuddle Web Content Extractor
Use Defuddle to turn cluttered web pages into clean, readable Markdown before analysis or saving.
This is the default choice for ordinary public web pages because it usually preserves the main content while stripping navigation, ads, sidebars, and other token-wasting noise.
When to use this skill
Typical triggers:
- "read this URL"
- "save this article as markdown"
- "extract the text from this page"
- "scrape this documentation page"
- "clean up this web content before summarizing it"
When not to rely on Defuddle alone
Defuddle is not always the right tool for:
- login-protected pages
- sites that require heavy JavaScript interaction to reveal content
- workflows that need raw HTML structure rather than cleaned text
If extraction looks incomplete, switch to a browser-capable or raw-HTML approach.
Quick start
Check availability and install if needed:
which defuddle || npm install -g defuddle
Extract readable Markdown:
defuddle parse <url> --md
Recommended workflow
- Extract first using --md
- Review the output for missing sections or weird truncation
- Save to a file if the page is long or will be reused
- Only fall back to raw HTML or another tool if the cleaned output is obviously incomplete
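The workflow above can be sketched as a small shell function. This is only an illustration: the function name `extract_or_flag` and its messages are hypothetical, and the only Defuddle options it uses are the `--md` and `-o` flags shown elsewhere in this guide. It treats a failed command or an empty output file as the signal to fall back; judging whether non-empty output is complete still needs a manual review.

```shell
# Hypothetical wrapper around the workflow above; the function name and
# messages are illustrative, only the defuddle flags come from this guide.
extract_or_flag() {
  url="$1"   # page to extract
  out="$2"   # where to save the Markdown
  if defuddle parse "$url" --md -o "$out" && [ -s "$out" ]; then
    echo "saved: $out"
  else
    # Failed or empty output: signal that a fallback tool may be needed.
    echo "fallback needed: $url" >&2
    return 1
  fi
}
```

A call such as `extract_or_flag https://example.com/article article.md` would then either report the saved file or return a non-zero status that a calling script can act on.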
Common command patterns
Read a page immediately
defuddle parse https://example.com/article --md
Use this when the next step is summarizing, analyzing, or quoting the content.
Save a cleaned copy
defuddle parse https://docs.example.com/page --md -o documentation.md
defuddle parse https://blog.example.com/post --md -o notes/post-title.md
Use this when the page is long, should be archived, or should be processed later.
Extract metadata only
defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p author
defuddle parse <url> -p published
defuddle parse <url> -p domain
Use this when building link catalogs or collecting source metadata without the full body.
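For a link catalog, the metadata commands above could be combined into one row per URL. The sketch below is an assumption about how one might do that: `catalog_row` and the tab-separated layout are hypothetical, and it relies only on the `-p title` and `-p domain` calls listed above. Each `-p` call is a separate invocation, so the page is processed once per property.

```shell
# Hypothetical helper: print one tab-separated catalog row for a URL.
# Column order (domain, title, url) is an arbitrary choice for illustration.
catalog_row() {
  url="$1"
  # Each call below invokes defuddle separately for a single property.
  title=$(defuddle parse "$url" -p title)
  domain=$(defuddle parse "$url" -p domain)
  printf '%s\t%s\t%s\n' "$domain" "$title" "$url"
}
```

Appending the output to a file, for example `catalog_row https://example.com/article >> links.tsv`, builds the catalog incrementally.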
Use JSON output when needed
defuddle parse <url> --json
Use JSON only when you specifically need both HTML and Markdown output. For ordinary reading and summarization, --md is the better default.
Quality checks
After extraction, verify:
- the main heading and article body are present
- obvious navigation/sidebar clutter is gone
- the content is complete enough for the user's task
If the page is missing major sections, the site may be rendering content client-side.
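Part of this review can be approximated with a rough script run against a saved Markdown file. The function `check_extraction` and its 200-word default are assumptions made for illustration, not anything Defuddle provides. It only checks for the presence of a Markdown heading and a minimum word count, so leftover navigation clutter still has to be spotted by reading the output.

```shell
# Rough completeness heuristics for an extracted Markdown file.
# The 200-word default is an arbitrary assumption; tune it per task.
check_extraction() {
  file="$1"
  min_words="${2:-200}"
  rc=0
  words=$(wc -w < "$file" | tr -d ' ')
  # A page with no heading at all often means the body was not captured.
  if ! grep -Eq '^#{1,6} ' "$file"; then
    echo "warn: no Markdown heading found"
    rc=1
  fi
  # Very short output can indicate client-side rendering or truncation.
  if [ "$words" -lt "$min_words" ]; then
    echo "warn: only $words words (expected at least $min_words)"
    rc=1
  fi
  return "$rc"
}
```

A non-zero return from `check_extraction article.md` is a hint, not proof, that the extraction is incomplete.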
Why this skill matters
A raw documentation or news page can include huge amounts of HTML that add tokens but not meaning. Defuddle usually reduces that noise dramatically while keeping the information humans actually care about. That makes downstream summarization, note-taking, or conversion work cheaper and more reliable.
Output expectations
When using this skill, make it clear:
- which URL you extracted
- whether you returned inline Markdown or saved a file
- whether the extraction looked complete
- whether a fallback tool may be needed