defuddle
Defuddle - Web Content Extraction
Extract main article content from web pages, removing ads, sidebars, navigation, and other clutter. Output clean Markdown with metadata.
Prerequisites
Before first use, check whether defuddle is installed, and install it if it is missing:
command -v defuddle >/dev/null 2>&1 || npm install -g defuddle jsdom
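A slightly more defensive variant of this check could look like the sketch below. The function name `ensure_defuddle` is hypothetical, and stopping with a message when npm is missing is an assumption about the desired behavior, not something defuddle prescribes.

```shell
# Hypothetical helper (not part of defuddle): install the CLI only when it is missing.
ensure_defuddle() {
  # Already available: nothing to do.
  if command -v defuddle >/dev/null 2>&1; then
    return 0
  fi
  # npm ships with Node.js; without it the install below cannot run.
  if ! command -v npm >/dev/null 2>&1; then
    echo "npm not found: install Node.js and npm first" >&2
    return 1
  fi
  npm install -g defuddle jsdom
}
```

Calling `ensure_defuddle` once at the start of a session is enough; later calls return immediately.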
Default Workflow
When user provides a URL, follow this workflow:
Step 1: Extract content as Markdown + JSON metadata
Always use both -m and -j flags to get markdown content with full metadata:
defuddle parse "<url>" -m -j
Step 2: Present a summary to the user
Show the user:
- Title: from JSON `title` field
- Author: from JSON `author` field
- Source: domain
- Word count: from JSON `wordCount` field
- A brief preview (first 2-3 sentences)
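One way to collect the summary fields without a JSON parser is to ask for each property with `-p`, as in the sketch below. The function name `summarize_page` is hypothetical, and the sketch assumes that `-p` prints the bare property value to stdout. Each call parses the page again, so this trades speed for simplicity.

```shell
# Hypothetical helper: print the summary fields for one URL.
# Assumes `defuddle parse <url> -p <name>` writes only the value to stdout.
summarize_page() {
  local url="$1" field
  for field in title author domain wordCount; do
    printf '%s: %s\n' "$field" "$(defuddle parse "$url" -p "$field")"
  done
}
```

Usage: `summarize_page "<url>"` prints one `name: value` line per field; the preview sentences still come from the Markdown content.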
Step 3: Ask where to save
If this is the first time using defuddle in this conversation, ask the user:
"Save to which directory? (e.g. ~/Documents, ~/Desktop, or a custom path)"
Remember the user's chosen directory for subsequent uses in the same conversation.
Step 4: Save as Markdown file
Write the file with frontmatter + full content:
---
title: {title}
author: {author}
source: {url}
date: {published or "Unknown"}
clipped: {today's date YYYY-MM-DD}
wordCount: {wordCount}
---
# {title}
{markdown content}
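The template above can be filled in with a small shell function. This is a minimal sketch: the name `render_clipping` and its argument order are hypothetical, and values are written as-is, so a title containing a colon may need YAML quoting before the frontmatter parses cleanly.

```shell
# Hypothetical helper: render frontmatter plus content to stdout.
# Arguments (assumed order): title, author, url, published, wordCount, markdown body.
render_clipping() {
  local title="$1" author="$2" url="$3" published="$4" word_count="$5" body="$6"
  printf -- '---\n'
  printf 'title: %s\n' "$title"
  printf 'author: %s\n' "$author"
  printf 'source: %s\n' "$url"
  # Fall back to "Unknown" when no publication date was extracted.
  printf 'date: %s\n' "${published:-Unknown}"
  printf 'clipped: %s\n' "$(date +%Y-%m-%d)"
  printf 'wordCount: %s\n' "$word_count"
  printf -- '---\n'
  printf '# %s\n' "$title"
  printf '%s\n' "$body"
}
```

Redirecting the output, for example `render_clipping ... > "$dir/$name.md"`, writes the file in one step.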
File naming: Use the article title as filename, sanitized for filesystem:
- Replace special characters with spaces
- Trim whitespace
- Example:
The Shape of the Essay Field.md
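The naming rules can be sketched as a shell function. Which characters count as "special" is an assumption here (the set `/ \ : * ? " < > |`, which is unsafe on common filesystems), as is collapsing repeated spaces; the name `sanitize_filename` is hypothetical.

```shell
# Hypothetical helper: turn an article title into a filesystem-safe name.
sanitize_filename() {
  # 1. Replace the assumed unsafe characters / \ : * ? " < > | with spaces.
  # 2. Collapse runs of spaces into one (an extra step beyond the stated rules).
  # 3. Trim a leading and a trailing space.
  printf '%s' "$1" \
    | sed -e 's#[/\\:*?"<>|]# #g' -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}
```

For example, `"$(sanitize_filename "$title").md"` yields the final filename; a title without special characters, such as the one above, passes through unchanged.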
Step 5: Confirm to user
Tell the user the file path where it was saved.
CLI Reference
defuddle parse <source> [options]
Arguments:
`<source>`: URL (https://...) or local HTML file path
Options:
| Flag | Description |
|---|---|
| `-m, --markdown` | Convert content to Markdown |
| `-j, --json` | Output as JSON with full metadata |
| `-o, --output <file>` | Write to file instead of stdout |
| `-p, --property <name>` | Extract single property (title, description, domain, author, published, wordCount, content) |
| `--debug` | Verbose logging |
JSON Response Fields
When using -j, the response includes:
- `title`: Article title
- `author`: Author name
- `published`: Publication date
- `description`: Meta description
- `content`: Extracted Markdown (when `-m` used)
- `domain`: Source domain
- `favicon`: Favicon URL
- `image`: Featured image URL
- `site`: Site name
- `wordCount`: Word count
- `parseTime`: Processing time in ms
Notes
- Requires Node.js and npm
- `jsdom` is required as a peer dependency
- Works best with article-style pages (blogs, news, documentation)
- Not designed for SPAs or JavaScript-heavy pages (e.g. WeChat articles need browser rendering)