Web Content Extractor

This skill helps you extract the clean main content (body text/Markdown) from a webpage URL by using Defuddle or Jina AI's reader API.

How it works

To extract the content of a target URL, you will prepend a specific service URL to the target URL and fetch it. This converts the messy webpage into clean Markdown containing only the main content.

Available Services

Defuddle (Default)
- Format: https://defuddle.md/<target-url>
- Example: https://defuddle.md/https://example.com/article
- Use this as the primary method.
Jina AI Reader (Fallback)
- Format: https://r.jina.ai/<target-url>
- Example: https://r.jina.ai/https://example.com/article
- Use this if Defuddle fails or returns an error.

Execution Steps

Identify the target URL: Extract the full URL the user wants to read from their request. Ensure it includes the protocol (e.g., https://).
Construct the fetch URL: Prepend https://defuddle.md/ to the target URL.
Fetch the content: Use the shell_execute tool with curl -sL "FETCH_URL" to download the content.
- Example command: curl -sL "https://defuddle.md/https://example.com/article"
Handle Fallbacks: If the curl command fails, returns empty, or returns an error message indicating failure, try the Jina AI service instead: curl -sL "https://r.jina.ai/https://example.com/article"
Process the output: The output will be in Markdown format.
- If the user asked you to read it to answer a question, use the content to answer.
- If the user asked you to extract or save it, present the Markdown to them or save it to a file as requested.

Notes

Always enclose the URL in quotes in the curl command to prevent shell interpretation of special characters like & or ?.
If the target URL is missing http:// or https://, prepend https:// before appending it to the service URL.

web-content-extractor

Web Content Extractor

How it works

Available Services

Execution Steps

Notes

More from openminis/minisskills

douyin-downloader

web-search

twitter-x-hub

doubao-tts

exa-search

bilibili-hub