link-scraper
Link Scraper
Fetch and extract content from URLs with automatic summarization. This skill enables the agent to gather information from the web by scraping web pages, extracting main content, and providing concise summaries.
When to Use
- User shares a URL and asks "what's this about?"
- Researching a topic that requires reading online articles
- Extracting documentation or technical content from websites
- Getting summaries of blog posts, news articles, or papers
- Extracting code snippets or examples from web sources
- Fetching content that the user wants analyzed or discussed
Setup
No additional installation is required. The scraper uses built-in Node.js modules, plus cheerio for HTML parsing when it is available.
If cheerio is not available, it falls back to basic regex-based extraction.
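The fallback described above could look roughly like the following. This is a minimal sketch, not the actual contents of scrape.js: the function name `extractTitle` and the regular expression are assumptions, and only the page title is extracted to keep the example short.

```javascript
// Hypothetical sketch: prefer cheerio, fall back to a regex when it is missing.
let cheerio = null;
try {
  cheerio = require("cheerio"); // optional dependency
} catch (err) {
  cheerio = null; // not installed (or require is unavailable): use the regex path
}

// Return the trimmed <title> text, or an empty string if there is none.
function extractTitle(html) {
  if (cheerio) {
    return cheerio.load(html)("title").first().text().trim();
  }
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : "";
}

console.log(extractTitle("<html><head><title> Example Page </title></head></html>"));
```

A regex fallback like this is only reliable for simple, well-formed markup, which is presumably why it is described as "basic".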
Usage
Extract a single URL
node /job/.pi/skills/link-scraper/scrape.js "https://example.com/article"
Extract multiple URLs
node /job/.pi/skills/link-scraper/scrape.js "https://example.com/page1" "https://example.com/page2"
Get just the title
node /job/.pi/skills/link-scraper/scrape.js --title "https://example.com"
Get full content (no summary)
node /job/.pi/skills/link-scraper/scrape.js --full "https://example.com"
Extract specific elements (CSS selector)
node /job/.pi/skills/link-scraper/scrape.js --selector "article" "https://example.com/blog"
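When the scraper is called from another Node.js script rather than from a shell, the flags above map onto an argument array. The helper below is a hypothetical sketch (`buildScrapeArgs` and its option names are not part of the skill); it only assembles the arguments shown in this section and prints the resulting command.

```javascript
// Script path taken from the usage examples above.
const SCRAPER = "/job/.pi/skills/link-scraper/scrape.js";

// Hypothetical helper: turn an options object into the CLI arguments shown above.
function buildScrapeArgs(urls, options = {}) {
  const args = [SCRAPER];
  if (options.title) args.push("--title");
  if (options.full) args.push("--full");
  if (options.selector) args.push("--selector", options.selector);
  return args.concat(urls); // URLs go last, as in the examples
}

const exampleArgs = buildScrapeArgs(["https://example.com/blog"], { selector: "article" });
console.log(["node", ...exampleArgs].join(" "));
```

The array could then be passed to `child_process.execFile("node", exampleArgs, callback)`, which avoids shell quoting problems with selectors that contain spaces.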
Output Format
The scraper returns JSON with the following structure:
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "description": "Brief description of the page...",
  "content": "Main content extracted from the page...",
  "wordCount": 500,
  "links": ["https://example.com/related1", "https://example.com/related2"],
  "images": ["https://example.com/image1.jpg"],
  "siteName": "Example Site"
}
When summarized:
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "summary": "A concise 2-3 sentence summary of the article...",
  "keyPoints": [
    "First key point from the article",
    "Second key point",
    "Third key point"
  ],
  "wordCount": 500,
  "readTime": "2 min"
}
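Because the two shapes share some fields but differ in others, a consumer has to check which one it received. The following is a small sketch under the assumption that a summarized result can be recognized by its `summary` field; the helper name `describeResult` is hypothetical, and the sample objects are abbreviated copies of the examples above.

```javascript
// Abbreviated copies of the two documented output shapes.
const fullResult = {
  url: "https://example.com/article",
  title: "Article Title",
  content: "Main content extracted from the page...",
  wordCount: 500,
};
const summarizedResult = {
  url: "https://example.com/article",
  title: "Article Title",
  summary: "A concise 2-3 sentence summary of the article...",
  keyPoints: ["First key point from the article", "Second key point"],
  wordCount: 500,
  readTime: "2 min",
};

// Hypothetical helper: build a one-line description for either shape.
function describeResult(result) {
  const isSummary = typeof result.summary === "string"; // assumption: summary marks the shape
  const kind = isSummary ? "summary" : "full";
  const body = isSummary ? result.summary : result.content;
  return `[${kind}] ${result.title} (${result.wordCount} words): ${body}`;
}

console.log(describeResult(fullResult));
console.log(describeResult(summarizedResult));
```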
Common Workflows
Quick URL Summary
User: Check out https://github.com/openclaw/openclaw for me
Agent: [Uses link-scraper to fetch and summarize]
Research Task
User: Find information about AI agents
Agent: [Uses link-scraper to fetch relevant articles, documentation, etc.]
Code Example Extraction
User: How do I use the GitHub API? https://docs.github.com/en/rest
Agent: [Uses link-scraper with --selector to extract code examples]
Integration with Other Skills
- With memory-agent: Store researched information for future reference
- With browser-tools: Use for JavaScript-rendered pages that need a browser
- With voice-output: Announce summaries aloud
Limitations
- Cannot fetch password-protected pages
- Some sites block scrapers (may need browser-tools as fallback)
- Large pages may be truncated for token limits
- JavaScript-rendered content may not be available (use browser-tools)
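The source does not say how truncation for token limits is done. As a rough illustration only, a caller that needs to stay within its own budget could cut the extracted content at a word limit; the function name `truncateWords`, the marker text, and the limit are all assumptions.

```javascript
// Hypothetical sketch: keep at most maxWords words and mark the cut.
function truncateWords(text, maxWords) {
  const trimmed = text.trim();
  const words = trimmed.split(/\s+/);
  if (words.length <= maxWords) return trimmed; // short enough, return unchanged
  return words.slice(0, maxWords).join(" ") + " [truncated]";
}

console.log(truncateWords("Main content extracted from the page", 3));
```

Word counts are only a proxy for tokens, so a real limit would need some safety margin.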
Tips
- For articles: The scraper automatically extracts main article content
- For documentation: Use --selector "pre code" to get code blocks
- For lists: Use --selector "ul li" to extract list items
- For speed: Add --no-summary for quick title/description only