Web Extractor

Extract the complete text content of web pages, even when the content is loaded dynamically by JavaScript, sits behind authentication, or is rendered with virtual scrolling.

When This Skill Is Needed

Many modern web pages don't serve their content as static HTML. Instead, JavaScript loads the content after the initial page load, so a simple HTTP fetch returns empty or partial results. Common scenarios:

Authentication-protected pages: Sites requiring login (Google Docs, Notion, etc.)
JS-rendered SPAs: React/Vue/Angular apps where content lives in JavaScript state
Virtual scrolling: Long documents that keep only the visible content in the DOM (content that has scrolled out of view is removed, and content further down has not been created yet)
Lazy-loaded content: Sections that load as you scroll down
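
For the virtual scrolling case, no single DOM snapshot contains the whole document, so one plausible approach is to capture the visible rows at several scroll positions and stitch them together. The sketch below shows only the stitching step. `mergeSnapshots` is a hypothetical helper, not part of this skill, and it assumes that consecutive snapshots overlap by at least one row and that adjacent rows are not identical (repeated rows could be merged incorrectly).

```typescript
// Hypothetical helper: merge row texts captured at successive scroll positions.
// Each snapshot is the list of row texts present in the DOM at that moment.
// Assumption: a suffix of what has been collected so far reappears as a
// prefix of the next snapshot, so the overlap can be detected and dropped.
function mergeSnapshots(snapshots: string[][]): string[] {
  const merged: string[] = [];
  for (const snap of snapshots) {
    const maxOverlap = Math.min(merged.length, snap.length);
    let overlap = 0;
    // Try the longest possible overlap first, then shrink.
    for (let k = maxOverlap; k > 0; k--) {
      let same = true;
      for (let i = 0; i < k; i++) {
        if (merged[merged.length - k + i] !== snap[i]) {
          same = false;
          break;
        }
      }
      if (same) {
        overlap = k;
        break;
      }
    }
    // Append only the rows that were not already collected.
    merged.push(...snap.slice(overlap));
  }
  return merged;
}
```

If two snapshots do not overlap at all, the rows are simply appended, which means a gap in the source would go unnoticed. Scrolling in steps smaller than the viewport height is one way to keep the overlap assumption true.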

The key insight: even though JS loads content dynamically, once it renders, the content enters the DOM and becomes readable via querySelector / innerText.
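
As a minimal sketch of that idea, the function below reads the rendered text of one element through `querySelector` and `innerText`. The name `readRenderedText`, the `QueryRoot` and `TextNode` interfaces, and the choice to return `null` for missing or empty content are illustrative assumptions, not part of this skill. The interfaces only mirror the two DOM members used, so the function can run against a real page or a stand-in object.

```typescript
// Minimal structural types mirroring the DOM members used below.
interface TextNode {
  innerText: string;
}
interface QueryRoot {
  querySelector(selector: string): TextNode | null;
}

// Hypothetical helper: return the rendered text under `selector`,
// or null if the element is absent or has not rendered any text yet.
function readRenderedText(root: QueryRoot, selector: string): string | null {
  const el = root.querySelector(selector);
  if (el === null) {
    return null; // not in the DOM (yet)
  }
  const text = el.innerText.trim();
  return text.length > 0 ? text : null;
}
```

In a browser context this could be called as `readRenderedText({ querySelector: (s) => document.querySelector<HTMLElement>(s) }, "main")`, where `"main"` is only a placeholder selector. A `null` result is a signal to wait and retry, since the content may not have rendered yet.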
