extract
Extract Skill
Extract clean content from specific URLs. Ideal when you know which pages you want content from.
Authentication
The script uses OAuth via the Tavily MCP server. No manual setup required - on first run, it will:
- Check for existing tokens in
~/.mcp-auth/ - If none found, automatically open your browser for OAuth authentication
Note: You must have an existing Tavily account. The OAuth flow only supports login — account creation is not available through this flow. Sign up at tavily.com first if you don't have an account.
Alternative: API Key
If you prefer using an API key, get one at https://tavily.com and add to ~/.claude/settings.json:
{
"env": {
"TAVILY_API_KEY": "tvly-your-api-key-here"
}
}
Quick Start
Using the Script
./scripts/extract.sh '<json>'
python .\scripts\extract.py '<json>'
pwsh .\scripts\extract.ps1 '<json>'
Examples:
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'
# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'
# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'
# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
Basic Extraction
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://example.com/article"]
}'
Multiple URLs with Query Focus
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/ml-healthcare",
"https://example.com/ai-diagnostics"
],
"query": "AI diagnostic tools accuracy",
"chunks_per_source": 3
}'
API Reference
Endpoint
POST https://api.tavily.com/extract
Headers
| Header | Value |
|---|---|
Authorization |
Bearer <TAVILY_API_KEY> |
Content-Type |
application/json |
Request Body
| Field | Type | Default | Description |
|---|---|---|---|
urls |
array | Required | URLs to extract (max 20) |
query |
string | null | Reranks chunks by relevance |
chunks_per_source |
integer | 3 | Chunks per URL (1-5, requires query) |
extract_depth |
string | "basic" |
basic or advanced (for JS pages) |
format |
string | "markdown" |
markdown or text |
include_images |
boolean | false | Include image URLs |
timeout |
float | varies | Max wait (1-60 seconds) |
Response Format
{
"results": [
{
"url": "https://example.com/article",
"raw_content": "# Article Title\n\nContent..."
}
],
"failed_results": [],
"response_time": 2.3
}
Extract Depth
| Depth | When to Use |
|---|---|
basic |
Simple text extraction, faster |
advanced |
Dynamic/JS-rendered pages, tables, structured data |
Examples
Single URL Extraction
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://docs.python.org/3/tutorial/classes.html"],
"extract_depth": "basic"
}'
Targeted Extraction with Query
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/react-hooks",
"https://example.com/react-state"
],
"query": "useState and useEffect patterns",
"chunks_per_source": 2
}'
JavaScript-Heavy Pages
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://app.example.com/dashboard"],
"extract_depth": "advanced",
"timeout": 60
}'
Batch Extraction
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3",
"https://example.com/page4",
"https://example.com/page5"
],
"extract_depth": "basic"
}'
Tips
- Max 20 URLs per request - batch larger lists
- Use
query+chunks_per_sourceto get only relevant content - Try
basicfirst, fall back toadvancedif content is missing - Set longer
timeoutfor slow pages (up to 60s) - Check
failed_resultsfor URLs that couldn't be extracted
More from l-yifan/skills
scientific-figure-pro
Generate publication-ready scientific figures in Python/matplotlib with a consistent figures4papers house style. Use when creating or refining academic bar/trend/heatmap/scatter/multi-panel figures, enforcing visual consistency, or exporting paper-ready PNG/PDF/SVG outputs.
33figures4papers-playbook
Locate and adapt real plotting examples from the figures4papers repository. Use when users ask for a figure in the style of specific papers/projects, want the closest existing script template, or need fast script selection by chart type/domain before customization.
31deep-wiki
Access AI-generated documentation and insights for GitHub repositories via DeepWiki. This skill should be used when exploring unfamiliar codebases, understanding repository architecture, finding implementation patterns, or asking questions about how a GitHub project works. Supports any public GitHub repository.
25gkg
Global Knowledge Graph for codebase analysis. This skill should be used when searching for code definitions (functions, classes, methods), finding references to symbols, understanding code structure, analyzing import usage, generating repository maps, or performing impact analysis before refactoring. Supports TypeScript, JavaScript, Python, Java, and more.
21gh-grep
Search real-world code examples across millions of GitHub repositories using grep.app. This skill should be used when looking for implementation patterns, API usage examples, library integration patterns, or production code references. Supports literal code search, regex patterns, and filtering by language/repo/path.
18github
Interact with GitHub repositories, issues, pull requests, and code via the GitHub MCP server. This skill should be used when managing repositories, creating/updating files, working with issues and PRs, searching code/repos/users, creating branches, and performing code reviews. Supports all major GitHub API operations.
16