content-parser
When to Use
- User provides a URL and wants to extract/read its content
- Another skill needs to parse source material from a URL before generation
- User says "parse this URL", "extract content from this link"
- User says "解析链接", "提取内容"
When NOT to Use
- User already has text content and doesn't need URL parsing
- User wants to generate audio/video content (not content extraction)
- User wants to read a local file (use standard file reading tools)
Purpose
Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.
Hard Constraints
- No shell scripts. Construct curl commands from the API Reference (Inlined) section below
- See § API Reference (Inlined) below for API key and headers
- See § API Reference (Inlined) below for polling, errors, and interaction patterns
- URL must be a valid HTTP(S) URL
- Always read config following
shared/config-pattern.mdbefore any interaction - Never save files to
~/Downloads/or.listenhub/— save to the current working directory
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If file doesn't exist — silently create with defaults and proceed:
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If file exists — read config silently and proceed:
CONFIG_PATH=".listenhub/content-parser/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Setup Flow (user-initiated reconfigure only)
Only run when the user explicitly asks to reconfigure. Display current settings:
当前配置 (content-parser):
自动下载:{是 / 否}
Then ask:
- autoDownload: "自动保存提取的内容到当前目录?"
- "是(推荐)" →
autoDownload: true - "否" →
autoDownload: false
- "是(推荐)" →
Save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Interaction Flow
Step 1: URL Input
Free text input. Ask the user:
What URL would you like to extract content from?
Step 2: Options (optional)
Ask if the user wants to configure extraction options:
Question: "Do you want to configure extraction options?"
Options:
- "No, use defaults" — Extract with default settings
- "Yes, configure options" — Set summarize, maxLength, or Twitter tweet count
If "Yes", ask follow-up questions:
- Summarize: "Generate a summary of the content?" (Yes/No)
- Max Length: "Set maximum content length?" (Free text, e.g., "5000")
- Twitter count (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
Step 3: Confirm & Extract
Summarize:
Ready to extract content:
URL: {url}
Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default
Proceed?
Wait for explicit confirmation before calling the API.
Workflow
-
Validate URL: Must be HTTP(S). Normalize if needed (see
references/supported-platforms.md) -
Build request body:
{ "source": { "type": "url", "uri": "{url}" }, "options": { "summarize": true/false, "maxLength": 5000, "twitter": { "count": 50 } } }Omit
optionsif user chose defaults. -
Submit (foreground):
POST /v1/content/extract→ extracttaskId -
Tell the user extraction is in progress
-
Poll (background): Run the following exact bash command with
run_in_background: trueandtimeout: 300000. Note: status field is.data.status(notprocessStatus), interval is 5s, values areprocessing/completed/failed:TASK_ID="<id-from-step-3>" for i in $(seq 1 60); do RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \ -H "Authorization: Bearer $LISTENHUB_API_KEY" \ -H "X-Source: skills" 2>/dev/null) STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"') case "$STATUS" in completed) echo "$RESULT"; exit 0 ;; failed) echo "FAILED: $RESULT" >&2; exit 1 ;; *) sleep 5 ;; esac done echo "TIMEOUT" >&2; exit 2 -
When notified, download and present result:
If
autoDownloadistrue, generate a slug from the extracted title (falling back to domain name if no title). Followshared/config-pattern.md§ Artifact Naming for slug generation and dedup.- Write
{slug}.mdto the current directory — full extracted content in markdown - Write
{slug}.jsonto the current directory — full raw API response data
SLUG="{title-slug}" # e.g. "topology-wikipedia" # Dedup: check if files exist BASE="$SLUG"; i=2 while [ -e "${SLUG}.md" ] || [ -e "${SLUG}.json" ]; do SLUG="${BASE}-${i}"; i=$((i+1)); done echo "$CONTENT_MD" > "${SLUG}.md" echo "$RESULT" > "${SLUG}.json"Present:
内容提取完成! 来源:{url} 标题:{metadata.title} 长度:~{character count} 字符 消耗积分:{credits} 已保存到当前目录: {slug}.md {slug}.json - Write
-
Show a preview of the extracted content (first ~500 chars)
-
Offer to use content in another skill (e.g.
/podcast,/tts)
Estimated time: 10-30 seconds depending on content size and platform.
API Reference (Inlined)
Authentication
Environment variable: LISTENHUB_API_KEY (format: lh_sk_...)
Store in ~/.zshrc (macOS) or ~/.bashrc (Linux):
export LISTENHUB_API_KEY="lh_sk_..."
How to obtain: Visit https://listenhub.ai/settings/api-keys (Pro plan required).
Base URL: https://api.marswave.ai/openapi/v1
Required headers (every request):
Authorization: Bearer $LISTENHUB_API_KEY
Content-Type: application/json
X-Source: skills
The X-Source: skills header identifies requests as coming from Claude Code skills (CLI tool).
curl template:
curl -sS -X POST "https://api.marswave.ai/openapi/v1/{endpoint}" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{ ... }'
For GET requests, omit -d and change -X POST to -X GET.
Security notes:
- Never log or display full API keys in output
- API keys are transmitted via HTTPS only
- Do not pass sensitive or confidential information as content input — it is sent to external APIs for processing
POST /v1/content/extract
Create a content extraction task for a URL. Returns a taskId for polling.
Request body:
| Field | Required | Type | Description |
|---|---|---|---|
| source | Yes | object | Source to extract from |
| source.type | Yes | string | Must be "url" |
| source.uri | Yes | string | Valid HTTP(S) URL to extract content from |
| options | No | object | Extraction options |
| options.summarize | No | boolean | Whether to generate a summary |
| options.maxLength | No | integer | Maximum content length |
| options.twitter | No | object | Twitter/X specific options |
| options.twitter.count | No | integer | Number of tweets to fetch (1-100, default 20) |
Response:
{
"code": 0,
"message": "success",
"data": {
"taskId": "69a7dac700cf95938f86d9bb"
}
}
Error codes:
| Code | Meaning |
|---|---|
| 29003 | Validation error ("source.uri" is required, "source.uri" must be a valid uri) |
| 21007 | Invalid API key |
GET /v1/content/extract/{taskId}
Get extraction task status and results.
Path params:
| Param | Type | Description |
|---|---|---|
| taskId | string | 24-char hex task ID |
Response states:
- processing — Task is still running
- completed — Extraction finished, data available
- failed — Extraction failed, check
failCodeandmessage
Response (processing):
{
"code": 0,
"message": "success",
"data": {
"taskId": "69a7dac700cf95938f86d9bb",
"status": "processing",
"createdAt": "2025-04-09T12:00:00Z",
"data": null,
"credits": 0,
"failCode": null,
"message": null
}
}
Response (completed):
{
"code": 0,
"message": "success",
"data": {
"taskId": "69a7dac700cf95938f86d9bb",
"status": "completed",
"createdAt": "2025-04-09T12:00:00Z",
"data": {
"content": "Extracted text content...",
"metadata": {
"title": "Article Title",
"author": "Author Name",
"publishedAt": "2025-04-01T08:00:00Z"
},
"references": [
"https://example.com/related-article"
]
},
"credits": 5,
"failCode": null,
"message": null
}
}
Response (failed):
{
"code": 0,
"message": "success",
"data": {
"taskId": "69a7dac700cf95938f86d9bb",
"status": "failed",
"createdAt": "2025-04-09T12:00:00Z",
"data": null,
"credits": 0,
"failCode": "EXTRACT_FAILED",
"message": "Unable to extract content from the provided URL"
}
}
Key fields:
| Field | Type | Description |
|---|---|---|
| status | string | processing, completed, or failed |
| data.data.content | string | Extracted text content |
| data.data.metadata | object | Page metadata (title, author, publishedAt) |
| data.data.references | array | Referenced URLs (array of strings) |
| credits | integer | Credits consumed |
| failCode | string | Error code (null on success) |
| message | string | Error message (null on success) |
Error codes:
| Code | Meaning |
|---|---|
| 29003 | Invalid taskId format |
| 25002 | Task not found |
Polling Pattern
5-second interval, 60 polls max. Run with run_in_background: true and timeout: 300000.
Two-step pattern:
- Submit (foreground): POST the creation request, extract
taskIdfrom the response. - Poll (background): Run the polling loop with
run_in_background: true. You will be notified automatically when it completes.
The exact polling bash command is already specified in the Workflow section (Step 5).
Error Handling
HTTP status codes:
| Code | Meaning | Action |
|---|---|---|
| 200 | Success | Parse response body |
| 400 | Bad request | Check parameters |
| 401 | Invalid API key | Re-check LISTENHUB_API_KEY |
| 402 | Insufficient credits | Inform user to recharge |
| 403 | Forbidden | No permission for this resource |
| 429 | Rate limited | Exponential backoff, retry after delay |
| 500/502/503/504 | Server error | Retry up to 3 times |
Retry strategy:
- 429 rate limit: Wait 15 seconds, then retry (exponential backoff)
- 5xx server errors: Retry up to 3 times with 5-second intervals
- Network errors: Retry up to 3 times
Application error codes:
| Code | Meaning |
|---|---|
| 21007 | Invalid user API key |
| 25429 | Rate limited (application-level) |
Example
User: "Parse this article: https://en.wikipedia.org/wiki/Topology"
Agent workflow:
- URL:
https://en.wikipedia.org/wiki/Topology - Options: defaults (omit options)
- Submit extraction
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{
"source": {
"type": "url",
"uri": "https://en.wikipedia.org/wiki/Topology"
}
}'
- Poll until complete:
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "X-Source: skills"
- Present extracted content preview and offer next actions.
User: "Extract recent tweets from @elonmusk, get 50 tweets"
Agent workflow:
- URL:
https://x.com/elonmusk - Options:
{"twitter": {"count": 50}} - Submit extraction
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{
"source": {
"type": "url",
"uri": "https://x.com/elonmusk"
},
"options": {
"twitter": {
"count": 50
}
}
}'
- Poll until complete, present results.