ๅฐ็บขไนฆ Research ๐Ÿ“•

Research tool for Chinese user-generated content โ€” travel, food, lifestyle, local discoveries.

When to Use

  • Travel planning and itineraries
  • Restaurant/cafe/bar recommendations
  • Activity and weekend planning
  • Product reviews and comparisons
  • Local discovery and hidden gems
  • Any question where Chinese perspectives help

Recommended Model

When spawning as a sub-agent: Sonnet 4.5 (model: "claude-sonnet-4-5-20250929")

  • Fast: the bottleneck is the slow XHS API, not the model
  • Strong Chinese content understanding
  • More cost-effective than Opus for research grunt work
  • Opus is overkill for a search → synthesize workflow

Context Management (Always Use)

ALWAYS use dynamic context monitoring โ€” even 5 posts with images can hit 75-300k tokens.

The Problem

  • Each post with images = 15-60k tokens
  • 200k context fills fast
  • Context is append-only (can't "forget" within session)

The Solution: Monitor + Checkpoint + Continue

1. After EACH post, do two things:

a) Write findings to disk immediately:
   /research/{task-id}/findings/post-{n}.md

b) Check context usage:
   session_status โ†’ look for "Context: XXXk/200k (YY%)"

2. When context hits 70%, STOP and checkpoint:

Write state file:
/research/{task-id}/state.json
{
  "processed": 15,
  "pendingUrls": ["url16", "url17", ...],
  "summaries": ["Post 1: ็ซๅก˜...", ...]
}

Return to caller:
{
  "complete": false,
  "processed": 15,
  "remaining": 25,
  "statePath": "/research/{task-id}/state.json",
  "findingsDir": "/research/{task-id}/findings/"
}

3. Caller spawns fresh sub-agent to continue:

spawn_subagent(
  task="Continue XHS research from /research/{task-id}/state.json",
  model="claude-sonnet-4-5-20250929"
)

New sub-agent has fresh 200k context, reads state.json, continues from post 16.
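The 70% check can be sketched as a small parser over the session_status output; the exact line format is assumed from the example above ("Context: XXXk/200k (YY%)"):

```python
import re

def should_checkpoint(status_text: str, threshold: float = 0.70) -> bool:
    """Decide whether to stop and write state.json.

    Assumes session_status prints a line like "Context: 145k/200k (72%)".
    """
    m = re.search(r"Context:\s*(\d+)k/(\d+)k", status_text)
    if not m:
        # Can't parse; keep going, but treat repeated failures as a signal
        # to checkpoint early rather than risk overflowing context.
        return False
    used, total = int(m.group(1)), int(m.group(2))
    return used / total >= threshold
```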

State File Schema

{
  "taskId": "kunming-food-2026-02-01",
  "query": "ๆ˜†ๆ˜Ž็พŽ้ฃŸ",
  "searchesCompleted": ["ๆ˜†ๆ˜Ž็พŽ้ฃŸ", "ๆ˜†ๆ˜Ž็พŽ้ฃŸๆŽจ่"],  // Keywords already searched
  "processedUrls": ["url1", "url2", ...],             // Explicit URL tracking (prevents duplicates)
  "pendingUrls": ["url3", "url4", ...],               // Remaining URLs to process
  "nextPostNumber": 16,                                // Next post-XXX.md number
  "summaries": [                                       // 1-liner per post for final synthesis
    "Post 1: ็ซๅก˜้คๅŽ… | ๐ŸŸข | ยฅ80 | ๆœฌๅœฐไบบๆŽจ่",
    "Post 2: ้‡Ž็”Ÿ่Œ็ซ้”… | ๐ŸŸข | ยฅ120 | ่Œๅญๆ–ฐ้ฒœ"
  ],
  "batchNumber": 1,
  "contextCheckpoint": "70%"
}

Critical fields for handoff:

  • processedUrls: Prevents re-processing same post across sub-agents
  • pendingUrls: Exact work remaining
  • nextPostNumber: Ensures sequential file naming
  • searchesCompleted: Prevents duplicate searches
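Under those schema assumptions, a minimal sketch of the load/enqueue/save cycle (field names taken from the schema above; adjust if the real state file differs):

```python
import json
from pathlib import Path

def load_state(task_dir: str) -> dict:
    """Read state.json if present; otherwise start fresh."""
    path = Path(task_dir) / "state.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"searchesCompleted": [], "processedUrls": [],
            "pendingUrls": [], "nextPostNumber": 1, "summaries": []}

def enqueue_urls(state: dict, found_urls: list) -> None:
    """Add newly found URLs, skipping anything already processed or pending."""
    seen = set(state["processedUrls"]) | set(state["pendingUrls"])
    state["pendingUrls"] += [u for u in found_urls if u not in seen]

def save_state(task_dir: str, state: dict) -> None:
    # ensure_ascii=False keeps Chinese summaries readable in the file
    Path(task_dir, "state.json").write_text(
        json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
```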

Workflow for Large Research

Caller should use longer timeout:

sessions_spawn(
  task="...",
  model="claude-sonnet-4-5-20250929",
  runTimeoutSeconds=1800  // 30 minutes for research tasks
)

Default is 600s (10 min) โ€” too short for XHS research with slow API calls.

Interleave search and processing (don't collect all URLs first):

[XHS Sub-agent 1]
    โ”œโ”€โ”€ Check for state.json (none = fresh start)
    โ”œโ”€โ”€ Search keyword 1 โ†’ get 20 URLs
    โ”œโ”€โ”€ Process 5-10 posts immediately (writing each to disk)
    โ”œโ”€โ”€ Search keyword 2 โ†’ get more URLs (dedupe)
    โ”œโ”€โ”€ Process more posts
    โ”œโ”€โ”€ Context hits 70% โ†’ write state.json
    โ””โ”€โ”€ Return {complete: false, remaining: N}

This prevents timeout from losing all work โ€” each post is saved as processed.
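The interleaved loop can be sketched with the search call, per-post processing, and context check injected as callables, since the real bin/search invocation and session_status check live outside this snippet:

```python
def run_batch(state, search, process, check_context, keywords):
    """Interleaved search/process loop from the sketch above.

    `search(kw)` stands in for bin/search, `process(url)` for
    fetch-post + analysis + writing post-NNN.md, and `check_context()`
    for the 70% session_status check.
    """
    for kw in keywords:
        if kw not in state["searchesCompleted"]:
            seen = set(state["processedUrls"]) | set(state["pendingUrls"])
            state["pendingUrls"] += [u for u in search(kw) if u not in seen]
            state["searchesCompleted"].append(kw)
        while state["pendingUrls"]:
            url = state["pendingUrls"].pop(0)
            process(url)                       # writes findings to disk
            state["processedUrls"].append(url)
            if check_context():                # context at 70%? checkpoint
                return {"complete": False,
                        "remaining": len(state["pendingUrls"])}
    return {"complete": True, "remaining": 0}
```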

Full continuation pattern:

[Caller]
    โ†“ spawn (runTimeoutSeconds=1800)
[XHS Sub-agent 1]
    โ”œโ”€โ”€ Search + process interleaved
    โ”œโ”€โ”€ Context hits 70% โ†’ write state.json
    โ””โ”€โ”€ Return {complete: false, remaining: 25}
    
[Caller sees incomplete]
    โ†“ spawn continuation (runTimeoutSeconds=1800)
[XHS Sub-agent 2]  โ† fresh 200k context!
    โ”œโ”€โ”€ Read state.json (has processedUrls, pendingUrls)
    โ”œโ”€โ”€ Continue processing + more searches if needed
    โ”œโ”€โ”€ Context hits 70% โ†’ write state.json
    โ””โ”€โ”€ Return {complete: false, remaining: 10}
    
[Caller sees incomplete]
    โ†“ spawn continuation
[XHS Sub-agent 3]
    โ”œโ”€โ”€ Read state.json
    โ”œโ”€โ”€ Process remaining posts
    โ”œโ”€โ”€ All done โ†’ write synthesis.md
    โ””โ”€โ”€ Return {complete: true, synthesisPath: "..."}

Output Directory Structure

/research/{task-id}/
โ”œโ”€โ”€ state.json              # Checkpoint for continuation
โ”œโ”€โ”€ findings/
โ”‚   โ”œโ”€โ”€ post-001.md         # Full analysis + image paths
โ”‚   โ”œโ”€โ”€ post-002.md
โ”‚   โ””โ”€โ”€ ...
โ”œโ”€โ”€ images/
โ”‚   โ”œโ”€โ”€ post-001/
โ”‚   โ”‚   โ”œโ”€โ”€ 1.jpg
โ”‚   โ”‚   โ””โ”€โ”€ 2.jpg
โ”‚   โ””โ”€โ”€ ...
โ”œโ”€โ”€ summaries.md            # All 1-liners (for quick scan)
โ””โ”€โ”€ synthesis.md            # Final output (when complete)

Key Rules (ALWAYS FOLLOW)

  1. Write after EVERY post โ€” crash-safe, no work lost
  2. Check context after EVERY post โ€” use session_status tool
  3. Stop at 70% โ€” leave room for synthesis + buffer
  4. Return structured result โ€” caller decides next step
  5. Read all images โ€” they're pre-compressed (600px, q85)
  6. Skip videos โ€” already marked in fetch-post

โš ๏ธ This is not optional. Even small research can overflow context with image-heavy posts.


Scripts (Mechanical Tasks)

These scripts handle the repetitive CLI work:

| Script | Purpose |
|--------|---------|
| `bin/preflight` | Verify tool is working before research |
| `bin/search "keywords" [limit] [timeout] [sort]` | Search for posts (sort: general/newest/hot) |
| `bin/get-content "url"` | Get full note content (text only) |
| `bin/get-comments "url"` | Get comments on a note |
| `bin/get-images "url" [dir]` | Download images only |
| `bin/fetch-post "url" [cache] [retries]` | Fetch content + comments + images (with retries) |

All scripts are at /root/clawd/skills/xhs/bin/

Preflight (always run first)

/root/clawd/skills/xhs/bin/preflight

Checks: rednote-mcp installed, cookies valid, stealth patches, test search. Don't proceed until preflight passes.

Search

/root/clawd/skills/xhs/bin/search "ๆ˜†ๆ˜Ž็พŽ้ฃŸๆŽจ่" [limit] [timeout] [sort]

Returns JSON with post results.

Parameters:

| Param | Default | Description |
|-------|---------|-------------|
| keywords | (required) | Search terms in Chinese |
| limit | 10 | Max results (scroll pagination when >20) |
| timeout | 180 | Seconds before giving up |
| sort | general | Sort order (see below) |

Sort options:

| Value | XHS label | When to use |
|-------|-----------|-------------|
| general | 综合 | Default: the XHS algorithm balances relevance + engagement. Best for most research. |
| newest | 最新 | Public-opinion monitoring (舆情监控), breaking news, recent experiences, time-sensitive topics |
| hot | 最热 | Finding viral/popular posts, trending content |

Examples:

# Default sort (recommended for most research)
bin/search "ๆ˜†ๆ˜Ž็พŽ้ฃŸๆŽจ่" 20

# Recent posts first (public-opinion monitoring, current events)
bin/search "ๆŸๅ“็‰Œ ่ฏ„ไปท" 20 180 newest

# Most popular posts
bin/search "็ฝ‘็บขๆ‰“ๅกๅœฐ" 15 180 hot

Scroll pagination enabled (patched): When limit > 20, the tool scrolls to load more results via XHS infinite scroll. Actual results depend on available content.

For maximum coverage, combine:

  1. Higher limits (e.g., limit=50) to scroll for more
  2. Multiple keyword variations for different result sets:
    • ้ฆ™่•‰ๆ”€ๅฒฉ, ้ฆ™่•‰ๆ”€ๅฒฉ้ฆ†, ้ฆ™่•‰ๆ”€ๅฒฉไฝ“้ชŒ, ้ฆ™่•‰ๆ”€ๅฒฉ่ฏ„ไปท
    • ๆ˜†ๆ˜Ž็พŽ้ฃŸ, ๆ˜†ๆ˜Ž็พŽ้ฃŸๆŽจ่, ๆ˜†ๆ˜Žๅฟ…ๅƒ, ๆ˜†ๆ˜ŽๆœฌๅœฐไบบๆŽจ่

Results vary by query โ€” popular topics may return 30-50+, niche topics fewer.
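When combining keyword variations, dedupe by URL across batches before fetching. This sketch assumes each search result carries a "url" field with the full xsec_token link; the actual JSON shape from bin/search may differ:

```python
def merge_search_results(batches: list) -> list:
    """Merge result lists from several keyword variations,
    keeping the first occurrence of each post URL."""
    seen, merged = set(), []
    for batch in batches:
        for post in batch:
            if post["url"] not in seen:
                seen.add(post["url"])
                merged.append(post)
    return merged
```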

Choosing sort order:

  • Most research โ†’ general (default). Let XHS's algorithm surface the best content.
  • ่ˆ†ๆƒ…็›‘ๆŽง / sentiment tracking โ†’ newest. You want recent opinions, not old viral posts.
  • Trend discovery โ†’ hot. See what's currently popular.

Get Content

/root/clawd/skills/xhs/bin/get-content "FULL_URL_WITH_XSEC_TOKEN"

โš ๏ธ Must use full URL with xsec_token from search results.

Get Comments

/root/clawd/skills/xhs/bin/get-comments "FULL_URL_WITH_XSEC_TOKEN"

Get Images

Download all images from a post to local files:

/root/clawd/skills/xhs/bin/get-images "FULL_URL" /tmp/my-images

Fetch Post (Deep Dive with Images)

Fetch content, comments, and images in one call โ€” with built-in retries:

/root/clawd/skills/xhs/bin/fetch-post "FULL_URL" /path/to/cache [max_retries]

Features:

  • Retries on timeout (60s โ†’ 90s โ†’ 120s)
  • Clear error reporting in JSON output
  • Images cached locally, bypassing CDN protection

Returns JSON:

{
  "success": true,
  "postId": "abc123",
  "content": { 
    "title": "...", 
    "author": "...", 
    "desc": "...", 
    "likes": "983", 
    "tags": [...],
    "postDate": "2025-09-04"  // โ† Added via patch!
  },
  "comments": [{ "author": "...", "content": "...", "likes": "3" }, ...],
  "imagePaths": ["/cache/images/abc123/1.jpg", ...],
  "errors": []
}

Date filtering: Use postDate to filter out old posts. Skip posts older than your threshold (e.g., 6-12 months for restaurants).
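A minimal postDate filter, assuming the YYYY-MM-DD format shown above and treating a null date as "keep" so the keyword-hint fallback (described later) can decide:

```python
from datetime import date, timedelta

def is_recent(post_date, max_age_days: int = 365) -> bool:
    """Apply the date threshold. A missing postDate passes through."""
    if not post_date:
        return True  # defer to keyword-hint fallback
    y, m, d = map(int, post_date.split("-"))
    return date(y, m, d) >= date.today() - timedelta(days=max_age_days)
```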

Workflow:

1. fetch-post โ†’ JSON + cached images
2. Read each imagePath directly (Claude sees images natively)
3. Combine text + comments + what you see into findings

Viewing images:

Read("/path/to/1.jpg")  # Claude sees it directly - no special tool needed

Look for: visible text (addresses, prices, hours), atmosphere, food presentation, crowd levels.


Research Methodology (Judgment Tasks)

This is where you think. Scripts do the fetching; you do the analyzing.

Depth Levels

| Depth | Posts | When to use |
|-------|-------|-------------|
| Minimum | 5+ | Quick checks, simple queries |
| Standard | 8-10 | Default for most research |
| Deep | 15+ | Complex topics, trip planning |

Minimum is 5 โ€” unless fewer exist. Note limited coverage if <5 results.

Research Workflow

Step 0: Preflight

Run bin/preflight. Don't proceed until it passes.

Step 1: Plan Your Searches

Think: "What would a Chinese user search on ๅฐ็บขไนฆ?"

  • Include location when relevant
  • Add qualifiers: 推荐 (recommended), 攻略 (guide), 测评 (review), 探店 (shop visit), 打卡 (check-in spot), 避雷 (avoid this)
  • Consider synonyms and variations
  • Plan 2-3 different search angles

Date filtering: Posts include postDate field (e.g., "2025-09-04"). The calling agent specifies the date filter based on research type:

| Research type | Suggested filter | Why |
|---------------|------------------|-----|
| 舆情监控 (sentiment) | 1-4 weeks | Only current discourse matters |
| Breaking news/events | 1-7 days | Time-critical |
| Travel planning | 6-12 months | Recent but reasonable window |
| Product reviews | 1-2 years | Longer product cycles |
| Trend analysis | Custom range | Compare specific periods |
| Historical/general | No limit | Want the full archive |

Caller should specify in task description, e.g.:

  • "Only posts from last 30 days" (่ˆ†ๆƒ…)
  • "Posts from 2025 or later" (travel)
  • "No date filter" (general research)

If no filter specified: Default to 12 months (safe middle ground).

Fallback when postDate is null: use keyword hints such as 2025, 最近 (recently), 最新 (latest)
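That fallback is a simple substring scan over the title/description, using the hint keywords listed above:

```python
def looks_recent(text: str, hints=("2025", "最近", "最新")) -> bool:
    """Recency heuristic for posts whose postDate is null:
    any hint keyword in the title/description counts as recent."""
    return any(h in text for h in hints)
```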

Language strategy:

| Location | Language | Example |
|----------|----------|---------|
| China | Chinese | 昆明攀岩 |
| English-named venues | Both | Rock Tenet 昆明 |
| International | Chinese | 巴黎旅游 |

Step 2: Search & Scan

Run your searches. Results are already ranked by XHS's algorithm (relevance + engagement).

Use judgment based on preview โ€” like a human deciding what to click:

Think: "Given my research goal, would this post likely contain useful information?"

| Research type | What to prioritize |
|---------------|--------------------|
| 舆情监控 (sentiment) | Any opinion/experience, even low engagement. Complaints matter! |
| Travel planning | High engagement + detailed experiences |
| Product reviews | Mix of positive AND negative reviews |
| Trend analysis | Variety of perspectives |

| Preview signal | Action |
|----------------|--------|
| Relevant content in preview | ✅ Fetch |
| Matches research goal | ✅ Fetch |
| Low engagement but relevant opinion | ✅ Fetch (esp. for 舆情) |
| High engagement but off-topic | ❌ Skip |
| Official announcements only | ⚠️ Context-dependent |
| 广告/合作 (ad/sponsorship) markers | ⚠️ Note as sponsored if fetching |
| Clearly off-topic | ❌ Skip |
| Duplicate content | ❌ Skip |

Key insight: For ่ˆ†ๆƒ…็›‘ๆŽง, a 3-like complaint post may be more valuable than a 500-like promotional post. Engagement โ‰  relevance for all research types.

Step 3: Deep Dive Each Post

For each selected post, use fetch-post to get everything:

bin/fetch-post "url_from_search" {{RESEARCH_DIR}}/xhs

Returns JSON with content, comments, and cached images. Has built-in retries. Then:

A. Review content

  • Extract key facts from title/description
  • Note author's perspective/bias
  • Check tags for categorization

B. View images (critical!)
For each imagePath in the result, just read it:

Read("/path/to/1.jpg")  # You see it directly

  • Look for text overlays: addresses, prices, hours
  • Note visual details: ambiance, crowd levels, food presentation

โš ๏ธ Don't describe images in isolation. Synthesize what you see with the post content and comments to form a holistic view. An image of a crowded restaurant + author saying "ๅ‘จๆœซๆŽ’้˜Ÿ1ๅฐๆ—ถ" + comments confirming "ไบบ่ถ…ๅคš" = that's your finding about crowds.

C. Review comments (gold for updates)

  • "ๅทฒ็ปๅ…ณ้—จไบ†" = already closed
  • Real experiences vs sponsored hype
  • Tips not in main post

D. Return picked images
Include paths to the best/most informative images in your findings. The calling agent decides whether and how to use them (embed in reports, reference, etc.). You're curating: pick images that show something useful (venue exterior, menu with prices, actual food, atmosphere), not just decorative shots.

Step 4: Synthesize

  • What do multiple sources agree on?
  • Any contradictions?
  • What's the overall consensus?
  • What would you actually recommend?

Step 5: Output

Facts + Flavor โ€” structured findings that preserve the XHS voice.

## XHS Research: [Topic]

### Search Summary
| Search | Results | Notes |
|--------|---------|-------|
| ๆ˜†ๆ˜Žๆ”€ๅฒฉ | 10 | Good coverage |

### Findings

#### [Venue Name] (ไธญๆ–‡ๅ)
- **Type:** Restaurant / Activity / Attraction
- **Address:** [from post or image]
- **Price:** ยฅXX/person
- **Hours:** [if found]
- **The vibe:** [atmosphere, energy โ€” preserved voice]
- **Why people like it:** [opinions, impressions]
- **Watch out for:** [warnings from comments]
- **Source:** [full URL]
- **Engagement:** X likes
- **Images:** [paths for calling agent to use]
  - `/path/to/1.jpg` โ€” exterior/entrance
  - `/path/to/3.jpg` โ€” menu with prices

> "ๅผ•็”จๅŽŸๆ–‡..." โ€” @username

### Overall Impressions
- Consensus across posts
- Patterns in preferences
- Things only locals know
- Disagreements worth noting

The XHS value is the human perspective. A recommendation that says "环境一般但是味道绝了" (the ambience is average but the food is amazing) tells you more than "Rating: 4.2/5".

Think: "What would a friend who just spent an hour on XHS tell me?"


Quality Signals

Trustworthy:

  • 100+ likes with real comments
  • Detailed personal experience
  • Multiple photos from actual visit
  • Specific details (prices, hours)
  • Recent posts (look for date mentions in content: "上周" last week, "昨天" yesterday, "2025年X月" a specific 2025 month)
  • Year in title (e.g., "2025上海咖啡必喝榜", a 2025 Shanghai must-drink coffee list)

Checking recency:

  • Look for dates in post text/title
  • Check if prices seem current
  • Comments asking "还在吗" or "现在还有吗" (is it still there / still open?) = might be outdated
  • Comments with recent dates confirm post is still relevant

Suspicious:

  • ๅนฟๅ‘Š/ๅˆไฝœ/่ตžๅŠฉ markers
  • Overly positive, no specifics
  • Stock photos only
  • No comments or generic ones
  • Very old posts

Timing & Efficiency

XHS is SLOW โ€” Plan Accordingly

The rednote-mcp CLI is slow (30-90s per search). Don't rapid-fire poll.

When running searches via exec:

# GOOD: Give it time to complete
exec(command, yieldMs: 60000)  # Wait 60s before checking
process(poll)  # Then poll every 30s if still running

DON'T:

  • Poll every 2-3 seconds (wastes tokens, no benefit)
  • Start multiple searches simultaneously (overloads)
  • Wait indefinitely without writing partial results

Write Incrementally

Don't wait until you've analyzed everything to start writing. After each batch of 3-5 posts:

  • Append findings to your output file
  • This protects against timeout/termination losing all work

For example:

## Findings (in progress)

### Batch 1: 美食搜索 (food search, 3 posts analyzed)
[findings...]

### Batch 2: 攻略搜索 (guide search, analyzing...)
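The append step can be as simple as opening the output file in append mode, so earlier batches survive any later termination:

```python
def append_findings(output_path: str, batch_header: str, findings_md: str) -> None:
    """Append one batch of findings; earlier batches are never rewritten,
    so a timeout can't erase completed work."""
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(f"\n### {batch_header}\n\n{findings_md}\n")
```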

Time Budget Awareness

If you've been running 15+ minutes:

  • Prioritize writing what you have
  • Note incomplete searches in output
  • Better to deliver 80% findings than lose 100% to termination

Retry Pattern

rednote-mcp is slow. If a command times out:

Attempt 1: default timeout
Attempt 2: +60s
Attempt 3: +120s

If all fail, report the failure. Do NOT fall back to web_search โ€” defeats the purpose.
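A sketch of that escalation using subprocess timeouts, assuming the bin/ scripts are invoked as ordinary CLIs:

```python
import subprocess

def run_with_retries(cmd: list, base_timeout: int = 60) -> str:
    """Retry pattern above: base timeout, then +60s, then +120s.
    Re-raises after the third timeout so the failure gets reported
    instead of silently falling back to another tool."""
    last_err = None
    for extra in (0, 60, 120):
        try:
            out = subprocess.run(cmd, capture_output=True, text=True,
                                 timeout=base_timeout + extra, check=True)
            return out.stdout
        except subprocess.TimeoutExpired as e:
            last_err = e
    raise last_err
```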


Error Handling

| Error | Cause | Fix |
|-------|-------|-----|
| Timeout | Network/XHS slow | Retry with longer timeout |
| Login/cookie error | Session expired | `xvfb-run -a rednote-mcp init` |
| 404 / xsec_token | Missing token | Use full URL from search |
| Empty results | No posts | Try different keywords |

Setup & Maintenance

First-Time Setup

npm install -g rednote-mcp
npx playwright install
/root/clawd/skills/xhs/patches/apply-all.sh
xvfb-run -a rednote-mcp init

Re-login (when cookies expire)

xvfb-run -a rednote-mcp init

After rednote-mcp updates

/root/clawd/skills/xhs/patches/apply-all.sh

Role Clarification

This skill = research tool that outputs structured findings.
Calling agent = synthesizes XHS + other sources into final reports and decides which images to embed.

You return:

  • Synthesized findings (text + images + comments โ†’ holistic view)
  • Curated image paths (calling agent decides how to use them)
  • Preserved human voice (opinions, vibes, tips)

You don't:

  • Describe images in isolation ("I see a restaurant...")
  • Generate final reports (that's the caller's job)
  • Decide image layout/placement

XHS is like having a Chinese-speaking friend spend an hour researching for you. They'd give you facts, but also opinions, vibes, and insider tips. That's what you're capturing.


Remember: Research like a curious human. Explore, cross-reference, look at pictures, read comments. The "这家真的绝了" (this place is seriously amazing) matters as much as the address.
