research-lit
Research Literature Review
Research topic: $ARGUMENTS
Constants
- REVIEWER_BACKEND = codex — Default: Codex MCP (xhigh). Override with — reviewer: oracle-pro for GPT-5.4 Pro via Oracle MCP. See shared-references/reviewer-routing.md.
- PAPER_LIBRARY — Local directory containing the user's paper collection (PDFs). Check these paths in order:
  1. papers/ in the current project directory
  2. literature/ in the current project directory
  3. Custom path specified by the user in CLAUDE.md under ## Paper Library
- MAX_LOCAL_PAPERS = 20 — Maximum number of local PDFs to scan (read first 3 pages each). If more are found, prioritize by filename relevance to the topic.
- ARXIV_DOWNLOAD = false — When true, download top 3-5 most relevant arXiv PDFs to PAPER_LIBRARY after search. When false (default), only fetch metadata (title, abstract, authors) via arXiv API — no files are downloaded.
- ARXIV_MAX_DOWNLOAD = 5 — Maximum number of PDFs to download when ARXIV_DOWNLOAD = true.
💡 Overrides:
- /research-lit "topic" — paper library: ~/my_papers/ → custom local PDF path
- /research-lit "topic" — sources: zotero, local → only search Zotero + local PDFs
- /research-lit "topic" — sources: zotero → only search Zotero
- /research-lit "topic" — sources: web → only search the web (skip all local)
- /research-lit "topic" — sources: web, semantic-scholar → also search Semantic Scholar for published venue papers (IEEE, ACM, etc.)
- /research-lit "topic" — sources: deepxiv → only search via DeepXiv progressive retrieval
- /research-lit "topic" — sources: all, deepxiv → use default sources plus DeepXiv
- /research-lit "topic" — arxiv download: true → download top relevant arXiv PDFs
- /research-lit "topic" — arxiv download: true, max download: 10 → download up to 10 PDFs
Data Sources
This skill checks multiple sources in priority order. All are optional — if a source is not configured or not requested, skip it silently.
Source Selection
Parse $ARGUMENTS for a — sources: directive:
- If — sources: is specified: Only search the listed sources (comma-separated). Valid values: zotero, obsidian, local, web, semantic-scholar, deepxiv, exa, gemini, openalex, all.
- If not specified: Default to all — search every available source in priority order (semantic-scholar, deepxiv, exa, gemini, and openalex are excluded from all; they must be explicitly listed).
Examples:
/research-lit "diffusion models" → all (default, no S2)
/research-lit "diffusion models" — sources: all → all (default, no S2)
/research-lit "diffusion models" — sources: zotero → Zotero only
/research-lit "diffusion models" — sources: zotero, web → Zotero + web
/research-lit "diffusion models" — sources: local → local PDFs only
/research-lit "topic" — sources: obsidian, local, web → skip Zotero
/research-lit "topic" — sources: web, semantic-scholar → web + S2 API (IEEE/ACM venue papers)
/research-lit "topic" — sources: deepxiv → DeepXiv only
/research-lit "topic" — sources: all, deepxiv → default sources + DeepXiv
/research-lit "topic" — sources: all, semantic-scholar → all + S2 API
/research-lit "topic" — sources: exa → Exa only (broad web + content extraction)
/research-lit "topic" — sources: all, exa → default sources + Exa web search
/research-lit "topic" — sources: gemini → Gemini only (AI-powered broad discovery)
/research-lit "topic" — sources: all, gemini → default sources + Gemini discovery
/research-lit "topic" — sources: gemini, semantic-scholar → Gemini + S2 (broad discovery + venue metadata)
/research-lit "topic" — sources: openalex → OpenAlex only (open citation graph + institutions)
/research-lit "topic" — sources: semantic-scholar, openalex → S2 + OpenAlex (complementary metadata)
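The selection rule above can be sketched in Python. This is a minimal sketch rather than the skill's actual parser: `resolve_sources`, the constant names, and the choice to silently ignore unknown source names are assumptions made for illustration. The default set is taken to be the four sources with priorities 1 to 4 in the source table.

```python
import re

EM_DASH = "\u2014"  # directive marker used in the examples above

# Assumption: "all" expands to the four default sources only.
DEFAULT_SOURCES = ["zotero", "obsidian", "local", "web"]
# Opt-in sources are excluded from "all" and must be listed explicitly.
OPT_IN_SOURCES = ["semantic-scholar", "deepxiv", "exa", "gemini", "openalex"]
PRIORITY = DEFAULT_SOURCES + OPT_IN_SOURCES


def resolve_sources(arguments: str) -> list:
    """Return the requested source IDs in priority order."""
    pattern = EM_DASH + r"\s*sources:\s*([^" + EM_DASH + r"]*)"
    match = re.search(pattern, arguments)
    if match is None:
        # No directive: behave like "sources: all".
        return list(DEFAULT_SOURCES)
    requested = set()
    for token in match.group(1).split(","):
        name = token.strip().lower()
        if name == "all":
            requested.update(DEFAULT_SOURCES)
        elif name in PRIORITY:
            requested.add(name)
        # Unknown names are ignored (hypothetical policy).
    return [source for source in PRIORITY if source in requested]
```

Returning the list in priority order keeps the later workflow steps independent of the order in which the user typed the sources.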
Source Table
| Priority | Source | ID | How to detect | What it provides |
|---|---|---|---|---|
| 1 | Zotero (via MCP) | zotero | Try calling any mcp__zotero__* tool — if unavailable, skip | Collections, tags, annotations, PDF highlights, BibTeX, semantic search |
| 2 | Obsidian (via MCP) | obsidian | Try calling any mcp__obsidian-vault__* tool — if unavailable, skip | Research notes, paper summaries, tagged references, wikilinks |
| 3 | Local PDFs | local | Glob: papers/**/*.pdf, literature/**/*.pdf | Raw PDF content (first 3 pages) |
| 4 | Web search | web | Always available (WebSearch) | arXiv, Semantic Scholar, Google Scholar |
| 5 | Semantic Scholar API | semantic-scholar | tools/semantic_scholar_fetch.py exists | Published venue papers (IEEE, ACM, Springer) with structured metadata: citation counts, venue info, TLDR. Only runs when explicitly requested via — sources: semantic-scholar or — sources: web, semantic-scholar |
| 6 | DeepXiv CLI | deepxiv | tools/deepxiv_fetch.py and installed deepxiv CLI | Progressive paper retrieval: search, brief, head, section, trending, web search. Only runs when explicitly requested via — sources: deepxiv or — sources: all, deepxiv |
| 7 | Exa Search | exa | tools/exa_search.py and installed exa-py SDK | AI-powered broad web search with content extraction (highlights, text, summaries). Covers blogs, docs, news, companies, and research papers beyond arXiv/S2. Only runs when explicitly requested via — sources: exa or — sources: all, exa |
| 8 | Gemini (MCP / CLI) | gemini | mcp__gemini-cli__ask-gemini tool available, or gemini CLI installed | AI-powered broad literature discovery — decomposes topics into sub-problems, aliases, and variants for wider retrieval. Prefers MCP, falls back to CLI. Only runs when explicitly requested via — sources: gemini or — sources: all, gemini |
| 9 | OpenAlex | openalex | tools/openalex_fetch.py exists | Open citation graph with institutional affiliations, funding data, and comprehensive metadata across 250M+ works. Fully open API. Only runs when explicitly requested via — sources: openalex or — sources: all, openalex |
Graceful degradation: If no MCP servers are configured, the skill works exactly as before (local PDFs + web search). Zotero and Obsidian are pure additions.
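For the script-backed sources, the "How to detect" column can be approximated with a small availability check. The sketch below is illustrative only: `detect_script_sources` and its parameters are hypothetical names, and it covers only the file-existence and `deepxiv` CLI checks (the exa-py SDK check and the MCP-based sources are left out).

```python
import shutil
from pathlib import Path

# Script file names are taken from the source table.
SCRIPT_SOURCES = {
    "semantic-scholar": "semantic_scholar_fetch.py",
    "deepxiv": "deepxiv_fetch.py",
    "exa": "exa_search.py",
    "openalex": "openalex_fetch.py",
}


def detect_script_sources(tools_dir="tools", which=shutil.which):
    """Return the script-backed sources whose helper script is present."""
    tools = Path(tools_dir)
    found = []
    for source, script in SCRIPT_SOURCES.items():
        if not (tools / script).is_file():
            continue  # script missing: skip this source silently
        if source == "deepxiv" and which("deepxiv") is None:
            continue  # the adapter also needs the deepxiv CLI on PATH
        found.append(source)
    return found
```

Passing `which` as a parameter is only there so the check can be exercised without the real CLI installed.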
Workflow
Step 0a: Search Zotero Library (if available)
Skip this step entirely if Zotero MCP is not configured.
Try calling a Zotero MCP tool (e.g., search). If it succeeds:
- Search by topic: Use the Zotero search tool to find papers matching the research topic
- Read collections: Check if the user has a relevant collection/folder for this topic
- Extract annotations: For highly relevant papers, pull PDF highlights and notes — these represent what the user found important
- Export BibTeX: Get citation data for relevant papers (useful for /paper-write later)
- Compile results: For each relevant Zotero entry, extract:
  - Title, authors, year, venue
  - User's annotations/highlights (if any)
  - Tags the user assigned
  - Which collection it belongs to
📚 Zotero annotations are gold — they show what the user personally highlighted as important, which is far more valuable than generic summaries.
Step 0b: Search Obsidian Vault (if available)
Skip this step entirely if Obsidian MCP is not configured.
Try calling an Obsidian MCP tool (e.g., search). If it succeeds:
- Search vault: Search for notes related to the research topic
- Check tags: Look for notes tagged with relevant topics (e.g., #diffusion-models, #paper-review)
- Read research notes: For relevant notes, extract the user's own summaries and insights
- Follow links: If notes link to other relevant notes (wikilinks), follow them for additional context
- Compile results: For each relevant note:
  - Note title and path
  - User's summary/insights
  - Links to other notes (research graph)
  - Any frontmatter metadata (paper URL, status, rating)
📝 Obsidian notes represent the user's processed understanding — more valuable than raw paper content for understanding their perspective.
Step 0c: Scan Local Paper Library
Before searching online, check if the user already has relevant papers locally:
- Locate library: Check PAPER_LIBRARY paths for PDF files
  Glob: papers/**/*.pdf, literature/**/*.pdf
- De-duplicate against Zotero: If Step 0a found papers, skip any local PDFs already covered by Zotero results (match by filename or title).
- Filter by relevance: Match filenames and first-page content against the research topic. Skip clearly unrelated papers.
- Summarize relevant papers: For each relevant local PDF (up to MAX_LOCAL_PAPERS):
  - Read first 3 pages (title, abstract, intro)
  - Extract: title, authors, year, core contribution, relevance to topic
  - Flag papers that are directly related vs tangentially related
- Build local knowledge base: Compile summaries into a "papers you already have" section. This becomes the starting point — external search fills the gaps.
📚 If no local papers are found, skip to Step 1. If the user has a comprehensive local collection, the external search can be more targeted (focus on what's missing).
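The locate-and-prioritize part of this step could look roughly like the following. It is a sketch under stated assumptions: `rank_local_pdfs` is a hypothetical helper, and it scores filenames only by keyword overlap with the topic, whereas the step above also looks at first-page content.

```python
from pathlib import Path

MAX_LOCAL_PAPERS = 20  # same cap as in Constants


def rank_local_pdfs(library_dirs, topic, limit=MAX_LOCAL_PAPERS):
    """Collect PDFs under the library paths, most topic-relevant first."""
    # Hypothetical relevance score: topic words found in the file name.
    keywords = [word for word in topic.lower().split() if len(word) > 2]
    scored = []
    for directory in library_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # missing library path: nothing to scan
        for pdf in root.rglob("*.pdf"):
            name = pdf.stem.lower()
            score = sum(1 for word in keywords if word in name)
            scored.append((score, pdf))
    # Highest score first; ties broken by path for a stable order.
    scored.sort(key=lambda item: (-item[0], str(item[1])))
    return [pdf for _, pdf in scored[:limit]]
```

A call such as `rank_local_pdfs(["papers", "literature"], "diffusion models")` returns at most MAX_LOCAL_PAPERS paths, which can then be read (first 3 pages each) and summarized.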
Step 1: Search (external)
- Use WebSearch to find recent papers on the topic
- Check arXiv, Semantic Scholar, Google Scholar
- Focus on papers from the last 2 years unless studying foundational work
- De-duplicate: Skip papers already found in Zotero, Obsidian, or local library
arXiv API search (always runs, no download by default):
Locate the fetch script and search arXiv directly:
# Try to find arxiv_fetch.py
SCRIPT=$(find tools/ -name "arxiv_fetch.py" 2>/dev/null | head -1)
# If not found, check ARIS install
[ -z "$SCRIPT" ] && SCRIPT=$(find ~/.claude/skills/arxiv/ -name "arxiv_fetch.py" 2>/dev/null | head -1)
# Search arXiv API for structured results (title, abstract, authors, categories)
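# Only call the script if it was found; otherwise fall back to WebSearch
if [ -n "$SCRIPT" ]; then
  python3 "$SCRIPT" search "QUERY" --max 10
fi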
If arxiv_fetch.py is not found, fall back to WebSearch for arXiv (same as before).
The arXiv API returns structured metadata (title, abstract, full author list, categories, dates) — richer than WebSearch snippets. Merge these results with WebSearch findings and de-duplicate.
Semantic Scholar API search (only when semantic-scholar is in sources):
When the user explicitly requests — sources: semantic-scholar (or — sources: web, semantic-scholar), search for published venue papers beyond arXiv:
S2_SCRIPT=$(find tools/ -name "semantic_scholar_fetch.py" 2>/dev/null | head -1)
[ -z "$S2_SCRIPT" ] && S2_SCRIPT=$(find ~/.claude/skills/semantic-scholar/ -name "semantic_scholar_fetch.py" 2>/dev/null | head -1)
# Search for published CS/Engineering papers with quality filters
python3 "$S2_SCRIPT" search "QUERY" --max 10 \
--fields-of-study "Computer Science,Engineering" \
--publication-types "JournalArticle,Conference"
If semantic_scholar_fetch.py is not found, skip silently.
Why use Semantic Scholar? Many IEEE/ACM journal papers are NOT on arXiv. S2 fills the gap for published venue-only papers with citation counts and venue metadata.
De-duplication between arXiv and S2: Match by arXiv ID (S2 returns externalIds.ArXiv):
- If a paper appears in both: check S2's venue/publicationVenue — if it has been published in a journal/conference (e.g. IEEE TWC, JSAC), use S2's metadata (venue, citationCount, DOI) as the authoritative version, since the published version supersedes the preprint. Keep the arXiv PDF link for download.
- If the S2 match has no venue (still just a preprint indexed by S2): keep the arXiv version as-is.
- S2 results without externalIds.ArXiv are venue-only papers not on arXiv — these are the unique value of this source.
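The three rules above can be expressed as a short merge routine. This is a sketch, not the skill's implementation: `merge_arxiv_and_s2` is a hypothetical name, the arXiv records are assumed to be dicts with `arxiv_id`, `title`, and `pdf_url` keys, and the S2 records are assumed to carry the `externalIds`, `venue`, `publicationVenue`, and `citationCount` fields mentioned above.

```python
import re


def _base_id(arxiv_id):
    """Drop a trailing version suffix such as 'v2' so IDs compare equal."""
    return re.sub(r"v\d+$", "", arxiv_id)


def merge_arxiv_and_s2(arxiv_papers, s2_papers):
    """Return (merged arXiv-linked papers, venue-only S2 papers)."""
    merged = {_base_id(p["arxiv_id"]): dict(p) for p in arxiv_papers}
    venue_only = []
    for s2 in s2_papers:
        ids = s2.get("externalIds") or {}
        arxiv_id = ids.get("ArXiv")
        if arxiv_id is None:
            # Not on arXiv: the unique contribution of this source.
            venue_only.append(s2)
            continue
        record = merged.setdefault(
            _base_id(arxiv_id),
            {"arxiv_id": arxiv_id, "title": s2.get("title")},
        )
        if s2.get("venue") or s2.get("publicationVenue"):
            # Published version supersedes the preprint metadata;
            # the arXiv pdf_url already on the record is left untouched.
            record["venue"] = s2.get("venue") or s2["publicationVenue"].get("name")
            record["citation_count"] = s2.get("citationCount")
            record["doi"] = ids.get("DOI")
    return list(merged.values()), venue_only
```

One caveat the sketch does not handle: S2 sometimes reports an arXiv-like venue string for preprints, so a real implementation may want to treat such values as "no venue".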
DeepXiv search (only when deepxiv is in sources):
When the user explicitly requests — sources: deepxiv (or includes deepxiv in a combined source list), use the DeepXiv adapter for progressive retrieval:
python3 tools/deepxiv_fetch.py search "QUERY" --max 10
Then deepen only for the most relevant papers:
python3 tools/deepxiv_fetch.py paper-brief ARXIV_ID
python3 tools/deepxiv_fetch.py paper-head ARXIV_ID
python3 tools/deepxiv_fetch.py paper-section ARXIV_ID "Experiments"
If tools/deepxiv_fetch.py or the deepxiv CLI is unavailable, skip this source gracefully and continue with the remaining requested sources.
Why use DeepXiv? It is useful when a broad search should be followed by staged reading rather than immediate full-paper loading. This reduces unnecessary context while still surfacing structure, TLDRs, and the most relevant sections.
De-duplication against arXiv and S2:
- Match by arXiv ID first, DOI second, normalized title third
- If DeepXiv and arXiv refer to the same preprint, keep one canonical paper row and record deepxiv as an additional source
- If DeepXiv overlaps with S2 on a published paper, prefer S2 venue/citation metadata in the final table, but keep DeepXiv-derived section notes when they add value
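The matching order (arXiv ID, then DOI, then normalized title) and the "one canonical row, extra source recorded" rule can be sketched as below. The helper names and the record shape (`arxiv_id`, `doi`, `title`, `sources`) are assumptions for illustration; the title normalization shown is one plausible choice, not a prescribed one.

```python
import re


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace for loose matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_keys(paper):
    """Keys in matching priority: arXiv ID, DOI, normalized title."""
    keys = []
    if paper.get("arxiv_id"):
        keys.append(("arxiv", re.sub(r"v\d+$", "", paper["arxiv_id"])))
    if paper.get("doi"):
        keys.append(("doi", paper["doi"].lower()))
    if paper.get("title"):
        keys.append(("title", normalize_title(paper["title"])))
    return keys


def add_with_dedup(rows, index, paper, source):
    """Add a paper, or record `source` on the row it duplicates."""
    for key in match_keys(paper):
        if key in index:
            row = index[key]
            if source not in row["sources"]:
                row["sources"].append(source)
            break
    else:
        # No key matched: this is a new canonical row.
        row = dict(paper, sources=[source])
        rows.append(row)
    for key in match_keys(paper):
        index.setdefault(key, row)
    return row
```

Start with `rows, index = [], {}` and feed results in source priority order, so that the first source to report a paper defines the canonical row.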
Exa search (only when exa is in sources):
When the user explicitly requests — sources: exa (or includes exa in a combined source list), use the Exa tool for broad AI-powered web search with content extraction:
EXA_SCRIPT=$(find tools/ -name "exa_search.py" 2>/dev/null | head -1)
# Search for research papers with highlights
python3 "$EXA_SCRIPT" search "QUERY" --max 10 --category "research paper" --content highlights
# Search for broader web content (blogs, docs, news)
python3 "$EXA_SCRIPT" search "QUERY" --max 10 --content highlights
If tools/exa_search.py or the exa-py SDK is unavailable, skip this source gracefully and continue with the remaining requested sources.
Why use Exa? Exa provides AI-powered search across the broader web (blogs, documentation, news, company pages) with built-in content extraction. It fills a gap between academic databases (arXiv, S2) and generic WebSearch by returning richer content with each result.
De-duplication against arXiv, S2, and DeepXiv:
- Match by URL first, then normalized title
- If Exa returns an arXiv paper already found by arXiv/S2, prefer the structured metadata from those sources
- Exa results from non-academic domains (blogs, docs, news) are unique value not covered by other sources
Gemini search (only when gemini is in sources):
When the user explicitly requests — sources: gemini (or includes gemini in a combined source list), use Gemini for AI-powered broad literature discovery.
Priority 1 — Gemini MCP (preferred): Call mcp__gemini-cli__ask-gemini with the search prompt:
mcp__gemini-cli__ask-gemini({
prompt: 'You are a research literature scout. Search comprehensively for papers on: "QUERY"
IMPORTANT CONSTRAINTS:
1. Search from MULTIPLE angles — decompose the topic into sub-problems, aliases, neighboring tasks, and common benchmark/settings variants.
2. Prefer papers that are genuinely relevant, not merely keyword-adjacent.
3. Include top venues, journals, surveys, recent preprints, and papers with code when available.
4. Focus on papers from 2022 onward unless older foundational work is necessary.
For EACH paper found, provide ALL of the following:
- Title: [exact title]
- Authors: [full author list]
- Year: [publication year]
- Venue: [exact conference/journal name + year, or "arXiv preprint"]
- arXiv ID: [format 2401.12345, or "N/A"]
- DOI: [if available, or "N/A"]
- Code URL: [GitHub/GitLab link if available, or "No code"]
- Summary: [one-sentence core contribution]
Find at least 15 papers.',
model: 'gemini-2.5-pro'
})
Priority 2 — Gemini CLI fallback (if MCP unavailable): Use gemini -p "...same prompt..." 2>/dev/null via Bash (timeout: 120s).
If both MCP and CLI are unavailable, skip this source gracefully and continue with the remaining requested sources.
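Because the prompt above asks for a fixed set of labelled fields per paper, the reply can be turned into records before de-duplication. The parser below is a minimal sketch: `parse_gemini_papers` is a hypothetical helper, and it assumes Gemini follows the requested "- Field: value" layout line by line, which a real response may not do (for example, if it adds markdown emphasis around labels).

```python
import re

# Field labels are the ones requested in the prompt above.
FIELD = re.compile(
    r"^\s*[-*]?\s*(Title|Authors|Year|Venue|arXiv ID|DOI|Code URL|Summary):\s*(.*)$"
)
# Placeholder values the prompt asks for when a field is missing.
EMPTY_VALUES = {"n/a", "no code", ""}


def parse_gemini_papers(text):
    """Parse a field-per-line reply into a list of paper dicts."""
    papers = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if match is None:
            continue  # free text between entries is ignored
        name, value = match.group(1), match.group(2).strip()
        if name == "Title":
            papers.append({})  # each Title line starts a new paper
        if not papers:
            continue  # field seen before any Title: nothing to attach to
        key = name.lower().replace(" ", "_")  # e.g. "arXiv ID" -> "arxiv_id"
        papers[-1][key] = None if value.lower() in EMPTY_VALUES else value
    return papers
```

The resulting `arxiv_id`, `doi`, and `title` keys are what the de-duplication pass needs; any entry with neither an arXiv ID nor a DOI deserves extra scrutiny before it reaches the final table.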
Why use Gemini? Gemini provides AI-driven discovery that goes beyond keyword matching — it decomposes topics, explores naming variants, and surfaces papers that traditional API-based searches (arXiv, S2) may miss. It fills a different retrieval niche from structured database queries.
De-duplication against arXiv, S2, DeepXiv, and Exa:
- Match by arXiv ID first, DOI second, normalized title third
- If Gemini returns a paper already found by S2, prefer S2's citation count and venue metadata
- If Gemini returns a paper already found by arXiv, prefer arXiv's structured metadata
- Gemini's unique value is discovering papers that other keyword-based indexes did not surface
- Do not use Gemini-reported citation counts — they may be inaccurate. Use S2 for authoritative citation data.
OpenAlex search (only when openalex is in sources):
When the user explicitly requests — sources: openalex (or includes openalex in a combined source list), use OpenAlex API for comprehensive academic metadata:
OA_SCRIPT=$(find tools/ -name "openalex_fetch.py" 2>/dev/null | head -1)
# Preflight: skip OpenAlex silently if either openalex_fetch.py or the
# `requests` Python package is unavailable. Both checks must pass before
# the script is invoked, so users without `requests` installed never see
# a stack trace from a default `/research-lit` run.
if [ -z "$OA_SCRIPT" ] || ! python3 -c "import requests" >/dev/null 2>&1; then
echo "OpenAlex source not available (missing tools/openalex_fetch.py or 'requests' module); skipping." >&2
else
# Search for papers with comprehensive metadata
python3 "$OA_SCRIPT" search "QUERY" --max 10 \
--year "2022-" \
--type article \
--sort relevance
fi
If openalex_fetch.py is not found or requests module is missing, skip this source gracefully and continue with the remaining requested sources.
Why use OpenAlex? Fully open citation graph (no API key required), institutional affiliations, funding data (NSF, NIH), comprehensive topic/keyword metadata, and coverage across all disciplines (not just CS).
De-duplication against arXiv, S2, DeepXiv, Exa, and Gemini:
- Match by DOI first (OpenAlex has DOI for most works), then arXiv ID, then normalized title
- If OpenAlex and S2 both have the same paper:
- Prefer S2 for citation counts (more up-to-date)
- Prefer S2 for venue metadata (more accurate for CS/AI papers)
- Use OpenAlex for institutional affiliations and funding data (unique value)
- Merge both into a richer record
- If OpenAlex and arXiv overlap, prefer arXiv's PDF link and metadata, but keep OpenAlex's citation/institution data
- OpenAlex's unique value: institutional affiliations, funding sources, comprehensive topic classification, and cross-discipline coverage
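The per-field preferences above amount to a small precedence table. The following sketch assumes both records have already been normalized to the same hypothetical keys (`citation_count`, `venue`, `affiliations`, `funding`); neither the key names nor `merge_s2_openalex` come from the actual fetch scripts.

```python
# Which source wins for each field, per the rules above.
PREFERRED_SOURCE = {
    "citation_count": "s2",
    "venue": "s2",
    "affiliations": "openalex",
    "funding": "openalex",
}


def merge_s2_openalex(s2, openalex):
    """Merge two normalized records describing the same paper."""
    records = {"s2": s2, "openalex": openalex}
    # Assumption: for fields not listed above, S2 values win on conflict.
    merged = {**openalex, **s2}
    for field, preferred in PREFERRED_SOURCE.items():
        fallback = "openalex" if preferred == "s2" else "s2"
        value = records[preferred].get(field)
        # Fall back to the other source when the preferred one has no value.
        merged[field] = value if value is not None else records[fallback].get(field)
    return merged
```

Keeping the precedence in a table makes it easy to adjust if, for a given field, one source turns out to be more reliable than the other.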
Optional PDF download (only when ARXIV_DOWNLOAD = true):
After all sources are searched and papers are ranked by relevance:
# Download top N most relevant arXiv papers
python3 "$SCRIPT" download ARXIV_ID --dir papers/
- Only download papers ranked in the top ARXIV_MAX_DOWNLOAD by relevance
- Skip papers already in the local library
- 1-second delay between downloads (rate limiting)
- Verify each PDF > 10 KB
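The four download rules can be sketched as two small helpers. Several details here are assumptions: local files are taken to be named `<arxiv_id>.pdf`, 10 KB is read as 10 * 1024 bytes, and `fetch` is a hypothetical callable standing in for the download command shown above, returning the path it wrote.

```python
import time
from pathlib import Path

ARXIV_MAX_DOWNLOAD = 5        # same cap as in Constants
MIN_PDF_BYTES = 10 * 1024     # assumption: "10 KB" means 10 * 1024 bytes


def pick_downloads(ranked_ids, library_dir, limit=ARXIV_MAX_DOWNLOAD):
    """Take the top-ranked IDs, then drop ones already in the library."""
    top = ranked_ids[:limit]
    # Assumption: local files are named <arxiv_id>.pdf.
    existing = {pdf.stem for pdf in Path(library_dir).glob("**/*.pdf")}
    return [arxiv_id for arxiv_id in top if arxiv_id not in existing]


def download_all(arxiv_ids, library_dir, fetch, delay=1.0):
    """Download sequentially with a pause; return PDFs that pass the size check."""
    verified = []
    for position, arxiv_id in enumerate(arxiv_ids):
        if position:
            time.sleep(delay)  # rate limiting between downloads
        path = fetch(arxiv_id, library_dir)
        if path.exists() and path.stat().st_size > MIN_PDF_BYTES:
            verified.append(path)
    return verified
```

Applying the top-N cut before the "already in library" filter means fewer than ARXIV_MAX_DOWNLOAD files may be fetched, which matches a literal reading of the rules above.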
Step 2: Analyze Each Paper
For each relevant paper (from all sources), extract:
- Problem: What gap does it address?
- Method: Core technical contribution (1-2 sentences)
- Results: Key numbers/claims
- Relevance: How does it relate to our work?
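- Source: Where we found it (Zotero/Obsidian/local/web, or any explicitly requested source such as Semantic Scholar, DeepXiv, Exa, Gemini, or OpenAlex), so the user can tell what they already have from what's new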
Step 3: Synthesize
- Group papers by approach/theme
- Identify consensus vs disagreements in the field
- Find gaps that our work could fill
- If Obsidian notes exist, incorporate the user's own insights into the synthesis
Step 4: Output
Present as a structured literature table:
| Paper | Venue | Method | Key Result | Relevance to Us | Source |
|-------|-------|--------|------------|-----------------|--------|
Plus a narrative summary of the landscape (3-5 paragraphs).
If Zotero BibTeX was exported, include a references.bib snippet for direct use in paper writing.
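Rendering the literature table from analyzed records is mechanical. The sketch below assumes each paper is a dict keyed by the column headers shown above; `render_literature_table` is a hypothetical helper name.

```python
# Column headers taken from the table template above.
COLUMNS = ["Paper", "Venue", "Method", "Key Result", "Relevance to Us", "Source"]


def render_literature_table(rows):
    """Render paper dicts (keyed by column name) as a pipe table."""

    def cell(value):
        # Escape pipes and flatten newlines so a cell cannot break the row.
        return str(value or "").replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "|".join("---" for _ in COLUMNS) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(row.get(col)) for col in COLUMNS) + " |")
    return "\n".join(lines)
```

Missing fields render as empty cells rather than failing, which keeps partially analyzed papers visible in the output.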
Step 5: Save (if requested)
- Save paper PDFs to literature/ or papers/
- Update related work notes in project memory
- If Obsidian is available, optionally create a literature review note in the vault
Step 6: Update Research Wiki
Required when research-wiki/ exists. Skip entirely (no action, no
error) if the directory is absent. Per
shared-references/integration-contract.md,
this step follows the canonical ingest contract — business logic lives
in tools/research_wiki.py, not in this prose.
📋 Research Wiki ingest (runs once, at end of research-lit):
[ ] 1. Predicate: `research-wiki/` exists? If no, skip this step.
[ ] 2. For each of the top 8–12 relevant papers (arxiv IDs collected above):
python3 tools/research_wiki.py ingest_paper research-wiki/ \
--arxiv-id <id> [--thesis "<one-line>"] [--tags <t1>,<t2>]
[ ] 3. For each explicit relationship to an existing wiki entity,
add an edge:
python3 tools/research_wiki.py add_edge research-wiki/ \
--from "paper:<slug>" --to "<target_node_id>" \
--type <extends|contradicts|addresses_gap|inspired_by|...> \
--evidence "<one-sentence quote or reasoning>"
[ ] 4. Confirm papers/<slug>.md files were created (helper prints
"Paper ingested: ..."); if any failed with a network error,
retry or fall back to the --title/--authors/--year manual form.
ingest_paper handles slug generation, arXiv metadata fetch, dedup
(skips an existing paper by arXiv id), page rendering, index.md
rebuild, query_pack.md rebuild, and log append in a single call —
do not manually write papers/<slug>.md. If the helper is
unavailable (e.g., offline on a non-ARIS machine), log the gap and let
/research-wiki sync --arxiv-ids … backfill later.
For non-arXiv sources (Semantic Scholar only, IEEE/ACM journals without arXiv mirrors, blog posts), pass manual metadata instead:
python3 tools/research_wiki.py ingest_paper research-wiki/ \
--title "<full title>" --authors "A, B, C" --year <yyyy> \
--venue "<venue>" [--external-id-doi "<doi>"] [--thesis "..."]
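When the ingest calls are driven from a script, building the argument list in one place avoids quoting mistakes. The sketch below uses only the subcommand and flags shown above; `ingest_paper_argv` itself is a hypothetical helper, and treating `--venue` as optional in the manual form is an assumption.

```python
def ingest_paper_argv(wiki_dir, arxiv_id=None, title=None, authors=None,
                      year=None, venue=None, doi=None, thesis=None, tags=None):
    """Build the argv for one ingest_paper call (arXiv or manual form)."""
    argv = ["python3", "tools/research_wiki.py", "ingest_paper", wiki_dir]
    if arxiv_id:
        argv += ["--arxiv-id", arxiv_id]
        if tags:
            argv += ["--tags", ",".join(tags)]
    else:
        # Manual form for non-arXiv sources.
        if not (title and authors and year):
            raise ValueError("manual form needs title, authors, and year")
        argv += ["--title", title, "--authors", ", ".join(authors),
                 "--year", str(year)]
        if venue:
            argv += ["--venue", venue]
        if doi:
            argv += ["--external-id-doi", doi]
    if thesis:
        argv += ["--thesis", thesis]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)`; passing a list rather than a shell string means titles containing quotes need no escaping.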
Key Rules
- Always include paper citations (authors, year, venue)
- Distinguish between peer-reviewed and preprints
- Be honest about limitations of each paper
- Note if a paper directly competes with or supports our approach
- Never fail because an MCP server is not configured — always fall back gracefully to the next data source
- Zotero/Obsidian tools may have different names depending on how the user configured the MCP server (e.g., mcp__zotero__search or mcp__zotero-mcp__search_items). Try the most common patterns and adapt.