Paper Navigator

Find and read academic papers in four stages:

┌──────────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Disambiguate │ →  │ Discover │ →  │ Evaluate │ →  │   Read   │
└──────────────┘    └──────────┘    └──────────┘    └─────┬────┘
                                                          ↓ hand off
                                              ┌──────────────────┐
                                              │ research-survey  │ (for survey reports)
                                              │ research-ideation│ (for idea generation)
                                              └──────────────────┘

Setup: Scripts are in skills/paper-navigator/scripts/. Run via python skills/paper-navigator/scripts/<name>.py. Optional env vars for higher rate limits: S2_API_KEY (Semantic Scholar), JINA_API_KEY (Jina Reader), GITHUB_TOKEN, HF_TOKEN.

Semantic Scholar Key Gate (MANDATORY)

Before using any Semantic Scholar-dependent script, check whether S2_API_KEY is set.

  • If S2_API_KEY is set: you may use scholar_search, citation_traverse, recommend, author_search, trending, and other S2-backed tools normally.
  • If S2_API_KEY is missing: do not use Semantic Scholar. Tell the user that Semantic Scholar is unavailable without a key and ask whether they want to provide one. If they do not provide a key, continue with non-S2 sources only: arxiv_monitor, web search, GitHub search, Hugging Face search, or direct paper URLs/DOIs/arXiv IDs.
  • Without S2_API_KEY, skip citation-graph expansion (citation_traverse, recommend) entirely instead of retrying or waiting.
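A minimal sketch of the gate in Python (the source lists are illustrative labels, not script names to invoke):

```python
import os

def s2_available() -> bool:
    """True only when a Semantic Scholar API key is configured."""
    return bool(os.environ.get("S2_API_KEY"))

# Choose the toolset once, up front; never retry S2 calls without the key.
if s2_available():
    allowed = ["scholar_search", "citation_traverse", "recommend",
               "author_search", "trending"]
else:
    allowed = ["arxiv_monitor", "web search", "GitHub search",
               "Hugging Face search", "direct URLs/DOIs/arXiv IDs"]
```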

Step 0: Search Strategy Principles (MANDATORY)

Every discovery task MUST follow these principles before executing any workflow.

Query Reformulation

Before searching, decompose the user's topic and generate 4-6 variant queries covering distinct research angles. This is critical because different papers use different terminology for the same concept, and a single research topic often spans multiple sub-communities.

Step 1: Sub-topic decomposition. Identify 3-5 distinct research angles within the user's query. Most research topics span multiple perspectives:

  • Empirical vs. theoretical — papers that observe/measure the phenomenon vs. papers that prove/explain it formally
  • Mechanism vs. condition — papers about how something works vs. when/why it emerges
  • Method keywords — different communities use different terms for the same concept (e.g., "gradient descent" vs. "meta-optimization" vs. "implicit learning")
  • Adjacent formulations — the same idea framed differently (e.g., "in-context learning" vs. "few-shot learning" vs. "learning from demonstrations")

Step 2: Generate queries. Create at least one query per identified angle, using synonym substitution, specificity adjustment, and structural variants:

  • Synonym substitution: "data pruning" → "data selection", "data filtering", "data curation"
  • Specificity adjustment: broaden ("pretraining data quality") or narrow ("perplexity-based data pruning LLM")
  • Structural variants: swap word order, add/remove qualifiers, use abbreviations

Example: User asks "how LLMs gain in-context learning during pretraining"

  • Angles: (a) mechanistic/circuit, (b) training dynamics, (c) ICL-as-optimization theory, (d) data/task conditions, (e) formal theory
  • Query 1: "in-context learning emergence pretraining language model" (general)
  • Query 2: "induction heads formation training transformer" (mechanistic)
  • Query 3: "transformers learn in-context gradient descent meta-learning" (optimization view)
  • Query 4: "pretraining task diversity data structure in-context learning" (data conditions)
  • Query 5: "in-context learning theory linear attention generalization" (formal theory)

Example: User asks "papers about data pruning for LLM pretraining"

  • Angles: (a) selection methods, (b) quality metrics, (c) scaling effects
  • Query 1: "data pruning pretraining language model"
  • Query 2: "data selection pretraining LLM"
  • Query 3: "training data curation large language model quality"
  • Query 4: "data quality scoring pretraining scaling"

Multi-Source Parallel Search

Never rely on a single search source. For every discovery task, run at least 2 sources:

  • Primary (with S2_API_KEY): scholar_search (S2 with automatic arXiv fallback on rate limit)
  • Primary (without S2_API_KEY): arxiv_monitor --keywords "<variants>" --match-mode flexible
  • Secondary: web search or GitHub search for recent blog posts, surveys, repos, and paper links
  • Tertiary: additional non-S2 sources such as direct arXiv/DOI URLs or Hugging Face dataset/model search when relevant

CRITICAL — S2 usage rule:

  • With S2_API_KEY set: You MAY use S2-backed scripts. Prefer moderate fan-out and keep citation expansion scoped to the user's actual need.
  • Without S2_API_KEY: Do not invoke S2-backed scripts at all. Do not “try once anyway”, do not queue retries, and do not run citation_traverse / recommend.
  • How to check: Before starting discovery, verify the env var is set (e.g., run echo $S2_API_KEY). If it is empty, tell the user Semantic Scholar is unavailable without a key and continue with non-S2 sources.
  • arXiv-only scripts (arxiv_monitor) are NOT affected by this rule and can always run in parallel with other non-S2 calls.
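One way to sketch the fan-out (query strings are placeholders; the flags are the ones documented for each script):

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

SCRIPTS = "skills/paper-navigator/scripts"

def run(cmd: list[str]) -> str:
    """Run one search script and return its Markdown output."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout

if os.environ.get("S2_API_KEY"):
    primary = ["python", f"{SCRIPTS}/scholar_search.py",
               "--query", "data pruning pretraining language model", "--limit", "20"]
else:
    primary = ["python", f"{SCRIPTS}/arxiv_monitor.py",
               "--keywords", "data pruning,data selection", "--match-mode", "flexible"]

secondary = ["python", f"{SCRIPTS}/github_search.py",
             "--query", "data pruning pretraining", "--limit", "10"]

# Fan out across at least two sources in parallel, then merge and deduplicate.
with ThreadPoolExecutor() as pool:
    markdown_results = list(pool.map(run, [primary, secondary]))
```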

Rate-Limit-Aware Fallback Chain

When Semantic Scholar returns 429 or empty results:

  1. scholar_search automatically falls back to arXiv (built-in since v1.2)
  2. Use arxiv_monitor --keywords with --match-mode flexible for broader coverage
  3. Switch to web search for blog posts, surveys, GitHub repos that reference papers
  4. Space S2-dependent calls (citation_traverse, recommend) at least 5s apart and reduce --limit

Prevention is better than fallback: The arXiv fallback produces lower-quality results (no citation counts, less precise relevance ranking). If S2_API_KEY is missing, skip S2 entirely and use the non-S2 chain from the start.
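A sketch of the 5-second spacing rule from step 4 (the scripts already have a built-in pacer; this is only for when you orchestrate S2-dependent calls yourself):

```python
import time

_last_s2_call = 0.0

def pace_s2(min_interval: float = 5.0) -> None:
    """Block until at least `min_interval` seconds since the last S2-backed call."""
    global _last_s2_call
    wait = _last_s2_call + min_interval - time.monotonic()
    if wait > 0:
        time.sleep(wait)
    _last_s2_call = time.monotonic()

# Call pace_s2() before each citation_traverse / recommend invocation,
# and reduce --limit when 429s persist.
```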

Mandatory Citation Expansion (for multi-paper discovery tasks)

When S2_API_KEY is available and you have found ≥3 relevant seed papers, you MUST expand coverage using the citation graph. The goal is to discover papers that keyword search cannot reach.

Seed selection: Rank all found relevant papers by citation count. Pick the top 3 as primary seeds.

Expansion steps (all mandatory):

  1. Co-citation on the single highest-cited seed: citation_traverse --direction co-citation --limit 15 — this is the strongest signal for finding closely related work that uses different terminology
  2. Forward citations on the top 2 seeds: citation_traverse --direction forward --limit 20 — finds follow-up work
  3. Backward citations on 1-2 seeds whose topic coverage differs: citation_traverse --direction backward --limit 20 — finds foundational and adjacent work that seeds build on. Pick seeds from different sub-topics to maximize coverage breadth
  4. Recommendations with diverse seeds: recommend --positive <seed1>,<seed2>,<seed3> — serendipitous discovery of semantically related work not connected by citations
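Taken together, one expansion round might look like this (seed IDs are illustrative placeholders, ordered by citation count):

```python
import subprocess

SCRIPTS = "skills/paper-navigator/scripts"
seeds = ["ArXiv:1706.03762", "ArXiv:2005.14165", "ArXiv:2301.00001"]  # top 3 by citations

def traverse(paper_id: str, direction: str, limit: int) -> str:
    return subprocess.run(
        ["python", f"{SCRIPTS}/citation_traverse.py",
         "--paper-id", paper_id, "--direction", direction, "--limit", str(limit)],
        capture_output=True, text=True).stdout

found = []
found.append(traverse(seeds[0], "co-citation", 15))        # 1. strongest seed
found += [traverse(s, "forward", 20) for s in seeds[:2]]   # 2. follow-up work
found.append(traverse(seeds[2], "backward", 20))           # 3. topically distinct seed
found.append(subprocess.run(                               # 4. serendipitous picks
    ["python", f"{SCRIPTS}/recommend.py",
     "--positive", ",".join(seeds), "--limit", "15"],
    capture_output=True, text=True).stdout)
```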

Seed diversity principle: When selecting seeds for backward traversal or recommendations, prefer seeds from different sub-topics identified in query reformulation. This prevents the citation graph from staying within a single research community.

Applies to: WF1 (Survey), WF3 (Quick Search with >10 results), WF5 (Track Developments), WF9 (Ideation), WF10 (User-specified count), and only when S2_API_KEY is configured. Does NOT apply to: WF2 (Find specific paper), WF7 (Read paper by URL).

Coverage Gap Check (for multi-paper discovery tasks)

After initial search + citation expansion, review the collected papers against the sub-topics identified during query reformulation.

For each sub-topic angle:

  1. Count how many collected papers address it
  2. If a sub-topic has 0-1 papers, run a targeted scholar_search with a query specific to that angle only when S2_API_KEY is available. Otherwise use arxiv_monitor, web search, or GitHub search for that angle.
  3. If targeted search finds new relevant papers and S2_API_KEY is available, optionally run one more citation_traverse or recommend round on the new finds

This step catches systematic blind spots where an entire research perspective was missed by all prior queries. It is lightweight — typically 1-2 additional searches for gaps, not a full re-search.
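A toy illustration of the check (the paper records and angle labels are made up; in practice you assign angles by reading titles and abstracts):

```python
angles = ["mechanistic", "training dynamics", "optimization view",
          "data conditions", "formal theory"]

# Hypothetical collection after search + expansion, tagged by angle:
papers = [
    {"title": "Induction heads drive in-context learning", "angle": "mechanistic"},
    {"title": "Transformers learn in-context by gradient descent", "angle": "optimization view"},
    {"title": "Task diversity and ICL emergence", "angle": "data conditions"},
]

coverage = {a: sum(p["angle"] == a for p in papers) for a in angles}
gaps = [a for a, n in coverage.items() if n <= 1]
print("Angles needing one targeted follow-up search:", gaps)
```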

Applies to: Same workflows as Mandatory Citation Expansion.


Step 1: Classify Intent and Select Workflow

Start here. Determine what the user wants and route to the right workflow. Match complexity to intent — simple queries get simple answers.

| Intent | Signal | Workflow | Complexity |
|--------|--------|----------|------------|
| Find a specific paper | Title, author name, or URL | WF 2 | Single search call |
| Quick paper search | "give me papers about X", "find papers on X" | WF 3 | Single search call |
| Metadata search | Author + year, venue filter | WF 4 | Single search + filter |
| Track recent advances | "latest", "recent", "what's new" | WF 5 | 1-2 calls |
| Find a baseline | Code, SOTA, implementation | WF 6 | Search + code check |
| Read a paper | URL or "read this paper" | WF 7 | Fetch + read |
| Ambiguous term | Project name, module name, nickname | WF 8 | Web search + resolve |
| Literature survey | "survey X", comprehensive coverage | WF 1 → then hand off to research-survey | Iterative collection |
| Related work map | Connections between papers | WF 1 | Citation traversal |
| Ideation support | Called from research-ideation | WF 9 | Iterative + strict filter |
| User-specified count | "find me exactly N papers about X" | WF 10 | Adaptive |

Key principle: Simple "find me papers about X" queries should return results from a single search call, not trigger the full iterative collection workflow. Only use iterative expansion for comprehensive surveys or ideation support.


Step 2: Resolve Ambiguous Terms (if needed)

When the user's query might be a colloquial name, project name, or module name (rather than a paper title):

  1. Quick academic search — Try scholar_search with the exact query
  2. If zero results — Broaden the search:
    • Web search: Find GitHub repos, blog posts, or social media that reveal the actual paper title or arXiv ID
    • GitHub search: github_search.py --query "USER_QUERY" — repos often link to papers
  3. Extract identifiers — Actual paper title, arXiv ID, GitHub repo URL, author names
  4. Re-enter the appropriate workflow with resolved identifiers

Example disambiguation report:

🔍 Disambiguation Report for "deepseek engram"
├── Intent: Track recent advances (ambiguous term)
├── Resolution: "Engram" is a module name from DeepSeek AI
│   ├── Actual paper: "Conditional Memory via Scalable Lookup" (ArXiv:2601.07372)
│   └── GitHub: https://github.com/deepseek-ai/Engram
└── Search Plan:
    ├── scholar_search --query "Conditional Memory Scalable Lookup" --sort-by year
    ├── citation_traverse --paper-id ArXiv:2601.07372 --direction forward
    └── github_search --query "deepseek engram"

Standard Output Formats

Use these formats when presenting results to the user. Match the format to the intent.

Format A: Single Paper Card (for navigational search, WF 2)

📄 **Highly accurate protein structure prediction with AlphaFold**
Authors: Jumper et al.
Year: 2021 | Venue: Nature
Citations: 25,000+
DOI: 10.1038/s41586-021-03819-2 | S2 ID: 235959867
Link: https://doi.org/10.1038/s41586-021-03819-2
TLDR: End-to-end neural network for protein structure prediction achieving atomic accuracy...

Format B: Paper List Table (for quick search, metadata search, trending — WF 3/4/5)

| # | Title | Authors | Year | Venue | Citations | ID |
|---|-------|---------|------|-------|-----------|-----|
| 1 | Paper Title | First Author et al. | 2024 | NeurIPS | 150 | arXiv:2401.xxxxx |
| 2 | ... | ... | ... | ... | ... | ... |

After the table, briefly note how many results were found and whether the list was filtered.

Format C: Baseline Recommendation (for baseline hunt, WF 6)

📦 **Recommended Baseline: [Model Name]**
Paper: [Title] ([Year], [Venue]) — [arXiv ID]
Code: [GitHub URL] ⭐ [stars] | Framework: [PyTorch/TF]
Performance: [key metric = value] on [dataset]
HuggingFace: [model page URL] | Downloads: [N]

Format D: Reading Notes (for read a paper, WF 7)

Use the template at assets/paper-summary-template.md. Save to /artifacts/paper-notes/{paper-id}.md.

Format E: Disambiguation Report (for ambiguous queries, WF 8)

🔍 Disambiguation Report for "[query]"
├── Intent: [classified intent]
├── Resolution: [what the term actually refers to]
│   ├── Paper: [resolved title] ([arXiv ID])
│   └── Code: [GitHub URL]
└── Search Plan:
    ├── [script call 1]
    └── [script call 2]

Common Workflows

Workflow 1: Collect Papers for Survey

"Help me survey CRISPR-based gene therapy for sickle cell disease"

Use iterative collection (target 30-80 papers). See Appendix A for the full iterative methodology.

  1. Discover: If S2_API_KEY is available, start with scholar_search --query "CRISPR gene therapy sickle cell" --limit 20 --sort-by citations → iterative expansion with EXPLORE/EXPLOIT strategy → citation_traverse --direction forward on seminal papers. If the key is missing, tell the user Semantic Scholar is unavailable without a key, then use arxiv_monitor + web/GitHub search instead and skip citation-graph expansion.
  2. Evaluate: Review each paper's title + abstract for relevance → filter by abstract quality → prefer top-tier venues → shortlist
  3. Read: fetch_paper for key papers → L2 reading → notes using assets/paper-summary-template.md
  4. Hand off to research-survey to synthesize the collected papers into a structured survey report

Workflow 2: Navigational Search

"Find me the attention is all you need paper" "Find me the original GPT 3 paper"

  1. Discover: If S2_API_KEY is available, use scholar_search --query "Attention Is All You Need" — single call, return top result. If the key is missing, ask whether the user wants to provide one; otherwise resolve via arXiv/DOI/web search and continue without S2.
  2. Output: Use Format A (Single Paper Card)

Do NOT proceed to Read unless the user explicitly asks.

Workflow 3: Quick Paper Search

"Give me papers about perovskite solar cell stability under humidity" "Find papers on gut microbiome modulation for autoimmune diseases"

  1. Sub-topic decomposition + query reformulation: Identify 3-5 research angles within the topic, generate 4-6 variant queries covering distinct angles (see Step 0)
  2. Discover: If S2_API_KEY is set, run scholar_search --query "<variant>" --limit 20 --sort-by relevance on each variant. If the key is missing, tell the user Semantic Scholar is unavailable without a key, skip scholar_search, and use arxiv_monitor --keywords "<variants>" --match-mode flexible plus web/GitHub search instead.
  3. Citation expansion (if initial results ≥ 3 relevant papers and S2_API_KEY is available): Follow Mandatory Citation Expansion (Step 0) — co-citation on highest-cited seed, forward on top 2, backward on 1-2 diverse seeds, recommend with 3 seeds
  4. Coverage gap check: Review collected papers against identified sub-topics. Run targeted searches for uncovered angles using S2 only when the key is available; otherwise use non-S2 sources
  5. Filter: Review all results, deduplicate, keep relevant papers based on title + abstract
  6. Output: Use Format B (Paper List Table)

Only escalate to full iterative workflow (WF1) if results are clearly insufficient or the user explicitly asks for more.

Workflow 4: Metadata Search

"2012 papers by David Harel" "Papers by David Harel from 2020 to 2022" "Journal articles by David Harel from 2020 to 2022"

  1. Parse query: Extract author name, year range, venue type (journal/conference)
  2. Discover: author_search --name "David Harel" --papers --limit 50 --sort-by year
  3. Filter: Year range, venue type (check venue field), other attributes
  4. Output: Use Format B (Paper List Table)

For keyword + year filter (no author): scholar_search --query "<keywords>" --year-min YYYY --year-max YYYY

Workflow 5: Track Field Developments

"What's new in condensed matter physics this week?"

  1. Discover: arxiv_monitor --categories cond-mat --days 7 (see references/arxiv-categories.md for codes) + trending --query "topological insulator" --period 30
  2. Output: Use Format B (Paper List Table), highlight high-potential papers with TLDRs

Workflow 6: Find a Baseline with Code

"I need a baseline for protein structure prediction with code"

  1. Discover: scholar_search --query "protein structure prediction" --sort-by citations
  2. Evaluate: find_code on top results + sota --task "protein-structure-prediction" → pick one with official code + high downloads
  3. Output: Use Format C (Baseline Recommendation)

Workflow 7: Read a Paper by URL

"Read this paper: arxiv.org/abs/2301.12345"

Output: Use Format D (Reading Notes)

  1. Fetch: fetch_paper --url "https://arxiv.org/abs/2301.12345"
  2. Choose reading depth (see references/reading-strategy.md):

| Level | Goal | When to use | Effort |
|-------|------|-------------|--------|
| L1 Technical | Can reimplement | Building directly on this paper | High |
| L2 Analytical | Understand motivation + design choices | Most papers in a survey | Medium |
| L3 Contextual | Know what it is and where it fits | Quick scanning | Low |

  3. Take notes using assets/paper-summary-template.md. Save to /artifacts/paper-notes/{paper-id}.md.

Workflow 8: Ambiguous Query Resolution

"Find the latest about deepseek engram"

  1. Disambiguate: Follow Step 2 above
  2. Discover: scholar_search with resolved title + github_search with original term + citation_traverse on arXiv ID
  3. Evaluate: Review results, check code via find_code or GitHub
  4. Read: fetch_paper for top papers
  5. If user wants a survey: hand off to research-survey

Workflow 9: Ideation Support (called from research-ideation)

research-ideation Step 2 needs papers to build a literature tree

Iterative collection with strict filter (target 30-50 papers, recent 2020+). See Appendix A and Appendix B.

  1. Disambiguate: Parse the research goal → extract domain + method type
  2. Discover: Initial broad search (60 candidates) → iterative expansion up to 15 rounds:
    • EXPLORE: new keyword queries for diverse sub-areas
    • EXPLOIT: citation_traverse or recommend on strongly relevant papers
  3. Evaluate: Only keep strongly relevant papers. Prefer top-tier venues + 2020+ papers.
  4. Deduplicate: Track seen titles and abstracts.
  5. Output: 30-50 high-quality papers → feed into novelty tree + challenge-insight tree.

Workflow 10: User-Specified Paper Count

"Find me exactly 15 papers about reinforcement learning from human feedback"

  1. Use the user's number as the target
  2. Apply the closest profile's quality settings
  3. Run iterative collection until target met or max iterations exhausted
  4. If the target is not met, progressively relax the relevance standard (one level at a time, as sketched below) and inform the user
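A sketch of step 4's relaxation (the relevance labels and the judge function are hypothetical; you supply the actual judgment):

```python
LEVELS = ["strong", "moderate", "tangential"]  # strictest first

def fill_to_target(candidates, target, judge):
    """Relax the relevance standard level by level until the target count is met."""
    kept = []
    for i, level in enumerate(LEVELS):
        accepted = set(LEVELS[: i + 1])
        kept = [p for p in candidates if judge(p) in accepted]
        if len(kept) >= target:
            return kept[:target], level
    return kept, LEVELS[-1]  # still short: report the shortfall to the user
```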

Discovery Paths (Stage 1 Detail)

Seven paths, used by workflows above.

Path A: Keyword Search (most common)

python scripts/scholar_search.py --query "transformer attention mechanism" --limit 20 --sort-by citations

Options: --year-min/--year-max, --open-access-only, --sort-by relevance|citations|year.

Path B: Citation Traversal

# Forward — who cited this paper
python scripts/citation_traverse.py --paper-id ArXiv:1706.03762 --direction forward --limit 20

# Backward — what this paper cites
python scripts/citation_traverse.py --paper-id ArXiv:1706.03762 --direction backward --limit 20

# Co-citation — papers frequently cited alongside this one (most powerful for finding related work)
python scripts/citation_traverse.py --paper-id ArXiv:1706.03762 --direction co-citation --limit 15

Path C: Recommendations

python scripts/recommend.py --positive ArXiv:1706.03762,ArXiv:2005.14165 --limit 15
python scripts/recommend.py --positive ArXiv:1706.03762 --negative ArXiv:2301.00001 --limit 10

Path D: Author Tracking

python scripts/author_search.py --name "Geoffrey Hinton" --papers --limit 20 --sort-by citations

Path E: arXiv Monitoring

python scripts/arxiv_monitor.py --categories cs.CL,cs.AI --days 3 --limit 30
python scripts/arxiv_monitor.py --keywords "chain of thought,reasoning" --days 7
python scripts/arxiv_monitor.py --keywords "data pruning pretraining" --match-mode flexible --days 365

Options: --match-mode flexible (default, AND-of-words for better recall) or --match-mode exact (phrase matching for precision). See references/arxiv-categories.md for category codes.

Path F: Trending Detection

python scripts/trending.py --query "large language models" --period 90 --limit 15

Ranks by citation velocity (citations/month).
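The signal is simply accumulated citations divided by months since publication; a toy check with made-up numbers:

```python
from datetime import date

def citation_velocity(citations: int, published: date, today: date) -> float:
    """Citations per month since publication (trending's ranking signal)."""
    months = max((today - published).days / 30.44, 1.0)
    return citations / months

# Illustrative numbers, not real data:
print(citation_velocity(120, date(2025, 1, 1), date(2025, 7, 1)))  # ~20/month
```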

Path G: GitHub Search

python scripts/github_search.py --query "deepseek engram" --limit 10
python scripts/github_search.py --query "mamba state space model" --sort stars

Useful when a paper hasn't been posted to arXiv yet, or when industry labs release code before the paper.

Citation Graph Visualization

After traversal, visualize with Mermaid (keep ≤30 nodes):

graph TD
    SEED["Attention Is All You Need<br/>2017 · 100k+"]
    A["BERT · 2018"] --> SEED
    B["GPT-2 · 2019"] --> SEED
    C["Vision Transformer · 2020"] --> SEED

Evaluation Tools (Stage 2 Detail)

Quick Assessment (from scholar_search output)

| Signal | What it tells you |
|--------|-------------------|
| TLDR | One-sentence understanding |
| Citation count | Overall impact |
| Influential citations | Quality of impact |
| Year + venue | Recency and authority |
| Open Access PDF | Whether you can read full text |

Code Availability

python scripts/find_code.py --arxiv-id 1706.03762

Top Models by Task

python scripts/sota.py --task "text-generation" --limit 10
python scripts/sota.py --task "translation" --list-tasks

Dataset Discovery

python scripts/dataset_search.py --query "sentiment analysis" --limit 10

Reproducibility Assessment

| Dimension | Check |
|-----------|-------|
| Code | Open-source? Official? Stars? Last update? |
| Results | Reproduced on SOTA leaderboard? |
| Data | Dataset publicly available? |
| Overall | High / Medium / Low / None |

After Collecting Papers: Next Steps

| Goal | Hand off to |
|------|-------------|
| Generate a literature survey report | research-survey — synthesizes papers into a structured 8-section report |
| Generate research ideas | research-ideation — builds novelty tree + challenge-insight tree from papers |
| Write a Related Work section | paper-writing — uses paper notes as input |

Quick Report (optional, stays in paper-navigator)

For a brief summary table without a full survey report, use literature_report.py:

python scripts/literature_report.py --paper-ids ArXiv:2601.07372,ArXiv:2501.12948 --intent quick_scan

| Intent | Output |
|--------|--------|
| quick_scan | Brief table: title, authors, year, citations, TLDR |
| baseline_hunt | Code availability, SOTA position, dataset access, reproducibility |

For full survey reports (survey, deep_dive intents), use research-survey instead.


Appendix A: Iterative Collection Workflow

For workflows requiring many papers (survey, ideation support), use iterative expand-and-filter:

1. Parse query → extract goal, search terms, key term definitions
2. Define task attributes → identify domain + method type
3. Initial search → scholar_search with broad query
4. Review each paper's title + abstract → judge relevance (keep/reject)
5. LOOP until target met or max iterations reached:
   a. From kept papers, pick the most relevant as "grounding set"
   b. Generate next search query:
      - EXPLORE: new keyword query to broaden coverage
      - EXPLOIT: citation_traverse or recommend on a high-relevance paper
   c. Fetch new papers → review → deduplicate → add to collection
6. Final filter: apply quality checks, take top N
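A compressed sketch of the loop (every lowercase helper is a hypothetical stand-in for a search call or for your own title + abstract judgment; alternating EXPLORE/EXPLOIT by round parity is likewise an assumption):

```python
def collect(goal: str, target: int = 50, max_rounds: int = 15) -> list[dict]:
    kept: list[dict] = []
    seen: set[str] = set()
    candidates = initial_search(goal)                 # step 3: broad scholar_search
    for round_no in range(max_rounds):                # step 5: expand-and-filter loop
        for paper in candidates:
            key = normalize_title(paper["title"])
            if key in seen:
                continue                              # deduplicate
            seen.add(key)
            if is_relevant(paper, goal):              # judged from title + abstract
                kept.append(paper)
        if len(kept) >= target:
            break
        grounding = most_relevant(kept)               # step 5a: grounding set
        candidates = (new_keyword_query(goal, kept)   # EXPLORE: broaden coverage
                      if round_no % 2 == 0
                      else traverse_or_recommend(grounding))  # EXPLOIT: go deep
    return quality_filter(kept)[:target]              # step 6: final filter
```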

Relevance judging: You (Claude) evaluate each paper directly from title + abstract against the user's goal. No separate API call needed.

Deduplication: Track seen titles (normalized) and abstract prefixes. Skip already-evaluated papers.
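A minimal sketch of the dedup keys (the normalization rules are an assumption, not the scripts' exact behavior):

```python
import re

seen_titles: set[str] = set()
seen_prefixes: set[str] = set()

def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", title.lower())).strip()

def is_duplicate(paper: dict) -> bool:
    title = normalize_title(paper.get("title", ""))
    prefix = paper.get("abstract", "")[:200].lower()
    if title in seen_titles or (prefix and prefix in seen_prefixes):
        return True
    seen_titles.add(title)
    if prefix:
        seen_prefixes.add(prefix)
    return False
```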

Quality filtering:

  1. Skip papers with very short abstracts (< 20 words)
  2. For ideation/survey: prefer top-tier venues and journals in the user's field (e.g., Nature, Science, Cell, Lancet, PNAS for broad science; field-specific top venues like NeurIPS/ICML for ML, Physical Review Letters for physics, JACS for chemistry, etc.)
  3. For ideation: prefer 2020+ papers; include older only if foundational

Appendix B: Ideation vs Survey Collection

| Aspect | Ideation Support | Literature Survey |
|--------|------------------|-------------------|
| Goal | Find gaps and transferable techniques | Comprehensive field coverage |
| Relevance standard | Strict — only strongly relevant | Moderate — include tangentially relevant |
| Recency | Strong bias toward 2020+ | Include foundational older work |
| Initial search size | 60 candidates | 20 candidates |
| Coverage strategy | Deep on core topic + cross-domain | Balanced across sub-topics |
| Output use | Novelty tree + challenge-insight tree | Comprehensive report |

Appendix C: Script & API Reference

All scripts output Markdown to stdout, errors to stderr. Common flags: --limit N, --json.

Paper ID Formats

Scripts accept and normalize automatically: S2 ID, arXiv (ArXiv:1706.03762 or 1706.03762 or URL), DOI (DOI:10.18653/v1/N18-3011).
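An illustrative normalizer (a guess at the behavior, not the scripts' actual code):

```python
import re

def normalize_paper_id(raw: str) -> str:
    raw = raw.strip()
    if m := re.search(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})", raw, re.I):
        return f"ArXiv:{m.group(1)}"                   # arXiv URL
    if re.fullmatch(r"\d{4}\.\d{4,5}(v\d+)?", raw):
        return f"ArXiv:{raw}"                          # bare arXiv ID
    if raw.lower().startswith("doi:") or raw.startswith("10."):
        return "DOI:" + re.sub(r"(?i)^doi:", "", raw)  # DOI with/without prefix
    return raw                                         # assume S2 ID or already prefixed
```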

Rate Limits

| API | Without key | With key | When rate limited |
|-----|-------------|----------|-------------------|
| Semantic Scholar | Disabled — set S2_API_KEY to enable | 100 req/min; parallel OK | scholar_search auto-falls back to arXiv; citation_traverse/recommend require the key |
| arXiv | 1 req/3s (courtesy) | N/A | Primary fallback when S2 is limited; no auth needed |
| Jina Reader | Free tier | Higher with key | |
| HuggingFace | 500 req / 300s | Higher with HF_TOKEN | |
| GitHub | 10 req/min | 5,000 req/hr (set GITHUB_TOKEN) | |

All scripts retry on 429 and 5xx errors with exponential backoff (3s, 6s, 12s, 24s, 48s — 5 retries). A global S2 request pacer enforces minimum interval between Semantic Scholar API calls to prevent budget exhaustion.
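A sketch of that schedule (the exception type is a placeholder; the real scripts already implement this internally):

```python
import time

class TransientHTTPError(Exception):
    """Placeholder for 429/5xx responses."""

def with_backoff(call, retries: int = 5, base: float = 3.0):
    for attempt in range(retries + 1):  # initial try + 5 retries
        try:
            return call()
        except TransientHTTPError:
            if attempt == retries:
                raise
            time.sleep(base * 2 ** attempt)  # 3, 6, 12, 24, 48 seconds
```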

For detailed API endpoints, query parameters, and field specifications, see references/api-reference.md.


Integration

  • research-survey: After collecting papers, hand off to research-survey for structured survey report generation (8-section goal-centric synthesis).
  • research-ideation: After collecting papers, hand off to research-ideation for idea generation (novelty tree + challenge-insight tree + problem selection + solution design).
  • experiment-pipeline: After finding a baseline via Workflow 6, hand off to experiment-pipeline.
  • paper-writing: Paper notes serve as input for paper-writing's Related Work section.