Searching Scientific Literature

Overview

Search PubMed for scientific literature using optimized queries. Extract metadata and prepare papers for relevance evaluation.

Core principle: Cast a wide enough net to find relevant papers, but use targeted keywords to keep results manageable.

When to Use

Use this skill when:

Starting a new research question
User asks "find papers about..."
Need an initial set of papers for evaluation
Searching for specific methods, compounds, diseases, or techniques

Search Strategy

1. Parse User Query

Extract:

Keywords: Main concepts (e.g., "BTK inhibitor", "selectivity", "kinase")
Data types: What user needs (IC50 values, methods, structures, results)
Constraints: Date ranges, specific journals, author names
Synonyms: Alternative terms (e.g., "Bruton's tyrosine kinase" = "BTK")

2. Construct PubMed Query

Boolean operators:

AND - narrow results (must have both terms)
OR - broaden results (either term)
NOT - exclude terms

Example queries:

"BTK inhibitor"[Title/Abstract] AND selectivity[Title/Abstract]

("kinase inhibitor" OR "protein kinase") AND (selectivity OR "off-target")

"ibrutinib"[Title/Abstract] AND ("IC50" OR "inhibitory concentration")

Field tags:

[Title/Abstract] - search title and abstract only
[Title] - title only (more precise)
[Author] - specific author
[Journal] - specific journal
[dp] - publication date or date range (e.g., 2020:2024[dp])
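These rules can be applied mechanically: join the synonyms for one concept with OR, then join the concept groups with AND. A minimal Python sketch, assuming the concepts and synonyms were already extracted in step 1 (quote_term and build_query are hypothetical helpers, not part of any PubMed library):

```python
from __future__ import annotations


def quote_term(term: str) -> str:
    """Quote anything that is not a single alphanumeric word (phrases, hyphens, apostrophes)."""
    return term if term.isalnum() else f'"{term}"'


def build_query(concept_groups: list[list[str]], field: str | None = "Title/Abstract") -> str:
    """OR the synonyms inside each concept group, then AND the groups together."""
    tag = f"[{field}]" if field else ""
    clauses = []
    for synonyms in concept_groups:
        terms = [quote_term(s) + tag for s in synonyms]
        clause = " OR ".join(terms)
        # Parenthesize OR-groups so each group is evaluated before the ANDs
        clauses.append(f"({clause})" if len(terms) > 1 else clause)
    return " AND ".join(clauses)


print(build_query([["BTK inhibitor"], ["selectivity", "off-target"]]))
```

The parentheses matter because PubMed evaluates Boolean operators left to right.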

3. Execute Search

API endpoint:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=YOUR_QUERY&retmax=100&retmode=json&sort=relevance

Parameters:

db=pubmed - search PubMed database
term= - your query (URL encode spaces and special chars)
retmax=100 - maximum number of PMIDs to return (the default is 20; start with 100)
retmode=json - return JSON
sort=relevance - most relevant first (or sort=pub_date for newest first)

Example bash:

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=BTK+inhibitor+selectivity&retmax=100&retmode=json&sort=relevance"
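The same request can be built in Python, letting the standard library handle URL encoding instead of hand-replacing spaces and special characters. This is a sketch; esearch_url is a hypothetical helper, and the optional api_key is the query parameter NCBI documents for keyed access:

```python
from __future__ import annotations

from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 100, sort: str = "relevance",
                api_key: str | None = None) -> str:
    """Build an ESearch URL for PubMed with a properly encoded query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax,
              "retmode": "json", "sort": sort}
    if api_key:
        params["api_key"] = api_key  # keyed requests get the 10 req/sec limit
    # urlencode uses quote_plus: spaces -> "+", quotes/brackets/slashes -> %XX
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(esearch_url('"BTK inhibitor"[Title/Abstract] AND selectivity[Title/Abstract]'))
```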

Response format:

{
  "esearchresult": {
    "count": "156",
    "retmax": "100",
    "idlist": ["12345678", "87654321", ...]
  }
}
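A sketch of reading that response, assuming the body has already been downloaded as text (parse_esearch is a hypothetical helper). Note that count arrives as a string:

```python
from __future__ import annotations

import json


def parse_esearch(body: str) -> tuple[int, list[str]]:
    """Return (total hit count, PMIDs in this page) from an ESearch JSON body."""
    result = json.loads(body)["esearchresult"]
    # Counts are JSON strings (e.g. "156"), so convert explicitly
    return int(result["count"]), list(result["idlist"])


count, pmids = parse_esearch('{"esearchresult": {"count": "156", "retmax": "2", "idlist": ["12345678", "87654321"]}}')
print(count, pmids)
```

If count exceeds the number of IDs returned, more pages exist; they can be requested by repeating the search with the retstart parameter (e.g., retstart=100).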

4. Fetch Paper Metadata

API endpoint:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=12345678,87654321&retmode=json
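For larger result sets, the PMIDs can be sent in batches. A sketch, assuming a batch size of 100 (an arbitrary choice that keeps GET URLs short; NCBI's E-utilities documentation recommends HTTP POST when sending more than roughly 200 IDs). esummary_urls is a hypothetical helper:

```python
from urllib.parse import urlencode

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_urls(pmids: list, batch_size: int = 100) -> list:
    """Split PMIDs into batches and build one ESummary URL per batch."""
    urls = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        params = {"db": "pubmed", "id": ",".join(batch), "retmode": "json"}
        # safe="," keeps commas literal, matching the id=PMID1,PMID2 form above
        urls.append(f"{ESUMMARY_URL}?{urlencode(params, safe=',')}")
    return urls


for url in esummary_urls(["12345678", "87654321"]):
    print(url)
```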

Extract from response:

Title
Authors (list)
Journal name
Publication date
Abstract (not included in esummary output; fetch it with a separate efetch call: efetch.fcgi?db=pubmed&id=PMID1,PMID2&retmode=xml)
PMID
DOI (if available in articleids)
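Since abstracts come from efetch as PubMed XML, they need a small parser. A minimal sketch using the standard library, assuming the efetch response body is already in hand (parse_abstracts is a hypothetical name). Structured abstracts contain several AbstractText elements, each with a Label attribute:

```python
import xml.etree.ElementTree as ET


def parse_abstracts(xml_text: str) -> dict:
    """Map PMID -> abstract text from an EFetch PubMed XML response."""
    abstracts = {}
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        parts = []
        for node in article.iterfind(".//Abstract/AbstractText"):
            # itertext() keeps text nested inside inline tags such as <i> or <sup>
            text = "".join(node.itertext()).strip()
            label = node.get("Label")
            parts.append(f"{label}: {text}" if label else text)
        abstracts[pmid] = "\n".join(parts)  # "" when the record has no abstract
    return abstracts
```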

Getting the DOI: each esummary record lists its identifiers in articleids; the DOI is the entry with idtype "doi":

"articleids": [
  {"idtype": "pubmed", "value": "12345678"},
  {"idtype": "doi", "value": "10.1234/example.2023"}
]

If DOI missing:

Use PMID as fallback identifier
Try to resolve DOI via PubMed Central or publisher APIs later
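The DOI lookup with PMID fallback can be sketched as below, assuming record is one entry from the esummary JSON result object, where the PMID is stored under uid. Both function names are hypothetical:

```python
from typing import Optional


def extract_doi(record: dict) -> Optional[str]:
    """Return the DOI from an esummary record's articleids, or None if absent."""
    for aid in record.get("articleids", []):
        if aid.get("idtype") == "doi" and aid.get("value"):
            return aid["value"]
    return None


def paper_id(record: dict) -> str:
    """Prefer the DOI as identifier; fall back to the PMID."""
    return extract_doi(record) or record["uid"]
```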

Output Format

Create a list of paper objects:

[
  {
    "pmid": "12345678",
    "doi": "10.1234/example.2023",
    "title": "Selective BTK inhibitors for autoimmune diseases",
    "authors": ["Smith J", "Doe A", "Johnson B"],
    "journal": "Nature Chemical Biology",
    "year": "2023",
    "abstract": "We developed a series of...",
    "source": "pubmed_search"
  }
]
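A sketch of mapping esummary JSON onto this format, assuming abstracts were fetched separately via efetch and passed in as a PMID-to-text dict. Using fulljournalname for "journal" and the first four characters of pubdate (e.g., "2023 Mar 14") for "year" are assumptions about how to fill these fields; to_paper and papers_from_esummary are hypothetical names:

```python
def to_paper(record: dict, abstract: str = "") -> dict:
    """Convert one esummary record into the paper object format."""
    doi = next((a["value"] for a in record.get("articleids", [])
                if a.get("idtype") == "doi"), None)
    return {
        "pmid": record["uid"],
        "doi": doi,  # None when the record has no DOI
        "title": record.get("title", ""),
        "authors": [a["name"] for a in record.get("authors", [])],
        # Prefer the full journal name; fall back to the abbreviation in "source"
        "journal": record.get("fulljournalname") or record.get("source", ""),
        "year": record.get("pubdate", "")[:4],
        "abstract": abstract,
        "source": "pubmed_search",
    }


def papers_from_esummary(response: dict, abstracts: dict) -> list:
    """Build paper objects for every PMID in an esummary JSON response."""
    result = response["result"]
    return [to_paper(result[uid], abstracts.get(uid, "")) for uid in result["uids"]]
```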

Error Handling

Rate limits (CRITICAL - NCBI applies them per IP address or API key, so all processes and subagents share the same limit):

No API key: 3 requests/second (official limit)
With API key: 10 requests/second
Single agent/script: Use 500ms delays (2 req/sec, safe margin)
- 350ms is theoretically sufficient but causes ~20% HTTP 429 errors in practice
Multiple parallel subagents: Use longer delays to share capacity
- 2 parallel: 1 second each (2 total req/sec)
- 3 parallel: 1.5 seconds each (2 total req/sec)
- 5 parallel: 2.5 seconds each (2 total req/sec)
- Formula: delay_seconds = num_parallel / target_total_rate, with target_total_rate = 2 req/sec (a safety margin below the 3 req/sec limit)
If you get HTTP 429 errors: Wait 5 seconds, resume with doubled delays
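The delays listed above work out to num_parallel / 2 seconds per agent, i.e. a combined 2 req/sec. A sketch of that calculation plus the 429 back-off rule, assuming send is a caller-supplied function that performs one HTTP request and returns its status code; sleep is injectable so the logic can be tested without waiting. The function names and the max_retries cap are hypothetical:

```python
from __future__ import annotations

import time
from typing import Callable

# Combined rate for ALL agents sharing one IP/API key: a margin below NCBI's 3 req/sec
TARGET_TOTAL_RATE = 2.0


def per_agent_delay(num_parallel: int, total_rate: float = TARGET_TOTAL_RATE) -> float:
    """Seconds each agent waits between requests so the combined rate stays at total_rate."""
    return num_parallel / total_rate


def request_with_backoff(send: Callable[[], int], delay: float,
                         sleep: Callable[[float], None] = time.sleep,
                         max_retries: int = 3) -> tuple[int, float]:
    """Call send(); on HTTP 429 wait 5 s, double the delay, and retry.

    Returns (final status, delay to use for this agent's later requests).
    """
    status = send()
    retries = 0
    while status == 429 and retries < max_retries:
        sleep(5.0)
        delay *= 2
        retries += 1
        status = send()
    sleep(delay)  # space out this agent's next request
    return status, delay
```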

Empty results:

Try broader terms
Remove field tags
Check for typos
Use OR to add synonyms

Too many results (>500):

Add more specific terms
Use field tags to narrow
Add date constraints
Consider splitting into sub-queries
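Splitting by publication year is one way to break up a large query. A sketch using the [dp] publication-date tag with PubMed's start:end range syntax; the 5-year span and the helper name split_by_year are assumptions (ESearch's mindate, maxdate, and datetype parameters are an alternative):

```python
def split_by_year(query: str, start_year: int, end_year: int, span: int = 5) -> list:
    """Split one query into sub-queries covering consecutive publication-year ranges."""
    subqueries = []
    year = start_year
    while year <= end_year:
        stop = min(year + span - 1, end_year)
        # Parenthesize the original query so the date clause applies to all of it
        subqueries.append(f"({query}) AND {year}:{stop}[dp]")
        year = stop + 1
    return subqueries


for q in split_by_year('"BTK inhibitor"[Title/Abstract]', 2010, 2021):
    print(q)
```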

Integration with Other Skills

After search completes:

Save results to research folder as initial-search-results.json
For each paper, call the evaluating-paper-relevance skill
Track in papers-reviewed.json (keyed by DOI, falling back to PMID)
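A sketch of these bookkeeping steps, assuming the research folder already exists and papers follow the output format above. The fields stored per entry in papers-reviewed.json (pmid, title, status) are a hypothetical schema; adapt them to whatever evaluating-paper-relevance records:

```python
import json
from pathlib import Path


def save_search_results(folder: Path, papers: list) -> Path:
    """Write the raw search results to initial-search-results.json."""
    path = folder / "initial-search-results.json"
    path.write_text(json.dumps(papers, indent=2))
    return path


def track_reviewed(folder: Path, papers: list) -> dict:
    """Add unseen papers to papers-reviewed.json, keyed by DOI (or PMID if no DOI)."""
    path = folder / "papers-reviewed.json"
    reviewed = json.loads(path.read_text()) if path.exists() else {}
    for paper in papers:
        key = paper.get("doi") or paper["pmid"]
        # setdefault leaves entries from earlier runs untouched
        reviewed.setdefault(key, {"pmid": paper["pmid"], "title": paper["title"],
                                  "status": "pending"})
    path.write_text(json.dumps(reviewed, indent=2))
    return reviewed
```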

Quick Reference

Task	Command
Search PubMed	`curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=QUERY&retmax=100&retmode=json"`
Get metadata	`curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=PMID1,PMID2&retmode=json"`
URL encode query	Replace spaces with `+`, special chars with `%XX`
Narrow results	Use AND, add field tags, more specific terms
Broaden results	Use OR, remove field tags, add synonyms

Common Mistakes

Too narrow: Only 5 results → Use OR, remove constraints
Too broad: 5000 results → Add AND terms, use field tags
Missing abstracts: esummary does not return abstracts → Use efetch for the full abstract text
DOI not found: Many older papers lack a DOI → Use PMID as fallback
Rate limiting: Add 500ms delays (single agent) or longer (parallel subagents sharing the rate limit)

Next Steps

After completing search:

Announce: "Found N papers matching query"
Begin evaluation using skills/research/evaluating-paper-relevance
Update user with progress as papers are screened