polars-dovmed
polars-dovmed
Search across 2.4M+ PMC Open Access papers for literature discovery and extraction tasks.
Instructions
- Load the API key from a secure location.
- Pick the endpoint based on query shape:
- Use
/api/search_literaturefor one concept or a flat synonym query. - Use
/api/scan_literature_advancedfor multi-concept biology searches such as host + symbiont, organism + pathway, or organism + recent-year questions. - Use
/api/get_paper_detailsonce you have candidate PMC IDs.
- Use
- Start with exact taxa or core terms, then broaden with explicit
ORsynonyms. - Inspect the first 5-10 returned titles before trusting the result set.
- If the user needs a specific recent year or complete citation fields, verify missing metadata in PubMed or PMC before finalizing the answer.
Quick Reference
| Task | Action |
|---|---|
| API key | POLARS_DOVMED_API_KEY in env or ~/.config/polars-dovmed/.env |
| Base URL | https://api.newlineages.com |
| Search endpoint | POST /api/search_literature |
| Paper details endpoint | POST /api/get_paper_details |
| Advanced endpoint | POST /api/scan_literature_advanced |
| Helper script | scripts/query_literature.py for grouped scans and compact output |
| Rate limit | 100 queries/hour |
Input Requirements
- API key (
POLARS_DOVMED_API_KEY) - Search query or grouped concepts
- Optional local year target for post-filtering and verification
- Optional
fast_mode:false(default): full-text search (title,abstract_text,abstract,full_text)true: abstract-only search (abstract_text,abstract)
- Do not assume endpoint-level year or journal filters exist unless you have verified the current API syntax.
- If the user asks for a specific year, treat it as a post-filtering and citation-verification step.
Search Semantics
- Plain multi-word queries are
ANDqueries."mirusvirus host"means papers must match bothmirusvirusandhost.- This is often too strict for natural-language searches.
ORmust be written explicitly to broaden synonyms or alternate taxonomy names."Mirusviricota OR mirusvirus""bacteriophage OR phage""host range OR host specificity"
- Mixed queries are grouped by
OR."CRISPR OR Cas9 archaea"means(CRISPR AND archaea) OR (Cas9 AND archaea).
- Stop words such as
the,a,an,in,of,for,and,to,fromare removed. - Single high-frequency terms can be noisy.
- A query like
"CRISPR"may return low-precision results.
- A query like
- Do not assume the endpoint understands natural-language intent.
- Bad:
"papers on hosts of mirusviricota" - Better:
"Mirusviricota OR mirusvirus" - Then fetch details for the returned papers and inspect host text directly.
- Bad:
Query Workflow
- Start with the narrowest exact concept that should exist in the literature.
- Example:
"Holospora OR Caedibacter"
- Example:
- Broaden with explicit synonym queries when taxonomy or naming drift is likely.
- Example:
"Mirusviricota OR mirusvirus OR giant virus"
- Example:
- If the request combines multiple biological concepts, switch early to
/api/scan_literature_advanced.- Use grouped queries for cases like host + symbiont, organism + pathway, or organism + time window.
- Use
/api/get_paper_detailson candidate PMC IDs to inspect abstract/full text and collect citation metadata. - If metadata fields such as
year,publication_date, ordoiare blank, verify the citation in PubMed or PMC before final output. - If a recent paper is missing from the API but exists in PubMed or PMC, report the gap rather than assuming the literature is absent.
Synonym Query Examples
"Mirusviricota OR mirusvirus""bacteriophage OR phage""Nucleocytoviricota OR giant virus""host range OR host specificity""protist OR microeukaryote"
Recent-Paper And Metadata Caveats
- Recent papers may be partially indexed or missing entirely from this API even when they are already in PubMed or PMC.
get_paper_detailscan return blankyear,publication_date,doi, orauthorsfields for valid papers.- When the user asks for "2025 papers", "latest papers", or complete citations:
- treat API results as discovery, not final authority
- verify dates and DOI in PubMed or PMC if metadata is incomplete
- say explicitly when the API appears to have indexing gaps
Output
- Paper lists with metadata (PMC ID, DOI, title, year)
- Matched text snippets
- Extracted entities (genes, accessions, terms)
- Mode metadata (
search_mode,searched_columns) - Warnings about missing metadata or likely indexing gaps when relevant
Quality Gates
- API key loaded successfully
- Chosen endpoint matches the query shape
- First 5-10 returned titles match expected scope
- Extracted citation fields checked for missing
year,publication_date,doi, orauthors - Recent-year requests verified in PubMed or PMC if API metadata is incomplete
- Final answer calls out indexing gaps or low-confidence matches when present
Examples
Example 1: Flat synonym search (Python)
import httpx
headers = {"X-API-Key": "YOUR_KEY"}
resp = httpx.post(
"https://api.newlineages.com/api/search_literature",
headers=headers,
json={
"query": "Mirusviricota OR mirusvirus OR giant virus",
"max_results": 10,
"extract_matches": False,
"fast_mode": False,
},
)
print(resp.json())
Example 2: Grouped host-microbe scan with the helper script
Use the bundled script when the query has multiple concept groups and you want compact output.
POLARS_DOVMED_API_KEY=YOUR_KEY \
python skills/polars-dovmed/scripts/query_literature.py \
--group host_terms=ciliate,ciliates,Ciliophora,Paramecium,Loxodes \
--group symbiont_terms=endosymbiont,symbionts,intracellular,Holosporales,Rickettsiales,Legionellales \
--max-results 25
Example 3: Advanced structured scan (Python)
import httpx
headers = {"X-API-Key": "YOUR_KEY"}
resp = httpx.post(
"https://api.newlineages.com/api/scan_literature_advanced",
headers=headers,
json={
"primary_queries": {
"host_terms": [["ciliate"], ["ciliates"], ["Ciliophora"], ["Paramecium"], ["Loxodes"]],
"symbiont_terms": [["endosymbiont"], ["symbionts"], ["intracellular"], ["Holosporales"], ["Rickettsiales"], ["Legionellales"]],
},
"search_columns": ["title", "abstract_text", "full_text"],
"extract_matches": "primary",
"add_group_counts": "primary",
"max_results": 25,
},
timeout=600.0,
)
print(resp.json())
Use this endpoint when:
- you need more than one concept group
- the flat search endpoint returns
0or low-precision results - the user is asking about host-associated bacteria, symbionts, pathogens, or recent organism-specific literature
Example 4: Fetch details for candidate papers
Use /api/get_paper_details after a broad search has identified relevant PMC IDs.
import httpx
headers = {"X-API-Key": "YOUR_KEY"}
resp = httpx.post(
"https://api.newlineages.com/api/get_paper_details",
headers=headers,
json={"pmc_ids": ["PMC10827195", "PMC11465185"]},
timeout=120.0,
)
print(resp.json())
Use this endpoint when:
- you already have candidate PMC IDs
- you need abstract/full-text fields for manual review
- you want DOI, journal, and publication metadata without rerunning broader searches
Timeout And Relevance Guidance
- Simple
/api/search_literaturequeries usually fit in60-120s. - Broad or high-frequency terms may take longer; use
120sunless you know the query is tiny. /api/scan_literature_advancedcan take minutes; use600sfor non-trivial scans.- For multi-concept biology searches, prefer the advanced endpoint instead of repeatedly retrying brittle flat queries.
- If the API is healthy but results look wrong:
- check whether you accidentally overconstrained the query with implicit
AND - retry with
ORacross synonyms or taxonomy variants - switch to grouped
primary_queriesinstead of adding more words to a flat query - inspect candidate titles before trusting the set
- treat
0hits from a long natural-language phrase as a query-construction problem before assuming the literature is absent - treat missing recent papers as a possible indexing problem and verify externally
- check whether you accidentally overconstrained the query with implicit
Troubleshooting
Issue: 401 Unauthorized
Solution: Verify POLARS_DOVMED_API_KEY and reload the environment.
Issue: 429 Rate limited Solution: Wait for quota reset or reduce request frequency.
Issue: Fast mode returns too few results
Solution: Retry with fast_mode=false to include full text.
Issue: Multi-word query returns 0 but related papers should exist
Solution: Remember plain space-separated terms are AND. Retry with explicit OR synonyms, then switch to /api/scan_literature_advanced for grouped concepts.
Issue: Need host/pathway/accession extraction across several concept groups
Solution: Use /api/scan_literature_advanced instead of forcing everything into one query string.
Issue: Results look topically wrong even though the API returned hits Solution: Check the first 5-10 titles, reduce single-term queries, and tighten the concept groups before extracting details.
Issue: A recent paper appears in PubMed or PMC but not in this API Solution: Treat this as an indexing gap, cite the external source, and mention that the API missed a relevant paper.
Issue: get_paper_details returns blank year, publication_date, or doi
Solution: Verify the citation in PubMed or PMC before giving a year-specific or publication-ready answer.