biorxiv-search
bioRxiv Search
Search bioRxiv through its official API for recent-preprint discovery, date-range scans, DOI lookups, author shortlisting, and local keyword filtering over title, abstract, and author metadata.
Instructions
- Prefer this skill when the request is about bioRxiv-native preprints, recent biology submissions, or preprint metadata that may lag in PubMed, PMC, or Crossref.
- Use the bundled CLI:
  - In this repository: skills/biorxiv-search/scripts/search
  - After installation: ~/.agents/skills/biorxiv-search/scripts/search
- The official bioRxiv API does not provide a general server-side keyword search endpoint.
- Use the CLI to fetch metadata from a bounded recent window or explicit date range, then filter locally.
- When keywords are provided, search title, abstract, and authors by default.
  - If the user wants abstract-only matching, pass --fields abstract.
- Keep the search window bounded.
  - Use --days N for recent scans or --start-date YYYY-MM-DD --end-date YYYY-MM-DD for explicit intervals.
  - If you omit the interval, the CLI defaults to the most recent 30 days.
  - The CLI converts --days N into an explicit date range before calling the API so pagination stays predictable.
- Use --category <name> when the topic should stay narrow.
  - The API accepts the bioRxiv category as a query parameter such as cell_biology, genomics, or neuroscience.
- Use --author for author-specific requests.
  - By default, consider both the supplied full-name form and an abbreviated-first-name form, for example --author "Peter Nugent" and --author "P. Nugent".
  - Do not silently merge these in the final answer. Report full-name matches and abbreviated-first-name matches in separate groups because initials can be ambiguous.
  - The CLI also expands obvious first-initial variants from the supplied author string, so prefer separate passes or a local partition of returned records by the literal authors text when you need clean buckets.
- The API paginates 100 records at a time.
  - Increase --scan-limit when the query is broad and the first pages do not contain enough matches.
- By default, the CLI collapses multiple versions of the same preprint and keeps the latest version for each DOI.
  - Use --all-versions only when version-by-version output matters.
- Treat the API output as discovery metadata.
- If exact citation details or the latest abstract-page presentation matter, verify the shortlisted candidates on bioRxiv or the DOI landing page before finalizing the answer.
- If the user wants peer-reviewed biomedical literature or PMC full text rather than bioRxiv preprints, use polars-dovmed instead.
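The fetch-then-filter approach in the instructions above can be sketched as follows. This is a minimal illustration, not the bundled CLI's code: the function names page_urls and fetch_page are hypothetical, and the URL layout (start date, end date, and a record cursor as path segments, with category as a query parameter) and the "collection" response key are assumptions based on the public bioRxiv API description.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.biorxiv.org/details/biorxiv"
PAGE_SIZE = 100  # the API paginates 100 records at a time


def page_urls(start_date, end_date, scan_limit, category=None):
    """Build the request URLs needed to scan up to scan_limit records.

    Hypothetical helper: the cursor is assumed to be a zero-based
    record offset that advances by PAGE_SIZE per page.
    """
    query = f"?{urlencode({'category': category})}" if category else ""
    return [
        f"{API_ROOT}/{start_date}/{end_date}/{cursor}{query}"
        for cursor in range(0, scan_limit, PAGE_SIZE)
    ]


def fetch_page(url, timeout=30):
    """Fetch one page of metadata.

    Assumes the response JSON stores records under a "collection" key.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response).get("collection", [])
```

Calling page_urls("2024-01-01", "2024-01-31", 250) yields three URLs with cursors 0, 100, and 200. Keyword filtering would then run locally over the records each page returns, which is why recall depends on both the interval and the scan limit.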
Quick Reference
| Task | Action |
|---|---|
| Search script | skills/biorxiv-search/scripts/search |
| Base API | https://api.biorxiv.org/details/biorxiv/... |
| Default search fields | title,abstract,authors |
| Recent window | --days 30 |
| Date range | --start-date YYYY-MM-DD --end-date YYYY-MM-DD |
| DOI lookup | --doi 10.1101/... |
| Category filter | --category cell_biology |
| Author filter | --author "Name" |
| Author variant workflow | Check full-name and abbreviated-first-name variants separately; report them separately |
| Abstract-only filtering | --fields abstract |
| Deduping | latest version per DOI by default |
| Keep all versions | --all-versions |
| Network timeout | --timeout 30 |
| Help | skills/biorxiv-search/scripts/search --help |
Input Requirements
- Python 3
- One of:
  - a keyword query
  - a bioRxiv DOI via --doi
  - a request for recent/date-bounded preprints with no keyword query
- Optional interval controls:
  - --days <N> for the most recent N days
  - --start-date YYYY-MM-DD --end-date YYYY-MM-DD for an explicit date range
- Optional filters:
  - --category <name> for a bioRxiv subject category
  - --author <name> repeated for author substrings or name variants
  - --fields title,abstract,authors to restrict local keyword matching
  - --phrase to treat the whole query as one phrase instead of splitting on spaces
  - explicit OR in the query for broader local matching
  - --scan-limit <N> for how many API records to inspect locally
  - --all-versions to keep multiple versions of the same DOI
- If the user asks for very old or very broad searches, widen the date range deliberately and be explicit that recall depends on the chosen interval and --scan-limit.
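To make the interval controls concrete, here is a minimal sketch of turning a --days N window into an explicit date range. The helper name days_to_range is hypothetical, and whether the bundled CLI counts the end date as part of the N days is an assumption; this version simply subtracts N days from the current UTC date.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional, Tuple


def days_to_range(days: int, today: Optional[date] = None) -> Tuple[str, str]:
    """Convert a recent-window length into (start, end) ISO dates.

    `today` defaults to the current UTC date; pass it explicitly
    to get reproducible ranges.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    end = today or datetime.now(timezone.utc).date()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()
```

The two strings map directly onto --start-date and --end-date, so a relative window and an explicit range go through the same paging path.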
Search Semantics
- The official bioRxiv API supports:
  - recent-post windows such as 30d
  - explicit date ranges
  - DOI lookup
  - subject-category filtering
- The API does not support a general server-side keyword query for title or abstract.
- The CLI performs local filtering after fetching metadata.
- For predictable paging, the CLI implements --days N as an explicit UTC date range instead of relying on the API's relative-date shorthand.
- Plain multi-word queries are local AND queries.
  - single cell atlas means all three terms must appear somewhere in the selected search fields.
- OR must be written explicitly to broaden synonyms or alternate phrasings.
  - "organoid OR spheroid"
  - "CRISPR OR Cas9"
- Quoted phrases are preserved when possible.
  - "\"single cell\" atlas" keeps single cell as one phrase and also requires atlas.
- --fields abstract restricts keyword filtering to abstracts only.
  - This is the flag to use when the user explicitly cares about abstract matches.
- Author filters can fragment across name variants.
- For person-specific searches, check the full-name form and abbreviated-first-name form separately and keep those buckets separate in the final answer.
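The matching rules above (implicit AND, explicit OR, preserved quoted phrases) can be illustrated with a small local matcher. This is a sketch under stated assumptions rather than the CLI's actual parser: parse_query and matches are hypothetical names, matching is case-insensitive substring search, and shlex is used only as a convenient way to keep quoted phrases together.

```python
import shlex


def parse_query(query):
    """Split a query into OR-groups; each group is a list of AND terms."""
    groups, current = [], []
    for token in shlex.split(query):  # keeps "quoted phrases" as one token
        if token == "OR":
            if current:
                groups.append(current)
            current = []
        else:
            current.append(token.lower())
    if current:
        groups.append(current)
    return groups


def matches(record, groups, fields=("title", "abstract", "authors")):
    """True when every term of at least one OR-group occurs in the fields."""
    text = " ".join(str(record.get(f, "")) for f in fields).lower()
    return any(all(term in text for term in group) for group in groups)


record = {
    "title": "A single cell atlas of the gut",
    "abstract": "We profile organoid cultures.",
    "authors": "Jane Doe",
}
print(matches(record, parse_query('"single cell" atlas')))
print(matches(record, parse_query("organoid OR spheroid"), fields=("abstract",)))
```

Passing fields=("abstract",) mirrors what --fields abstract is described as doing. One limitation of this sketch: shlex raises ValueError on unbalanced quotes, so a query containing a bare apostrophe would need separate handling.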
Output
- JSON with:
  - request metadata (query, query_groups, interval, category, author filters, search fields)
  - API metadata (pages_fetched, records_scanned, total_available, request_urls)
  - warnings about defaulted windows, scan-limit truncation, or API limitations
  - normalized result records with: doi, title, authors, date, version, category, abstract, published, doi_url, biorxiv_url, matched_in
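As an illustration of working with the normalized result records, the sketch below turns a list of records into newest-first shortlist lines. The helper name shortlist_lines is hypothetical, and it takes the record list directly because the top-level key that holds the records in the CLI's JSON is not specified here. It relies only on the doi, title, date, and version fields listed above and assumes date is an ISO YYYY-MM-DD string.

```python
def shortlist_lines(records):
    """Format records as 'date  vN  doi  title' lines, newest first.

    ISO dates sort correctly as plain strings, so no date parsing is needed.
    """
    ordered = sorted(records, key=lambda r: str(r.get("date", "")), reverse=True)
    return [
        f"{r.get('date', '?')}  v{r.get('version', '?')}  "
        f"{r.get('doi', '?')}  {str(r.get('title', '')).strip()}"
        for r in ordered
    ]
```

A shortlist in this form is convenient for the verification step, since each line carries the DOI and version to check on the bioRxiv landing page.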
Quality Gates
- The request uses a bounded recent window or explicit date range
- The chosen --scan-limit is large enough for the query breadth
- The selected search fields match the user request, especially when abstract matching matters
- Author-specific requests use one or more reasonable name variants
- The final answer keeps abbreviated-name matches separate and labels them as potentially ambiguous
- The answer does not overstate recall for a broad historical search
- Final candidate metadata is verified on bioRxiv when exact citation/version details matter
Examples
Example 1: Recent keyword scan over title + abstract
skills/biorxiv-search/scripts/search "single cell atlas" 10 --days 30
Example 2: Broaden with OR
skills/biorxiv-search/scripts/search "organoid OR spheroid" 15 \
--days 90 \
--category developmental_biology
Example 3: Abstract-only keyword filtering
skills/biorxiv-search/scripts/search "CRISPR screen" 10 \
--days 60 \
--fields abstract
Example 4: Author-specific search with separate variant reporting
skills/biorxiv-search/scripts/search "supernova" 20 \
--days 365 \
--author "Peter Nugent" \
--author "P. Nugent"
Example 5: DOI lookup
skills/biorxiv-search/scripts/search --doi 10.1101/682021
Troubleshooting
Issue: Results are too broad
Solution: Narrow the interval, add --category, restrict with --fields, or tighten a loose query into a quoted phrase or a --phrase match.
Issue: Results are too sparse
Solution: Increase --days or widen the date range, raise --scan-limit, and add alternate query terms with explicit OR.
Issue: Need abstract matches, not title matches
Solution: Use --fields abstract.
Issue: Author search looks incomplete
Solution: Repeat --author with explicit variants such as "Peter Nugent" and "P. Nugent". If a middle initial is known, add that too, for example "Peter E. Nugent" and "P. E. Nugent". Keep these result sets separate in the final answer because abbreviated forms can be ambiguous.
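Keeping the two name forms apart can be done with a local partition of the returned records by the literal authors text. The sketch below is one way to do that; the helper names are hypothetical, and it assumes the authors string lists names in the same "First Last" order as the supplied name, so it would need adapting if the metadata uses another layout.

```python
def abbreviate_first_name(full_name):
    """'Peter Nugent' -> 'P. Nugent'; single-word names are returned as-is."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name
    return f"{parts[0][0]}. {' '.join(parts[1:])}"


def bucket_by_author_form(records, full_name):
    """Partition records by which literal name form appears in `authors`."""
    full = full_name.lower()
    abbreviated = abbreviate_first_name(full_name).lower()
    buckets = {"full_name": [], "abbreviated": [], "neither": []}
    for rec in records:
        authors = str(rec.get("authors", "")).lower()
        if full in authors:
            buckets["full_name"].append(rec)
        elif abbreviated in authors:
            # Initial-only matches may belong to a different person.
            buckets["abbreviated"].append(rec)
        else:
            buckets["neither"].append(rec)
    return buckets
```

Reporting the full_name and abbreviated buckets under separate headings keeps the ambiguity of initials visible in the final answer.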
Issue: The API returns multiple versions of the same preprint
Solution: Keep the default deduped output, or pass --all-versions if version-level output matters.
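For reference, collapsing to the latest version per DOI amounts to something like the sketch below. The function name latest_versions is hypothetical and this is not the CLI's own code; it assumes each record carries a doi and a version that can be read as an integer.

```python
def latest_versions(records):
    """Keep the highest-numbered version of each DOI, in first-seen order."""
    latest = {}
    for rec in records:
        doi = rec["doi"]
        # Replace the stored record only when a newer version shows up.
        if doi not in latest or int(rec["version"]) > int(latest[doi]["version"]):
            latest[doi] = rec
    return list(latest.values())
```

Because a dict keeps the position of a key's first insertion, the output order follows the order in which each DOI was first seen, even when a later version replaces the stored record.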
Issue: Broad historical search may be missing expected hits
Solution: This usually means the interval or --scan-limit was too narrow. Widen them deliberately and say so in the final answer.
Issue: Need peer-reviewed literature rather than preprints
Solution: Use polars-dovmed or another peer-reviewed-literature workflow instead of bioRxiv metadata search.