biorxiv-database

SKILL.md

bioRxiv Database

A Python toolkit for programmatic access to bioRxiv preprints. Supports comprehensive metadata retrieval with structured JSON output for integration into research workflows.

Use Cases

  • Query recent preprints by topic or research domain
  • Monitor publications from specific researchers
  • Perform systematic literature reviews
  • Analyze publication trends across time periods
  • Retrieve citation metadata and DOIs
  • Download preprint PDFs for text analysis
  • Filter results by subject category

Quick Start

# Install dependencies
pip install requests

# Search by keywords
python scripts/biorxiv_client.py --terms "protein folding" --recent 30 --out results.json

# Search by author
python scripts/biorxiv_client.py --author "Chen" --recent 180

# Get specific paper by DOI
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321"

# Download PDF
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321" --fetch-pdf paper.pdf

Command-Line Options

Option Description
-t, --terms Search keywords (multiple allowed)
-a, --author Author name to search
--doi Specific DOI to retrieve
--since Start date (YYYY-MM-DD)
--until End date (YYYY-MM-DD)
--recent Search last N days
-s, --subject Subject category filter
--fields Fields to search: title, abstract, authors
-o, --out Output file (default: stdout)
--max Maximum results to return
--fetch-pdf Download PDF (requires --doi)
-v, --verbose Enable debug output

Programmatic API

from scripts.biorxiv_client import PreprintClient

client = PreprintClient(debug=True)

# Search by keywords
results = client.find_by_terms(
    terms=["enzyme engineering"],
    since="2024-01-01",
    until="2024-12-31",
    subject="biochemistry"
)

# Search by author
papers = client.find_by_author(name="Garcia", since="2023-01-01")

# Get paper by DOI
metadata = client.get_by_doi("10.1101/2024.05.22.594321")

# Download PDF
client.fetch_pdf(doi="10.1101/2024.05.22.594321", destination="paper.pdf")

# Normalize output
formatted = client.normalize(metadata, include_abstract=True)

Subject Categories

Category Category
animal-behavior-and-cognition molecular-biology
biochemistry neuroscience
bioengineering paleontology
bioinformatics pathology
biophysics pharmacology-and-toxicology
cancer-biology physiology
cell-biology plant-biology
clinical-trials scientific-communication-and-education
developmental-biology synthetic-biology
ecology systems-biology
epidemiology zoology
evolutionary-biology
genetics
genomics
immunology
microbiology

Response Structure

{
  "query": {
    "terms": ["protein folding"],
    "since": "2024-03-01",
    "until": "2024-09-30",
    "subject": "biophysics"
  },
  "count": 87,
  "papers": [
    {
      "doi": "10.1101/2024.05.22.594321",
      "title": "Example Preprint Title",
      "authors": "Chen L, Patel R, Kim S",
      "corresponding_author": "Chen L",
      "institution": "Research Institute",
      "posted": "2024-05-22",
      "revision": "1",
      "category": "biophysics",
      "license": "cc_by",
      "paper_type": "new results",
      "abstract": "Abstract content here...",
      "pdf_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1.full.pdf",
      "web_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1",
      "journal_ref": ""
    }
  ]
}

Best Practices

Recommendation Details
Date ranges Narrow ranges improve response time. Split large queries into chunks.
Category filters Use --subject to reduce bandwidth and improve precision.
Rate limiting Built-in 0.5s delay between requests. Add more for bulk operations.
Result caching Save JSON outputs to avoid redundant API calls.
Version awareness Preprints may have multiple versions. PDF URLs encode version numbers.
Error checking Verify count in outputs. Zero results may indicate date or connectivity issues.
Debug mode Use --verbose for detailed request/response logging.

Reference Files

File Contents
api-reference.md Complete bioRxiv REST API documentation
examples.md Extended code examples and workflow patterns
Weekly Installs
26
First Seen
Feb 25, 2026
Installed on
mcpjam26
claude-code26
replit26
junie26
windsurf26
zencoder26