base-academic-search
BASE (Bielefeld Academic Search Engine) API
Overview
BASE is one of the world's largest search engines for academic open access web resources. Operated by Bielefeld University Library, it indexes 400M+ documents from 11,000+ content providers including institutional repositories, preprint servers, and digital libraries. Unlike Google Scholar, BASE provides structured metadata, license information, and full-text links. The API is free with registration.
API Endpoints
Base URL
https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi
Search
# Basic keyword search (JSON response)
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=climate+change+adaptation&format=json&hits=20"
# Search with field filters
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=dctitle:transformer+AND+dcsubject:NLP&format=json"
# Filter by document type and year
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=deep+learning+AND+dctypenorm:121+AND+dcyear:2024&format=json"
# Open access only
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=CRISPR+AND+dcrights:open&format=json"
Search Fields
| Field | Description | Example |
|---|---|---|
| dctitle | Title | dctitle:attention+mechanism |
| dccreator | Author | dccreator:vaswani |
| dcsubject | Subject/keywords | dcsubject:machine+learning |
| dcdescription | Abstract | dcdescription:neural+network |
| dcyear | Publication year | dcyear:2024 |
| dctype | Document type text | dctype:article |
| dctypenorm | Normalized type code | 121 (journal article) |
| dcrights | Access rights | dcrights:open |
| dclang | Language | dclang:eng |
| dclink | Source URL | dclink:arxiv.org |
| dcoa | Open access status | dcoa:1 (OA), dcoa:2 (restricted) |
| dcprovider | Content provider | dcprovider:arxiv.org |
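The field prefixes in the table combine with `AND`, as the curl examples show. The following is a minimal sketch of a helper that assembles such a query string. `build_query` is a hypothetical name, not part of BASE, and multi-word values are passed through unquoted exactly as the table writes them; whether BASE needs phrase quoting for them is not covered here.

```python
def build_query(keywords: str = "", **fields: str) -> str:
    """Join free-text keywords and field:value clauses with AND.

    Hypothetical helper for illustration; field names are taken as-is,
    so callers should use the prefixes listed in the Search Fields table.
    """
    clauses = []
    if keywords.strip():
        clauses.append(keywords.strip())
    for field, value in fields.items():
        value = str(value).strip()
        if value:  # skip empty filters instead of emitting "field:"
            clauses.append(f"{field}:{value}")
    return " AND ".join(clauses)


print(build_query("deep learning", dctypenorm="121", dcyear="2024"))
print(build_query(dctitle="attention mechanism", dccreator="vaswani"))
```

The resulting string is meant to be passed as the `query` parameter; an HTTP client such as `requests` handles the URL encoding.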
Document Type Codes
| Code | Type |
|---|---|
| 121 | Journal article |
| 122 | Book / monograph |
| 14 | Conference paper |
| 15 | Thesis / dissertation |
| 17 | Report |
| 18 | Preprint |
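Since `dctypenorm` holds codes rather than readable names, a small lookup can make results easier to scan. This sketch copies the codes and labels from the table above verbatim; `DOC_TYPE_LABELS` and `describe_types` are hypothetical names, and any code not in the table is reported as unknown rather than guessed.

```python
# Codes and labels copied from the Document Type Codes table.
DOC_TYPE_LABELS = {
    "121": "Journal article",
    "122": "Book / monograph",
    "14": "Conference paper",
    "15": "Thesis / dissertation",
    "17": "Report",
    "18": "Preprint",
}


def describe_types(codes) -> list:
    """Translate dctypenorm codes into labels, keeping unknown codes visible."""
    if codes is None:
        return []
    if isinstance(codes, (str, int)):
        codes = [codes]  # accept a single code as well as a list
    return [DOC_TYPE_LABELS.get(str(c), f"Unknown ({c})") for c in codes]


print(describe_types(["18"]))
print(describe_types(["121", "99"]))
```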
Query Parameters
| Parameter | Description | Default |
|---|---|---|
| func | Must be PerformSearch | Required |
| query | Search query with optional field prefixes | Required |
| format | Response format: json or xml | xml |
| hits | Results per page (max 125) | 10 |
| offset | Pagination offset | 0 |
| sortby | Sort: dcyear desc, score desc | relevance |
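Paging through a result set means stepping `offset` by `hits` while respecting the 125-per-page ceiling. The sketch below only generates the parameter sets and prints the resulting URLs; it makes no network calls. `page_params` and `MAX_HITS` are hypothetical names, and the sketch assumes `offset` counts documents from zero, as the default in the table suggests.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi"
MAX_HITS = 125  # per-page ceiling from the Query Parameters table


def page_params(query: str, total: int, hits: int = 100):
    """Yield one parameter dict per page until `total` results are covered."""
    hits = max(1, min(hits, MAX_HITS))  # clamp to the allowed range
    for offset in range(0, total, hits):
        yield {
            "func": "PerformSearch",
            "query": query,
            "format": "json",
            "hits": min(hits, total - offset),  # last page may be shorter
            "offset": offset,
        }


for params in page_params("CRISPR AND dcoa:1", total=250, hits=125):
    print(f"{BASE_URL}?{urlencode(params)}")
```

In practice `total` would come from `numFound` in the first response, capped at however many records are actually needed.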
Response Structure
{
  "response": {
    "numFound": 45200,
    "start": 0,
    "docs": [
      {
        "dctitle": "Attention Is All You Need",
        "dccreator": ["Ashish Vaswani", "Noam Shazeer"],
        "dcyear": "2017",
        "dcsubject": ["machine learning", "attention mechanism"],
        "dcdescription": "The dominant sequence transduction models...",
        "dcidentifier": "https://arxiv.org/abs/1706.03762",
        "dcsource": "arXiv.org",
        "dcprovider": "arxiv.org",
        "dcdocid": "abc123xyz",
        "dcoa": 1,
        "dctypenorm": ["18"],
        "dclang": ["eng"]
      }
    ]
  }
}
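The envelope above can be read without touching the network. This sketch parses a trimmed copy of the sample response and derives the next pagination offset from `numFound`, `start`, and the number of returned documents. `summarize` is a hypothetical helper, and the sketch assumes real responses follow the structure shown here.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{"response": {"numFound": 45200, "start": 0, "docs": [
  {"dctitle": "Attention Is All You Need",
   "dccreator": ["Ashish Vaswani", "Noam Shazeer"],
   "dcyear": "2017", "dcoa": 1, "dctypenorm": ["18"]}
]}}
"""


def summarize(payload: dict) -> dict:
    """Extract totals, titles, and the offset of the next page (or None)."""
    body = payload.get("response", {})
    docs = body.get("docs", [])
    found = body.get("numFound", 0)
    end = body.get("start", 0) + len(docs)
    return {
        "total": found,
        "returned": len(docs),
        "next_offset": end if end < found else None,
        "titles": [d.get("dctitle", "(untitled)") for d in docs],
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

Using `.get` with defaults throughout keeps the helper from raising on records that omit optional fields.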
Python Usage
from typing import Optional

import requests

BASE_URL = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi"


def search_base(query: str, hits: int = 20,
                doc_type: Optional[int] = None, oa_only: bool = False) -> list:
    """Search BASE for academic open access documents."""
    # Filters are appended to the query string as field:value clauses.
    q = query
    if doc_type is not None:
        q += f" AND dctypenorm:{doc_type}"
    if oa_only:
        q += " AND dcoa:1"

    params = {
        "func": "PerformSearch",
        "query": q,
        "format": "json",
        "hits": hits,
        "sortby": "dcyear desc",  # newest first
    }
    # A timeout keeps the call from hanging if the service is unreachable.
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for doc in data.get("response", {}).get("docs", []):
        results.append({
            "title": doc.get("dctitle"),
            "authors": doc.get("dccreator", []),
            "year": doc.get("dcyear"),
            "source": doc.get("dcsource"),
            "url": doc.get("dcidentifier"),
            "abstract": (doc.get("dcdescription") or "")[:300],
            "open_access": doc.get("dcoa") == 1,
            "type": doc.get("dctypenorm", []),
        })
    return results


def search_dissertations(topic: str, lang: str = "eng") -> list:
    """Find dissertations and theses on a topic."""
    query = f"{topic} AND dctypenorm:15 AND dclang:{lang}"
    return search_base(query, hits=50)


def search_by_provider(query: str, provider: str) -> list:
    """Search within a specific content provider."""
    full_query = f"{query} AND dcprovider:{provider}"
    return search_base(full_query)


# Example: find recent open access ML papers
papers = search_base("transformer self-attention", hits=10, oa_only=True)
for p in papers:
    oa = "OA" if p["open_access"] else "restricted"
    print(f"[{p['year']}] {p['title']} ({oa}) - {p['source']}")

# Example: find dissertations on climate modeling
theses = search_dissertations("climate modeling ocean")
for t in theses:
    print(f"[{t['year']}] {t['title']} - {', '.join(t['authors'][:2])}")
BASE vs Other Search Engines
| Feature | BASE | Google Scholar | OpenAlex |
|---|---|---|---|
| Records | 400M+ | Unknown | 250M+ |
| Open access focus | Yes | No | Yes |
| Structured API | Yes | No official API | Yes |
| License metadata | Yes | No | Partial |
| Dissertation coverage | Excellent | Good | Limited |
| Repository-level filtering | Yes | No | No |