openalex-paper-search
Academic Paper Search (OpenAlex)
Search 240M+ scholarly works using the OpenAlex API -- completely free, no API key required, no SDK needed. Just curl or bash with URL construction.
Full docs: https://docs.openalex.org
Quick Start
OpenAlex is a REST API. You query it by constructing URLs and fetching them with curl. All responses are JSON.
# Search for papers about "transformer architecture"
curl -s "https://api.openalex.org/works?search=transformer+architecture&per_page=5&mailto=agent@kortix.ai" | python3 -m json.tool
Important: Always include mailto=agent@kortix.ai (or any valid email) in every request. Without it, you're limited to 1 request/second. With it, you get 10 requests/second (the "polite pool").
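Because every request needs the same base URL and mailto parameter, it can help to build URLs in one place. The following is a minimal sketch, not part of the OpenAlex tooling: `build_url`, `BASE_URL`, and `MAILTO` are hypothetical names chosen for illustration, and it only constructs the URL string (no request is sent).

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"
MAILTO = "agent@kortix.ai"  # assumption: replace with any valid email


def build_url(endpoint: str, **params: object) -> str:
    """Return an OpenAlex URL for `endpoint` with `mailto` always attached."""
    query = {**params, "mailto": MAILTO}
    # urlencode turns spaces into '+' and percent-escapes reserved characters;
    # '@' is kept readable via safe="@".
    return f"{BASE_URL}/{endpoint.strip('/')}?{urlencode(query, safe='@')}"


print(build_url("works", search="transformer architecture", per_page=5))
# https://api.openalex.org/works?search=transformer+architecture&per_page=5&mailto=agent@kortix.ai
```

The resulting string can be passed to curl or to any HTTP client.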
Core Concepts
Entities
OpenAlex has these entity types (all queryable):
| Entity | Endpoint | Count | Description |
|---|---|---|---|
| Works | /works | 240M+ | Papers, articles, books, datasets, theses |
| Authors | /authors | 90M+ | People who create works |
| Sources | /sources | 250K+ | Journals, repositories, conferences |
| Institutions | /institutions | 110K+ | Universities, research orgs |
| Topics | /topics | 4K+ | Research topics (hierarchical) |
Work Object -- Key Fields
When you fetch a work, these are the most useful fields:
| Field | Description |
|---|---|
| id | OpenAlex ID (e.g., "https://openalex.org/W2741809807") |
| doi | DOI URL |
| title / display_name | Paper title |
| publication_year | Year published |
| publication_date | Full date (YYYY-MM-DD) |
| cited_by_count | Number of incoming citations |
| fwci | Field-Weighted Citation Impact (normalized) |
| type | article, preprint, review, book, dataset, etc. |
| language | ISO 639-1 code (e.g., "en") |
| is_retracted | Boolean |
| open_access.is_oa | Boolean -- is it freely accessible? |
| open_access.oa_url | Direct URL to free version |
| authorships | List of authors with names, institutions, ORCIDs |
| abstract_inverted_index | Abstract as inverted index (needs reconstruction) |
| referenced_works | List of OpenAlex IDs this work cites (outgoing) |
| related_works | Algorithmically related works |
| cited_by_api_url | API URL to get works that cite this one (incoming) |
| topics | Assigned research topics with scores |
| keywords | Extracted keywords with scores |
| primary_location | Where the work is published (journal, repo) |
| best_oa_location | Best open access location with PDF link |
Reconstructing Abstracts
OpenAlex stores abstracts as inverted indexes for legal reasons. To get plaintext, reconstruct:
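import json
import sys

# Load a single work object (JSON) from stdin
work = json.load(sys.stdin)

# abstract_inverted_index maps each word to the positions where it occurs;
# it is missing or null for works without an abstract
inv_idx = work.get("abstract_inverted_index")
if inv_idx:
    words = [""] * (max(max(positions) for positions in inv_idx.values()) + 1)
    for word, positions in inv_idx.items():
        for pos in positions:
            words[pos] = word
    abstract = " ".join(words)
    print(abstract)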
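To make the inverted-index format concrete, here is a small self-contained sketch run on a made-up index. The function name `reconstruct_abstract` and the sample data are illustrative assumptions, not real OpenAlex output. Unlike the snippet above, this variant sorts the positions it finds instead of pre-allocating a list, so any missing positions are skipped rather than left as empty strings.

```python
from typing import Optional


def reconstruct_abstract(inv_idx: Optional[dict[str, list[int]]]) -> str:
    """Rebuild plaintext from a {word: [positions]} inverted index."""
    if not inv_idx:
        return ""  # works without an abstract have a null index
    # Invert the mapping: one entry per position, pointing at its word.
    position_to_word = {
        pos: word for word, positions in inv_idx.items() for pos in positions
    }
    return " ".join(position_to_word[pos] for pos in sorted(position_to_word))


# Hypothetical sample: "learning" occurs at positions 1 and 4.
sample = {"Deep": [0], "learning": [1, 4], "is": [2], "machine": [3]}
print(reconstruct_abstract(sample))  # Deep learning is machine learning
```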
Or in bash with python3 -c:
# Pipe a work JSON into this to extract the abstract
echo "$WORK_JSON" | python3 -c "
import json, sys
w = json.load(sys.stdin)
idx = w.get('abstract_inverted_index') or {}
if idx:
    words = [''] * (max(max(p) for p in idx.values()) + 1)
    for word, positions in idx.items():
        for pos in positions:
            words[pos] = word
    print(' '.join(words))
"
Searching for Papers
Basic Keyword Search
Searches across titles, abstracts, and fulltext. Uses stemming and stop-word removal.
# Simple search
curl -s "https://api.openalex.org/works?search=large+language+models&mailto=agent@kortix.ai"
# With per_page limit
curl -s "https://api.openalex.org/works?search=CRISPR+gene+editing&per_page=10&mailto=agent@kortix.ai"
Boolean Search
Use uppercase AND, OR, NOT with parentheses and quoted phrases:
# Complex boolean query
curl -s "https://api.openalex.org/works?search=(reinforcement+learning+AND+%22robot+control%22)+NOT+simulation&mailto=agent@kortix.ai"
# Exact phrase match (use double quotes, URL-encoded as %22)
curl -s "https://api.openalex.org/works?search=%22attention+is+all+you+need%22&mailto=agent@kortix.ai"
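Hand-encoding quotes and spaces is error-prone, so a query string can be encoded programmatically before it goes into the URL. A minimal sketch using the standard library; `encode_search` is a hypothetical helper name, and keeping parentheses unescaped is a readability choice that matches the curl examples above.

```python
from urllib.parse import quote_plus


def encode_search(query: str) -> str:
    """URL-encode a search string: spaces become '+', quotes become %22."""
    # Parentheses are left as-is so the encoded query stays readable.
    return quote_plus(query, safe="()")


print(encode_search('(reinforcement learning AND "robot control") NOT simulation'))
# (reinforcement+learning+AND+%22robot+control%22)+NOT+simulation
print(encode_search('"attention is all you need"'))
# %22attention+is+all+you+need%22
```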
Search Specific Fields
# Title only
curl -s "https://api.openalex.org/works?filter=title.search:transformer&mailto=agent@kortix.ai"
# Abstract only
curl -s "https://api.openalex.org/works?filter=abstract.search:protein+folding&mailto=agent@kortix.ai"
# Title and abstract combined
curl -s "https://api.openalex.org/works?filter=title_and_abstract.search:neural+scaling+laws&mailto=agent@kortix.ai"
# Fulltext search (subset of works)
curl -s "https://api.openalex.org/works?filter=fulltext.search:climate+tipping+points&mailto=agent@kortix.ai"
Filtering
Filters are the most powerful feature. Separate multiple filters with commas (AND); separate alternative values within a single filter with pipes (OR).
Most Useful Filters
# By publication year
?filter=publication_year:2024
?filter=publication_year:2020-2024
?filter=publication_year:>2022
# By citation count
?filter=cited_by_count:>100 # highly cited
?filter=cited_by_count:>1000 # landmark papers
# By open access
?filter=is_oa:true # only open access
?filter=oa_status:gold # gold OA only
# By type
?filter=type:article # journal articles
?filter=type:preprint # preprints
?filter=type:review # review articles
# By language
?filter=language:en # English only
# Not retracted
?filter=is_retracted:false
# Has abstract
?filter=has_abstract:true
# Has downloadable PDF
?filter=has_content.pdf:true
# By author (OpenAlex ID)
?filter=author.id:A5023888391
# By institution (OpenAlex ID)
?filter=institutions.id:I27837315 # e.g., University of Michigan
# By DOI
?filter=doi:https://doi.org/10.1038/s41586-021-03819-2
# By indexed source
?filter=indexed_in:arxiv # arXiv papers
?filter=indexed_in:pubmed # PubMed papers
?filter=indexed_in:crossref # Crossref papers
Combining Filters
# AND: comma-separated
?filter=publication_year:>2022,cited_by_count:>50,is_oa:true,type:article
# OR: pipe-separated within a filter
?filter=publication_year:2023|2024
# NOT: prefix with !
?filter=type:!preprint
# Combined example: highly-cited OA articles from 2023-2024, not preprints
curl -s "https://api.openalex.org/works?filter=publication_year:2023-2024,cited_by_count:>50,is_oa:true,type:!preprint&search=machine+learning&per_page=10&mailto=agent@kortix.ai"
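The comma, pipe, and `!` rules above can be captured in a small helper that turns a dictionary into a filter string. This is a sketch under assumptions: `build_filter` is a hypothetical name, and the caller is expected to write comparison and negation prefixes (`>`, `!`) directly into the value.

```python
def build_filter(conditions: dict[str, object]) -> str:
    """Join filter conditions: ',' between filters (AND), '|' between values (OR)."""
    parts = []
    for key, value in conditions.items():
        if isinstance(value, (list, tuple)):
            # Several allowed values for one filter are OR-ed with pipes.
            value = "|".join(str(v) for v in value)
        elif isinstance(value, bool):
            # The API expects lowercase true/false.
            value = "true" if value else "false"
        parts.append(f"{key}:{value}")
    return ",".join(parts)


print(build_filter({
    "publication_year": "2023-2024",
    "cited_by_count": ">50",
    "is_oa": True,
    "type": "!preprint",
}))
# publication_year:2023-2024,cited_by_count:>50,is_oa:true,type:!preprint
print(build_filter({"publication_year": [2023, 2024]}))
# publication_year:2023|2024
```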
Sorting
# Most cited first
?sort=cited_by_count:desc
# Most recent first
?sort=publication_date:desc
# Most relevant first (only when using search)
?sort=relevance_score:desc
# Multiple sort keys
?sort=publication_year:desc,cited_by_count:desc
Pagination
Two modes: basic paging (for browsing) and cursor paging (for collecting all results).
# Basic paging (limited to 10,000 results)
?page=1&per_page=25
?page=2&per_page=25
# Cursor paging (unlimited, for collecting everything)
?per_page=100&cursor=* # first page
?per_page=100&cursor=IlsxNjk0ODc... # next page (cursor from previous response meta)
The cursor for the next page is in response.meta.next_cursor. When it's null, you've reached the end.
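The cursor loop can be sketched as a generator that keeps requesting pages until next_cursor is null. This is an illustrative sketch, not an official client: `fetch_json` and `iter_works` are hypothetical names, the `max_pages` cap is an added safety assumption, and the `fetch` argument is injectable so the loop can be exercised without network access.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch a URL and parse the JSON body (performs a real HTTP request)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_works(
    params: dict[str, str],
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: int = 10,  # assumption: safety cap so a broad query cannot run forever
) -> Iterator[dict]:
    """Yield works page by page, following meta.next_cursor until it is null."""
    cursor = "*"  # "*" requests the first page
    pages = 0
    while cursor and pages < max_pages:
        query = {**params, "per_page": "100", "cursor": cursor,
                 "mailto": "agent@kortix.ai"}
        page = fetch("https://api.openalex.org/works?" + urlencode(query))
        yield from page.get("results", [])
        cursor = page.get("meta", {}).get("next_cursor")
        pages += 1
```

For example, `for work in iter_works({"search": "protein folding"}): ...` would walk up to `max_pages` pages of results.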
Select Fields
Reduce response size by selecting only the fields you need:
# Only get IDs, titles, citation counts, and DOIs
?select=id,display_name,cited_by_count,doi,publication_year
# Minimal metadata for scanning
?select=id,display_name,publication_year,cited_by_count,open_access
Citation Graph Traversal
Find what a paper cites (outgoing references)
# Get works cited BY a specific paper
curl -s "https://api.openalex.org/works?filter=cited_by:W2741809807&per_page=25&mailto=agent@kortix.ai"
Find what cites a paper (incoming citations)
# Get works that CITE a specific paper
curl -s "https://api.openalex.org/works?filter=cites:W2741809807&sort=cited_by_count:desc&per_page=25&mailto=agent@kortix.ai"
Find related works
# Get related works (algorithmic, based on shared concepts)
curl -s "https://api.openalex.org/works?filter=related_to:W2741809807&per_page=25&mailto=agent@kortix.ai"
Citation chain: follow the references
- Get a seminal paper by DOI
- Find its referenced_works (what it cites)
- Find who cites it (filter=cites:WORK_ID)
- For the most cited citers, repeat
This is how you build a literature graph around a topic.
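The repeat step above amounts to a bounded breadth-first walk over citing works. The sketch below shows one possible shape for it; `citation_chain` and `fetch_citers` are hypothetical names. `fetch_citers` stands for any callable that returns the works citing a given ID, most cited first (for instance, a wrapper around `filter=cites:WORK_ID&sort=cited_by_count:desc`), and is passed in so the traversal logic can be tried on fake data.

```python
from typing import Callable


def citation_chain(
    seed_id: str,
    fetch_citers: Callable[[str], list[dict]],
    depth: int = 2,
    top_n: int = 3,
) -> dict[str, list[str]]:
    """Map each visited work ID to the IDs of its top_n most-cited citers."""
    graph: dict[str, list[str]] = {}
    frontier = [seed_id]
    for _ in range(depth):
        next_frontier: list[str] = []
        for work_id in frontier:
            if work_id in graph:
                continue  # already expanded; avoids loops and duplicate requests
            citers = fetch_citers(work_id)[:top_n]
            graph[work_id] = [c["id"] for c in citers]
            next_frontier.extend(graph[work_id])
        frontier = next_frontier
    return graph


# Usage with made-up data instead of live API calls:
fake_citers = {"W1": [{"id": "W2"}, {"id": "W3"}], "W2": [{"id": "W4"}]}
print(citation_chain("W1", lambda wid: fake_citers.get(wid, [])))
# {'W1': ['W2', 'W3'], 'W2': ['W4'], 'W3': []}
```

Keeping `depth` and `top_n` small matters in practice, since the number of requests grows roughly as `top_n ** depth`.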
Author Lookup
# Search for an author
curl -s "https://api.openalex.org/authors?search=Yann+LeCun&mailto=agent@kortix.ai"
# Get an author's works (by OpenAlex author ID)
curl -s "https://api.openalex.org/works?filter=author.id:A5064850633&sort=cited_by_count:desc&per_page=10&mailto=agent@kortix.ai"
# Get an author by ORCID
curl -s "https://api.openalex.org/authors/orcid:0000-0001-6187-6610?mailto=agent@kortix.ai"
Lookup by External ID
# By DOI
curl -s "https://api.openalex.org/works/doi:10.1038/s41586-021-03819-2?mailto=agent@kortix.ai"
# By PubMed ID
curl -s "https://api.openalex.org/works/pmid:14907713?mailto=agent@kortix.ai"
# By arXiv ID (via DOI)
curl -s "https://api.openalex.org/works/doi:10.48550/arXiv.2303.08774?mailto=agent@kortix.ai"
# Batch lookup: up to 50 IDs at once
curl -s "https://api.openalex.org/works?filter=doi:https://doi.org/10.1234/a|https://doi.org/10.1234/b|https://doi.org/10.1234/c&mailto=agent@kortix.ai"
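When there are more DOIs than fit in one request, they can be split into groups and each group turned into one pipe-separated filter. A minimal sketch, assuming the limit of 50 values per request stated above; `doi_batches` is a hypothetical helper and the DOIs in the usage line are placeholders.

```python
from typing import Iterator


def doi_batches(dois: list[str], batch_size: int = 50) -> Iterator[str]:
    """Yield one `doi:` filter value per batch of at most batch_size DOIs."""
    for start in range(0, len(dois), batch_size):
        chunk = dois[start:start + batch_size]
        # Each DOI is written in its full URL form and OR-ed with pipes.
        yield "doi:" + "|".join(f"https://doi.org/{doi}" for doi in chunk)


# Placeholder DOIs, for illustration only:
for filter_value in doi_batches(["10.1234/a", "10.1234/b", "10.1234/c"], batch_size=2):
    print(filter_value)
# doi:https://doi.org/10.1234/a|https://doi.org/10.1234/b
# doi:https://doi.org/10.1234/c
```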
Open Access & PDF Access
# Find OA papers with direct PDF links
curl -s "https://api.openalex.org/works?search=quantum+computing&filter=is_oa:true,has_content.pdf:true&select=id,display_name,open_access,best_oa_location&per_page=5&mailto=agent@kortix.ai"
The best_oa_location.pdf_url field gives a direct PDF link when one is available. The open_access.oa_url field gives the best available OA URL, which may be a landing page or a PDF.
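One way to apply these two fields is to prefer the direct PDF link and fall back to the OA URL. The helper below is a sketch with a hypothetical name (`pick_pdf_url`); it treats missing or null nested objects as empty, which is an assumption about how incomplete records look.

```python
from typing import Optional


def pick_pdf_url(work: dict) -> Optional[str]:
    """Return the direct PDF link if present, else the best OA URL, else None."""
    best = work.get("best_oa_location") or {}  # may be missing or null
    open_access = work.get("open_access") or {}
    return best.get("pdf_url") or open_access.get("oa_url")


# Made-up work records for illustration:
print(pick_pdf_url({"best_oa_location": {"pdf_url": "https://example.org/paper.pdf"}}))
# https://example.org/paper.pdf
print(pick_pdf_url({"best_oa_location": None,
                    "open_access": {"oa_url": "https://example.org/landing"}}))
# https://example.org/landing
```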
Practical Workflows
Literature Survey on a Topic
# 1. Find the most-cited papers on a topic
curl -s "https://api.openalex.org/works?search=retrieval+augmented+generation&sort=cited_by_count:desc&filter=publication_year:>2020,type:article,has_abstract:true&per_page=20&select=id,display_name,publication_year,cited_by_count,doi,authorships,abstract_inverted_index&mailto=agent@kortix.ai"
# 2. For the top papers, explore their citation graphs
curl -s "https://api.openalex.org/works?filter=cites:W4285719527&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"
# 3. Find recent papers building on this work
curl -s "https://api.openalex.org/works?filter=cites:W4285719527,publication_year:>2023&sort=publication_date:desc&per_page=10&mailto=agent@kortix.ai"
Find Landmark/Seminal Papers
# Highly cited + search term
curl -s "https://api.openalex.org/works?search=attention+mechanism+neural+networks&filter=cited_by_count:>500,type:article&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"
Find Recent Preprints
# Latest preprints on a topic
curl -s "https://api.openalex.org/works?search=multimodal+large+language+models&filter=type:preprint,publication_year:2025&sort=publication_date:desc&per_page=15&mailto=agent@kortix.ai"
Find Review Articles
# Review/survey papers on a topic
curl -s "https://api.openalex.org/works?search=federated+learning&filter=type:review,cited_by_count:>20&sort=cited_by_count:desc&per_page=10&mailto=agent@kortix.ai"
Author Analysis
# 1. Find the author
curl -s "https://api.openalex.org/authors?search=Geoffrey+Hinton&select=id,display_name,works_count,cited_by_count,last_known_institutions&mailto=agent@kortix.ai"
# 2. Get their most influential papers
curl -s "https://api.openalex.org/works?filter=author.id:A5068082743&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"
# 3. Get their recent work
curl -s "https://api.openalex.org/works?filter=author.id:A5068082743,publication_year:>2023&sort=publication_date:desc&per_page=10&mailto=agent@kortix.ai"
Saving Results to Disk
When doing deep research, save paper data to disk for later processing:
# Create the output directory if it does not exist yet
mkdir -p research/papers
# Save search results as JSON
curl -s "https://api.openalex.org/works?search=topic&per_page=50&mailto=agent@kortix.ai" > research/papers/topic-search.json
# Extract and save a clean summary
curl -s "https://api.openalex.org/works?search=topic&per_page=50&select=id,display_name,publication_year,cited_by_count,doi,authorships&mailto=agent@kortix.ai" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for w in data.get('results', []):
    authors = ', '.join(a['author']['display_name'] for a in w.get('authorships', [])[:3])
    if len(w.get('authorships', [])) > 3: authors += ' et al.'
    print(f\"[{w.get('cited_by_count',0)} cites] {w['display_name']} ({w.get('publication_year','?')}) - {authors}\")
    if w.get('doi'): print(f\"  DOI: {w['doi']}\")
    print()
" > research/papers/topic-summary.txt
For deep research, record each paper's metadata in your sources-index.md and save the raw JSON under research/sources/:
# Save a paper's full metadata
mkdir -p research/sources
curl -s "https://api.openalex.org/works/W2741809807?mailto=agent@kortix.ai" > research/sources/001-paper-title.json
Rate Limits
| Pool | Rate | How to get it |
|---|---|---|
| Common | 1 req/sec | No email provided |
| Polite | 10 req/sec | Add mailto=your@email.com to requests |
| Premium | Higher | Paid API key via api_key param |
Always use the polite pool. Add &mailto=agent@kortix.ai to every request.
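When issuing many requests in a loop, a small client-side throttle helps stay under the per-second limit. This is a sketch under assumptions: the `Throttle` class is hypothetical, it enforces a minimum spacing between calls rather than a true sliding window, and the clock and sleep functions are injectable so it can be tested without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space out calls so that at most max_per_second happen per second."""

    def __init__(
        self,
        max_per_second: float = 10,  # polite-pool rate from the table above
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request is enough; the first call returns without sleeping.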
Tips
- Use select aggressively to reduce response size and speed up requests
- Use per_page=100 (max) when collecting lots of results to minimize request count
- Use cursor paging (cursor=*) when you need more than 10,000 results
- Batch DOI lookups with OR syntax: filter=doi:DOI1|DOI2|DOI3 (up to 50)
- Reconstruct abstracts using the inverted index -- don't skip this, abstracts are gold
- Follow citation chains to find seminal works and recent developments
- Filter by has_abstract:true when you need abstracts (not all works have them)
- Filter by indexed_in:arxiv or indexed_in:pubmed to target specific repositories
- Sort by cited_by_count:desc to find the most influential papers first
- Combine search + filters for precise results: search gives relevance, filters give precision