arxiv-search
arXiv Search
Search the arXiv public API for research papers. Returns structured metadata (title, authors, abstract, arXiv ID, categories, dates, PDF/HTML links) as JSON. For full-text analysis of a specific paper, pair with arxiv-analyze.
When to use
- User wants to discover papers on a topic
- User wants recent submissions in an arXiv category
- User wants to check what an author has published
- Starting point before analyzing a specific paper
Usage
The script is at scripts/arxiv_search.py. It hits the arXiv API directly and parses the Atom XML into JSON so the model never has to touch XML.
# Topic search
python3 scripts/arxiv_search.py "mechanistic interpretability" --max 20
# Category filter (see arXiv taxonomy: cs.LG, cs.CL, stat.ML, etc.)
python3 scripts/arxiv_search.py --category cs.LG --max 30 --sort-by submittedDate
# Topic + category + date range
python3 scripts/arxiv_search.py "sparse autoencoders" --category cs.LG \
--from 2025-01-01 --to 2026-04-16 --max 50
# Recency-focused (newest first)
python3 scripts/arxiv_search.py "LLM agents" --sort-by submittedDate --max 20
Flags:
- --max N: max results (default 20; arXiv API caps at 2000)
- --category CAT: arXiv category code
- --from YYYY-MM-DD / --to YYYY-MM-DD: submission date filter
- --sort-by relevance|lastUpdatedDate|submittedDate: default relevance
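To make the flag semantics concrete, here is a minimal sketch of how these flags could map onto an arXiv API query URL. It is not the code in scripts/arxiv_search.py: the function name `build_query_url`, the choice of endpoint scheme, the descending sort order, and the rule that both date bounds must be given are assumptions made for illustration. The `all:`, `cat:`, and `submittedDate:[... TO ...]` field syntax and the `search_query`, `max_results`, `sortBy`, and `sortOrder` parameters come from the public arXiv API.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint (scheme chosen here is an assumption).
ARXIV_API = "https://export.arxiv.org/api/query"


def build_query_url(query=None, category=None, date_from=None, date_to=None,
                    sort_by="relevance", max_results=20):
    """Hypothetical helper: translate the CLI flags into an API URL."""
    clauses = []
    if query:
        # all: searches every indexed field, including author names.
        clauses.append(f'all:"{query}"')
    if category:
        clauses.append(f"cat:{category}")
    if date_from or date_to:
        if not (date_from and date_to):
            # Assumption: this sketch only handles a closed date range.
            raise ValueError("give both date_from and date_to")
        # arXiv expects YYYYMMDDHHMM timestamps in the range filter.
        start = date_from.replace("-", "") + "0000"
        end = date_to.replace("-", "") + "2359"
        clauses.append(f"submittedDate:[{start} TO {end}]")
    if not clauses:
        raise ValueError("need a query, a category, or a date range")
    params = {
        "search_query": " AND ".join(clauses),
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",  # assumption: newest or best match first
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

Calling `build_query_url("sparse autoencoders", category="cs.LG", date_from="2025-01-01", date_to="2026-04-16", max_results=50)` mirrors the "Topic + category + date range" command shown earlier.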
Output: JSON on stdout. Each result has id, title, authors[], abstract, categories[], primary_category, published, updated, abs_url, pdf_url, doi, journal_ref, comment.
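The Atom-to-JSON step can be sketched with the standard library alone. This is an illustration under assumptions, not the script's actual implementation: the helper names `_clean` and `parse_feed`, the top-level `results` key (taken from the jq example later in this document), and the exact shape of each field are hypothetical. The element names follow the Atom feed that the arXiv API returns.

```python
import xml.etree.ElementTree as ET

# Namespaces used by arXiv's Atom feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def _clean(value):
    """Collapse the line breaks and runs of spaces arXiv leaves in text."""
    return " ".join(value.split()) if value else None


def parse_feed(xml_text):
    """Hypothetical parser: Atom XML (str or bytes) -> JSON-ready dict."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall("atom:entry", NS):
        abs_url = _clean(entry.findtext("atom:id", namespaces=NS))
        primary = entry.find("arxiv:primary_category", NS)
        # The PDF link is the <link> element whose title attribute is "pdf".
        pdf_url = next(
            (link.get("href") for link in entry.findall("atom:link", NS)
             if link.get("title") == "pdf"),
            None,
        )
        results.append({
            "id": abs_url.rsplit("/abs/", 1)[-1] if abs_url else None,
            "title": _clean(entry.findtext("atom:title", namespaces=NS)),
            "authors": [
                _clean(author.findtext("atom:name", namespaces=NS))
                for author in entry.findall("atom:author", NS)
            ],
            "abstract": _clean(entry.findtext("atom:summary", namespaces=NS)),
            "categories": [
                cat.get("term") for cat in entry.findall("atom:category", NS)
            ],
            "primary_category":
                primary.get("term") if primary is not None else None,
            "published": _clean(entry.findtext("atom:published", namespaces=NS)),
            "updated": _clean(entry.findtext("atom:updated", namespaces=NS)),
            "abs_url": abs_url,
            "pdf_url": pdf_url,
            "doi": _clean(entry.findtext("arxiv:doi", namespaces=NS)),
            "journal_ref":
                _clean(entry.findtext("arxiv:journal_ref", namespaces=NS)),
            "comment": _clean(entry.findtext("arxiv:comment", namespaces=NS)),
        })
    return {"results": results}
```

Optional fields such as doi, journal_ref, and comment come back as None in this sketch when the feed omits them.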
Workflow
1. Parse intent
- Topic search: "find papers on X" → use "X" as query
- Recent in field: "what's new in cs.LG" → --category cs.LG --sort-by submittedDate
- Author search: "papers by " → query the name; arXiv indexes author names in the all: field
- Combined: topic + time window + category
2. Run the search
Invoke the script. Default --max 20 is a good starting point. Bump to 50 for broad surveys.
3. Present results
Format as a compact table. For each paper:
- arXiv ID (with abs_url for the link)
- Title
- Authors (first 2 + "et al." if more)
- Published date
- Primary category
- 1-sentence abstract summary (don't dump the full abstract)
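The mechanical parts of this formatting (author truncation, date trimming, linked ID) can be sketched as below. The helper names `format_authors` and `format_row` are hypothetical, and the sketch assumes that `authors` is a list of name strings and that `published` is an ISO 8601 timestamp whose first ten characters are the date; neither shape is spelled out in this document. The one-sentence summary is left out because it needs judgment rather than string handling.

```python
def format_authors(authors):
    """First two authors, plus "et al." when there are more."""
    if len(authors) <= 2:
        return ", ".join(authors)
    return ", ".join(authors[:2]) + " et al."


def format_row(index, paper):
    """Render one result dict as a markdown table row (hypothetical helper)."""
    # Escape pipes so a title cannot break the table layout.
    title = paper["title"].replace("|", "\\|")
    # Assumption: published looks like "2025-01-19T...", so keep the date part.
    date = paper["published"][:10]
    linked_id = f"[{paper['id']}]({paper['abs_url']})"
    return (
        f"| {index} | {linked_id} | {title} | "
        f"{format_authors(paper['authors'])} | {date} | "
        f"{paper['primary_category']} |"
    )
```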
4. Offer handoffs
Ask if the user wants to:
- Analyze a specific paper (→ arxiv-analyze)
- Create a watch for this query (→ arxiv-monitor add)
Output format
### arXiv Search: <query>
| # | arXiv ID | Title | Authors | Date |
|---|----------|-------|---------|------|
| 1 | 2501.11120v1 | Tell me about yourself... | Betley et al. | 2025-01-19 |
**Next:**
- analyze <id> to fetch full text
- watch <name> to keep tracking this query
Token efficiency
- 20 results = ~5K tokens (abstracts are the bulk)
- Ask the user to narrow the query rather than dumping 50 results
- For briefing only, pipe through jq '.results | map({id, title, authors, published})' before sending to context
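Where jq is not available, the same projection can be done in Python. This is a small sketch that mirrors the jq filter above; the function name `trim_results` is hypothetical, and the top-level `results` key is taken from that filter.

```python
# Fields kept for a briefing; abstracts are dropped because they dominate size.
KEEP = ("id", "title", "authors", "published")


def trim_results(payload):
    """Keep only the briefing fields of each result (mirrors the jq filter)."""
    return [
        {key: paper.get(key) for key in KEEP}
        for paper in payload["results"]
    ]
```

To use it on the script's output, load stdout with `json.loads` first and pass the resulting dict in.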
arXiv taxonomy quick reference
Common categories (see https://arxiv.org/category_taxonomy for full list):
- cs.CL: Computation and Language (NLP)
- cs.LG: Machine Learning
- cs.AI: Artificial Intelligence
- cs.CR: Cryptography and Security
- cs.CY: Computers and Society
- stat.ML: Statistics: Machine Learning
- cs.IR: Information Retrieval
- cs.HC: Human-Computer Interaction
Rate limits
arXiv API: soft limit of 1 request per 3 seconds per IP. This skill issues one request per invocation — well within limits.
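A single invocation needs no throttling, but a workflow that runs several searches back to back might want to space them out. The sketch below is one way to do that; the `Throttle` class is hypothetical and not part of this skill, and the injectable `clock` and `sleep` parameters exist only to make it easy to test.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum gap between requests."""

    def __init__(self, min_interval=3.0, clock=time.monotonic,
                 sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `wait()` immediately before each invocation of the script; the first call returns at once, and later calls sleep only for whatever part of the three seconds is left.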
Error handling
- Invalid category or bad query → empty results array. Report "no papers found" and suggest broadening the query.
- Network error → exit 4, message on stderr.
- Malformed response → exit 5 (extremely rare; would signal arXiv API changes).
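A caller can turn these cases into user-facing messages along the following lines. This is a sketch under assumptions: the function names `interpret` and `run_search` are hypothetical, exit code 0 is assumed to mean success, and the top-level `results` key is taken from the jq example earlier in this document. Only exit codes 4 and 5 are documented above.

```python
import json
import subprocess
import sys


def interpret(returncode, stdout, stderr):
    """Map the script's exit code and output to a simple status dict."""
    if returncode == 4:
        return {"ok": False, "message": f"Network error: {stderr.strip()}"}
    if returncode == 5:
        return {"ok": False,
                "message": "Malformed response; the arXiv API may have changed."}
    if returncode != 0:
        # Assumption: any other non-zero code is an unexpected failure.
        return {"ok": False, "message": f"Unexpected exit code {returncode}"}
    results = json.loads(stdout).get("results", [])
    if not results:
        return {"ok": True, "results": [],
                "message": "No papers found; try broadening the query."}
    return {"ok": True, "results": results}


def run_search(args, script="scripts/arxiv_search.py"):
    """Run the search script with the given CLI arguments and interpret it."""
    proc = subprocess.run(
        [sys.executable, script, *args],
        capture_output=True,
        text=True,
    )
    return interpret(proc.returncode, proc.stdout, proc.stderr)
```

For example, `run_search(["LLM agents", "--max", "20"])` would run the recency example's query with default sorting and return either the results or a message to relay.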
Requirements
- Python 3.11+ (stdlib only, no pip install needed)
More from dsebastien/ai-skill-arxiv
arxiv-analyze
Fetch and analyze an arXiv paper via tiered fallback (markdown -> arXiv HTML -> ar5iv -> PDF). Rate-limited for arxiv2md (28 req/min, deterministic). Produces a structured summary with citation, problem, key claims, method, results, limitations. Use when the user says "analyze arxiv paper", "summarize arxiv", "read paper", "what does this paper say", or provides an arXiv ID/URL to analyze.
arxiv-monitor
Monitor arXiv for new papers matching saved queries/categories. Manages a watchlist with deterministic "seen" tracking so subsequent checks return only new material. Use when the user says "watch arxiv", "track arxiv", "new papers on", "arxiv monitor", "check my arxiv watches", or wants periodic discovery. Depends on arxiv-search (must be installed as a sibling skill).