exa-known-pitfalls
SKILL.md
Exa Known Pitfalls
Overview
Real gotchas when integrating Exa's neural search API. Exa uses embeddings-based search rather than keyword matching, which creates a different class of failure modes than traditional search APIs.
Prerequisites
- Exa API key configured
- Understanding of neural vs keyword search differences
- Familiarity with search result relevance scoring
Instructions
Step 1: Avoid Keyword-Style Queries
Exa's neural search interprets natural language, not keywords. Boolean operators and exact-match syntax degrade results.
from exa_py import Exa
exa = Exa(api_key=os.environ["EXA_API_KEY"])
# BAD: keyword-style query returns poor results
results = exa.search("python AND machine learning OR deep learning 2024") # 2024 year
# GOOD: natural language query
results = exa.search(
"recent tutorials on building ML models with Python",
num_results=10,
use_autoprompt=True # let Exa optimize the query
)
Step 2: Don't Ignore Search Type Selection
Exa offers neural and keyword search. Using the wrong type silently degrades quality.
# BAD: using neural search for a specific URL or exact title
results = exa.search("arxiv.org/abs/2301.00001", type="neural")
# GOOD: use keyword search for exact matches, neural for concepts
results = exa.search("arxiv.org/abs/2301.00001", type="keyword")
results_concept = exa.search(
"transformer architecture improvements for long context",
type="neural"
)
Step 3: Handle Content Retrieval Separately
A common mistake is assuming search() returns full page content. It returns metadata only unless you request contents.
# BAD: accessing content that wasn't fetched
results = exa.search("AI safety research papers")
text = results.results[0].text # None! Content not requested
# GOOD: use search_and_contents or get_contents
results = exa.search_and_contents(
"AI safety research papers",
text={"max_characters": 3000}, # 3000: 3 seconds in ms
highlights=True
)
print(results.results[0].text) # full text
print(results.results[0].highlights) # key excerpts
Step 4: Watch Date Filtering Edge Cases
Date filters silently exclude results. Overly narrow windows return empty results without error.
# BAD: too narrow, may return nothing
results = exa.search(
"breaking news in AI",
start_published_date="2024-03-10", # 2024 year
end_published_date="2024-03-10" # single day = few results
)
# GOOD: reasonable date window with fallback
results = exa.search(
"breaking news in AI",
start_published_date="2024-03-01",
end_published_date="2024-03-15"
)
if not results.results:
results = exa.search("breaking news in AI", num_results=5)
Step 5: Autoprompt Cost Awareness
use_autoprompt=True makes an extra LLM call per request, adding latency and cost.
# BAD: autoprompt on every request in a high-volume loop
for query in thousands_of_queries:
exa.search(query, use_autoprompt=True) # 2x cost, extra latency
# GOOD: use autoprompt selectively
results = exa.search(
well_formed_query,
use_autoprompt=False # skip when query is already well-structured
)
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Empty results, no error | Date filter too narrow | Widen date range or remove filter |
| Low relevance scores | Keyword-style query | Rewrite as natural language |
Missing .text field |
Content not requested | Use search_and_contents() |
| Slow responses | Autoprompt on every call | Disable for pre-optimized queries |
| 429 rate limit | Burst requests | Add exponential backoff with jitter |
Examples
Similarity Search Pitfall
# find_similar requires a URL, not a query string
# BAD:
results = exa.find_similar("machine learning papers")
# GOOD:
results = exa.find_similar(
"https://arxiv.org/abs/2301.00001",
num_results=10
)
Resources
Output
- Configuration files or code changes applied to the project
- Validation report confirming correct implementation
- Summary of changes made and their rationale
Weekly Installs
15
Repository
jeremylongshore…s-skillsGitHub Stars
1.6K
First Seen
Feb 18, 2026
Security Audits
Installed on
codex15
mcpjam14
claude-code14
junie14
windsurf14
zencoder14