# Exa Architecture Variants

## Overview
Deployment architectures for Exa neural search at different scales. Exa's search-and-contents model supports everything from simple search features to full RAG pipelines and semantic knowledge bases.
## Prerequisites
- Exa API configured
- Clear search use case defined
- Infrastructure for chosen architecture tier
## Instructions

### Step 1: Direct Search Integration (Simple)

Best for: Adding search to an existing app, < 1K queries/day.

```
User Query -> Backend -> Exa Search API -> Format Results -> User
```
```python
import os

from exa_py import Exa
from flask import Flask, jsonify, request

app = Flask(__name__)
exa = Exa(api_key=os.environ["EXA_API_KEY"])

@app.route('/search')
def search():
    query = request.args.get('q')
    results = exa.search_and_contents(
        query, num_results=5, text={"max_characters": 1000}  # cap extracted text at 1,000 characters per result
    )
    return jsonify([{
        "title": r.title, "url": r.url, "snippet": r.text[:200]  # 200-character snippet for display
    } for r in results.results])
```
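The payload shaping in the endpoint above can be factored into a small pure helper, which keeps the truncation logic unit-testable without a live Exa key. The `format_results` name and dict-based input are illustrative, not part of `exa_py`:

```python
def format_results(results, snippet_len=200):
    # results: iterable of dicts with "title", "url", "text" keys
    # (the real handler reads the same fields off Exa result objects).
    return [
        {"title": r["title"], "url": r["url"], "snippet": r["text"][:snippet_len]}
        for r in results
    ]

sample = [{"title": "Exa", "url": "https://exa.ai", "text": "x" * 500}]
payload = format_results(sample)
# payload[0]["snippet"] is truncated to 200 characters.
```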
### Step 2: Cached Search with Semantic Layer (Moderate)

Best for: High-traffic search, 1K-50K queries/day, content aggregation.

```
User Query -> Cache Check -> (miss) -> Exa API -> Cache Store -> User
                  |
                  v (hit)
           Cached Results -> User
```
```python
import hashlib
import json

class CachedExaSearch:
    def __init__(self, exa_client, redis_client, ttl=600):  # ttl=600: cache entries expire after 10 minutes
        self.exa = exa_client
        self.cache = redis_client
        self.ttl = ttl

    def search(self, query: str, **kwargs):
        key = self._cache_key(query, **kwargs)
        cached = self.cache.get(key)
        if cached:
            return json.loads(cached)
        results = self.exa.search_and_contents(query, **kwargs)
        serialized = self._serialize(results)
        self.cache.setex(key, self.ttl, json.dumps(serialized))
        return serialized

    def _cache_key(self, query: str, **kwargs) -> str:
        # Hash the query plus options so different num_results etc. get distinct entries.
        payload = json.dumps({"q": query, "opts": kwargs}, sort_keys=True)
        return "exa:" + hashlib.sha256(payload.encode()).hexdigest()

    def _serialize(self, results) -> list:
        return [{"title": r.title, "url": r.url, "text": r.text} for r in results.results]
```
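The cache-aside flow can be exercised without a live Redis or Exa by substituting minimal in-memory stand-ins. `FakeRedis`, `FakeExa`, and `cached_search` below are illustrative test doubles, not part of either library; the point is that the second identical query never reaches the API:

```python
import json

class FakeRedis:
    """In-memory stand-in for the two Redis calls the cache layer uses."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value

class FakeExa:
    """Counts outbound calls so cache hits vs. misses are observable."""
    def __init__(self):
        self.calls = 0
    def search_and_contents(self, query, **kwargs):
        self.calls += 1
        return {"query": query}

def cached_search(exa, cache, query, ttl=600):
    # Function-form mirror of the CachedExaSearch flow: check, miss, store.
    key = "exa:" + query.lower().strip()
    hit = cache.get(key)
    if hit:
        return json.loads(hit)
    result = exa.search_and_contents(query)
    cache.setex(key, ttl, json.dumps(result))
    return result

exa, cache = FakeExa(), FakeRedis()
first = cached_search(exa, cache, "neural search")
second = cached_search(exa, cache, "neural search")
# Second call is served from cache, so exa.calls stays at 1.
```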
### Step 3: RAG Pipeline with Exa as Knowledge Source (Scale)

Best for: AI-powered apps, 50K+ queries/day, LLM-augmented answers.

```
User Query -> Query Planner -> Exa Search -> Content Extraction
                                                   |
                                                   v
                                        Vector Store (cache)
                                                   |
                                                   v
                         LLM Generation with Context -> User
```
```python
class ExaRAGPipeline:
    def __init__(self, exa, llm, vector_store):
        self.exa = exa
        self.llm = llm
        self.vectors = vector_store

    async def answer(self, question: str) -> dict:
        # 1. Search for relevant content
        results = self.exa.search_and_contents(
            question, num_results=5, text={"max_characters": 3000},  # cap extracted text at 3,000 characters per source
            highlights=True
        )
        # 2. Store in vector cache for future queries
        for r in results.results:
            self.vectors.upsert(r.url, r.text, {"title": r.title})
        # 3. Generate answer with citations
        context = "\n\n".join(f"[{i+1}] {r.text}" for i, r in enumerate(results.results))
        answer = await self.llm.generate(
            f"Based on the following sources, answer: {question}\n\n{context}"
        )
        return {"answer": answer, "sources": [r.url for r in results.results]}
```
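Concatenating five 3,000-character sources can crowd an LLM's context window once the prompt and question are added. A budgeted variant of the context assembly is sketched below; the 12,000-character default is an assumption for illustration, not an Exa or LLM limit:

```python
def build_context(texts, max_chars=12_000):
    """Join numbered sources, stopping before a rough character budget is exceeded."""
    parts, used = [], 0
    for i, text in enumerate(texts):
        entry = f"[{i+1}] {text}"
        if used + len(entry) > max_chars:
            break  # keep earlier (higher-ranked) sources, drop the rest
        parts.append(entry)
        used += len(entry) + 2  # account for the "\n\n" separator

    return "\n\n".join(parts)

ctx = build_context(["a" * 30, "b" * 30], max_chars=40)
# Only the first source fits within the 40-character budget.
```

Because Exa returns results in relevance order, truncating from the tail discards the weakest sources first.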
## Decision Matrix
| Factor | Direct | Cached | RAG Pipeline |
|---|---|---|---|
| Volume | < 1K/day | 1K-50K/day | 50K+/day |
| Latency | 1-3s | 50ms (cached) | 3-8s |
| Use Case | Simple search | Content aggregation | AI-powered answers |
| Complexity | Low | Medium | High |
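The matrix above can be encoded as a tiny routing helper. Thresholds come straight from the table; the `choose_tier` name and boolean flag are illustrative:

```python
def choose_tier(daily_queries: int, needs_llm_answers: bool = False) -> str:
    """Map expected volume and product needs onto the three architecture tiers."""
    if needs_llm_answers or daily_queries >= 50_000:
        return "rag"     # Step 3: AI-powered answers, 50K+/day
    if daily_queries >= 1_000:
        return "cached"  # Step 2: content aggregation, 1K-50K/day
    return "direct"      # Step 1: simple search, < 1K/day
```

For example, a product that needs LLM-generated answers lands on the RAG tier even at low volume, since the use case, not just the query count, drives the choice.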
## Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Slow search in UI | No caching | Add result cache with TTL |
| Stale cached results | Long TTL | Reduce TTL for time-sensitive queries |
| RAG hallucination | Poor source selection | Use highlights, increase num_results |
| High API costs | No query deduplication | Cache layer deduplicates identical queries |
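One way to act on the "stale cached results" row is to pick the TTL per query instead of globally. The marker list below is a heuristic assumption for illustration, not part of Exa:

```python
def ttl_for_query(query: str, default_ttl: int = 600, fresh_ttl: int = 60) -> int:
    """Use a short TTL for time-sensitive queries, 10 minutes otherwise."""
    fresh_markers = ("today", "latest", "breaking", "news", "price")
    q = query.lower()
    return fresh_ttl if any(m in q for m in fresh_markers) else default_ttl

ttl_for_query("latest LLM benchmarks")            # time-sensitive: short TTL
ttl_for_query("transformer architecture basics")  # evergreen: default TTL
```

The returned value can be passed as the `ttl` argument when storing results in the cache layer from Step 2.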
## Examples

Basic usage: expose the Step 1 direct integration behind a single `/search` endpoint with the defaults shown above (`num_results=5`, a 1,000-character text cap).

Advanced scenario: layer the Step 2 cache under the Step 3 RAG pipeline, tuning TTLs per query type and reserving LLM generation for questions that need synthesized answers.
## Output
- Configuration files or code changes applied to the project
- Validation report confirming correct implementation
- Summary of changes made and their rationale