RAG Architect

Design, tune, and evaluate production RAG pipelines with three deterministic tools. Run the tools against the actual corpus and requirements — do not pick chunk sizes or databases by intuition.

Hard rules

Never present model names or vendor prices as current facts. Embedding model lineups and vector-DB pricing go stale within months. Recommend a tier (see table below), name a current-generation candidate, and tell the user to verify it against the provider's live pricing page.
Every design ends with an evaluation run. A RAG design without `retrieval_evaluator.py` numbers is a hypothesis, not a deliverable.
Chunking is corpus-driven. Run `chunking_optimizer.py` on the real documents before choosing a strategy.
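
To make the evaluation rule concrete, here is a minimal sketch of the kind of numbers a retrieval evaluation typically reports: recall@k and mean reciprocal rank (MRR). It is not the implementation of `retrieval_evaluator.py`, whose interface is not shown here. The function names, the `gold`/`runs` dictionaries, and the chunk IDs are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 5
) -> Dict[str, float]:
    """Average recall@k and MRR over every query that has gold labels."""
    queries = [q for q in gold if q in runs]
    if not queries:
        raise ValueError("no overlap between runs and gold labels")
    n = len(queries)
    return {
        f"recall@{k}": sum(recall_at_k(runs[q], gold[q], k) for q in queries) / n,
        "mrr": sum(reciprocal_rank(runs[q], gold[q]) for q in queries) / n,
    }


# Hypothetical labelled queries and ranked retrieval results (illustration only).
gold = {"q1": {"c1", "c4"}, "q2": {"c7"}}
runs = {"q1": ["c1", "c2", "c3"], "q2": ["c5", "c7", "c9"]}

metrics = evaluate(runs, gold, k=3)
print(metrics)  # {'recall@3': 0.75, 'mrr': 0.75}
```

Whatever the real tool reports, the point of the rule stands: a design is only comparable to its alternatives once numbers like these exist for the same labelled query set.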

Embedding model tiers (pattern, not price list)

| Tier | Current-generation examples (verify before use) | When to use |
| --- | --- | --- |
| Fast / self-hosted | `all-MiniLM-L6-v2`, `bge-small` | Cost-sensitive, small scale, real-time |
| Balanced open | `all-mpnet-base-v2`, `bge-large`, `e5-large` | Quality without API dependency |
| Quality API | `text-embedding-3-large`, `voyage-3-large` | Accuracy-priority general retrieval |
| Code | `voyage-code-3`, CodeBERT-family | Code search corpora |
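
One way to keep recommendations tier-first is to encode the table as data and always attach the verification reminder to the output. The sketch below does that. The `Tier` class, the `TIERS` keys, and the `recommend` function are hypothetical names, not part of the tools described here, and the model names are copied from the table rather than checked for current availability.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Tier:
    candidates: Tuple[str, ...]  # current-generation examples, not guarantees
    when: str                    # the situation this tier fits


# Mirrors the tier table; keys are hypothetical short names.
TIERS: Dict[str, Tier] = {
    "fast": Tier(("all-MiniLM-L6-v2", "bge-small"),
                 "cost-sensitive, small scale, real-time"),
    "balanced": Tier(("all-mpnet-base-v2", "bge-large", "e5-large"),
                     "quality without API dependency"),
    "quality_api": Tier(("text-embedding-3-large", "voyage-3-large"),
                        "accuracy-priority general retrieval"),
    "code": Tier(("voyage-code-3", "CodeBERT-family"),
                 "code search corpora"),
}


def recommend(tier_key: str) -> str:
    """Format a tier-first recommendation that always ends with a verify step."""
    tier = TIERS[tier_key]  # raises KeyError for an unknown tier
    names = ", ".join(tier.candidates)
    return (
        f"Tier: {tier_key} ({tier.when}). "
        f"Current-generation candidates: {names}. "
        "Verify availability and pricing on the provider's live page before use."
    )


print(recommend("code"))
```

Because the reminder is built into the formatter, a recommendation cannot be emitted without it, which matches the first hard rule.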
