RAG Architect

Design systems that connect LLMs to your private data (Retrieve -> Augment -> Generate).

When to Use This Skill

Garbage In, Chatbot Out.

Load: Extract text from PDF, HTML, Notion.
Chunk: Break into pieces.
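- Fixed: 500 characters per chunk (simple, but can cut sentences in half and break context).
- Semantic: Split on paragraph or header boundaries (keeps related text together).
Embed: Convert each chunk to a vector with an embedding model (e.g., OpenAI text-embedding-3-small or a Cohere embedding model).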
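
The two chunking strategies above can be sketched as follows. This is a minimal illustration, not a production splitter: the function names `fixed_chunks` and `paragraph_chunks` are hypothetical, and "semantic" is approximated here by splitting on blank lines and packing whole paragraphs up to a size limit.

```python
import re


def fixed_chunks(text: str, size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank lines, then pack whole paragraphs into chunks.

    A single paragraph longer than max_chars is kept whole (assumption:
    an oversized chunk is preferable to cutting a paragraph mid-sentence).
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Intro paragraph about refunds.\n\n"
    "Second paragraph about shipping times.\n\n"
    "Third paragraph about returns."
)
print(fixed_chunks(doc, size=40))          # may cut words and sentences
print(paragraph_chunks(doc, max_chars=80))  # cuts only at paragraph breaks
```

Fixed-size chunking is predictable and trivial to implement, while paragraph-based chunking tends to keep each chunk about one topic, which usually helps retrieval quality.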

Vector Search (Semantic)
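
Vector search ranks chunks by how close their embeddings are to the query embedding. A minimal sketch, assuming cosine similarity as the distance measure and a plain dictionary as the index; the vectors below are toy values, and `vector_search` is a hypothetical helper rather than any vector database's API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(
    query_vec: list[float], index: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones come from an embedding model.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "careers": [0.0, 0.1, 0.9],
}
print(vector_search([0.8, 0.2, 0.0], index, k=2))
```

This brute-force scan is fine for a few thousand chunks; larger collections typically use an approximate nearest-neighbor index instead.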

Keyword Search (Lexical)

Hybrid Search (The Gold Standard)
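
Hybrid search merges the vector and keyword result lists into one ranking. The text here does not name a fusion method, so the sketch below assumes Reciprocal Rank Fusion (RRF), one common choice; `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["doc-a", "doc-b", "doc-c"]   # from semantic search
keyword_hits = ["doc-b", "doc-d", "doc-a"]  # from lexical search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

RRF uses only rank positions, so it avoids having to normalize similarity scores and keyword scores onto a common scale.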

Construct the final prompt.

Answer the question based ONLY on the following context:
{retrieved_chunks}

Question: {user_query}
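
Filling in the template above can be sketched like this. The placeholder names `retrieved_chunks` and `user_query` come from the template; the `build_prompt` function and the choice to separate chunks with blank lines are assumptions for illustration.

```python
PROMPT_TEMPLATE = (
    "Answer the question based ONLY on the following context:\n"
    "{retrieved_chunks}\n"
    "\n"
    "Question: {user_query}"
)


def build_prompt(chunks: list[str], user_query: str) -> str:
    """Join the retrieved chunks and substitute them into the template."""
    # Assumption: a blank line between chunks keeps them visually distinct.
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return PROMPT_TEMPLATE.format(retrieved_chunks=context, user_query=user_query)


print(build_prompt(
    ["Refunds are issued within 30 days.", "Shipping takes 5 business days."],
    "How long do refunds take?",
))
```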

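HyDE (Hypothetical Document Embeddings): Have the LLM generate a hypothetical answer to the query, embed that answer instead of the raw query, then search with it.
Parent Document Retriever: Search over small chunks for precise matching, but pass the larger parent chunk they came from to the LLM for context.
Re-ranking: Use a cross-encoder (e.g., Cohere Rerank) to rescore the top 20 retrieved docs against the query before sending the best ones to the LLM.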
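
As one illustration of these techniques, here is a minimal sketch of the Parent Document Retriever idea. It is not any library's implementation: `parent_document_retrieve` is a hypothetical function, child chunks are fixed word windows, and the word-overlap scorer is a toy stand-in for real embedding similarity.

```python
def word_overlap(query: str, text: str) -> int:
    """Toy relevance score: number of distinct query words found in text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def parent_document_retrieve(
    query: str, parents: dict[str, str], child_size: int = 8, top_k: int = 1
) -> list[str]:
    """Match the query against small child chunks, return full parent texts."""
    # Index step: split every parent into child chunks of child_size words.
    children: list[tuple[str, str]] = []
    for parent_id, text in parents.items():
        words = text.split()
        for i in range(0, len(words), child_size):
            children.append((parent_id, " ".join(words[i:i + child_size])))

    # Search step: rank the small chunks, not the parents.
    ranked = sorted(children, key=lambda c: word_overlap(query, c[1]), reverse=True)

    # Return step: map hits back to their parents, deduplicated in rank order.
    results: list[str] = []
    seen: set[str] = set()
    for parent_id, _ in ranked:
        if len(results) >= top_k:
            break
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results


parents = {
    "refunds": "refunds are issued within thirty days of purchase for all customers who ask",
    "shipping": "shipping takes five business days",
}
print(parent_document_retrieve("refunds thirty days", parents))
```

Small chunks give sharper matches because each one covers a single idea, while returning the parent gives the LLM the surrounding text it needs to answer.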