# RAG Architect
Design systems that connect LLMs to your private data (Retrieve -> Augment -> Generate).
## When to Use This Skill
- Building "Chat with your Data" applications
- Selecting a Vector Database (Pinecone, Milvus, pgvector)
- Designing Chunking Strategies (Fixed-size, Semantic, Markdown)
- Hybrid Search (Keyword + Semantic) implementation
- Reducing hallucinations with grounded context
## Core Concepts
### 1. Ingestion Pipeline (The "Loader")
Garbage In, Chatbot Out.
- Load: Extract text from PDFs, HTML, Notion, etc.
- Chunk: Break the text into pieces.
  - Fixed-size: ~500 characters (simple, but can break context mid-sentence).
  - Semantic: Split by paragraphs/headers (smarter; preserves logical units).
- Embed: Convert each chunk to a vector (e.g., OpenAI `text-embedding-3-small`, Cohere).
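The two chunking strategies above can be sketched in a few lines. This is a minimal illustration; the 500-character size and 50-character overlap are assumptions, not recommendations, and a real semantic splitter would also respect headers.

```python
def fixed_size_chunks(text, size=500, overlap=50):
    """Fixed-size chunking: a sliding character window (simple, breaks context)."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

def semantic_chunks(text):
    """Semantic chunking (simplified): split on blank lines, i.e. paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

Note how `fixed_size_chunks` can cut a sentence in half, while `semantic_chunks` keeps each paragraph intact.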
### 2. Retrieval Strategies
#### Vector Search (Semantic)
- Finds results by meaning ("Can I bring my dog?" matches "Pet Policy").
- Pros: Understands intent and paraphrase.
- Cons: Can miss exact keywords (IDs, acronyms, product codes).
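At its core, vector search is just nearest-neighbor ranking over embeddings. A toy sketch with cosine similarity (the document names, query vector, and 2-dimensional embeddings below are invented for illustration; real systems use a vector database and high-dimensional embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query_vec, doc_vecs, top_k=3):
    """Rank document IDs by cosine similarity to the query embedding."""
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]
```

A query embedded close to the "pet policy" region of the space retrieves that document even though it shares no keywords with it.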
#### Keyword Search (Lexical)
- BM25 / TF-IDF scoring.
- Pros: Precise for exact terms (IDs, acronyms, error codes).
- Cons: No understanding of synonyms or intent.
#### Hybrid Search (The Gold Standard)
- Run vector and keyword search in parallel.
- Use Reciprocal Rank Fusion (RRF) to merge and re-rank the two result lists.
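RRF is small enough to show in full: each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed. The doc IDs below are invented; `k=60` is the constant from the original RRF paper.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse multiple ranked lists of doc IDs into one ranking via RRF."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # rank is 0-based here, so rank + 1 is the 1-based position
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a vector-search ranking with a keyword-search ranking
vector_hits  = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because `doc_b` ranks highly in both lists, it wins the fused ranking even though it tops only one of them — which is exactly why RRF works well without any score normalization.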
### 3. Generation (The "Synthesis")
Construct the final prompt by grounding the model in the retrieved context:

    Answer the question based ONLY on the following context:

    {retrieved_chunks}

    Question: {user_query}
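Filling that template is plain string assembly. A minimal sketch (the function name is an assumption; swap in your own prompt plumbing):

```python
def build_rag_prompt(retrieved_chunks, user_query):
    """Ground the LLM in retrieved chunks using the template above."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question based ONLY on the following context:\n\n"
        f"{context}\n\n"
        f"Question: {user_query}"
    )
```

The "based ONLY on" instruction is what reduces hallucinations: the model is told to refuse rather than fall back on its parametric knowledge when the context lacks an answer.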
## Advanced RAG
- HyDE (Hypothetical Document Embeddings): Generate a fake answer, embed that, then search.
- Parent Document Retriever: Search small chunks, but return the big parent chunk to the LLM.
- Re-ranking: Use a cross-encoder (e.g., Cohere Rerank) to re-score the top ~20 retrieved docs, then send only the best few to the LLM.
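The Parent Document Retriever pattern above reduces to a bookkeeping problem: index small child chunks, remember which parent section each came from, and hand the LLM the parents. A minimal sketch assuming an in-memory index (a real system would store the children in a vector store; `child_size` is illustrative):

```python
def build_child_index(sections, child_size=200):
    """Split parent sections into small child chunks, remembering lineage."""
    children, child_to_parent = [], {}
    for parent_id, section in enumerate(sections):
        for start in range(0, len(section), child_size):
            child_to_parent[len(children)] = parent_id
            children.append(section[start:start + child_size])
    return children, child_to_parent

def expand_to_parents(child_hits, child_to_parent, sections):
    """Map matched child-chunk IDs back to deduplicated parent sections."""
    seen, parents = set(), []
    for child_id in child_hits:
        parent_id = child_to_parent[child_id]
        if parent_id not in seen:
            seen.add(parent_id)
            parents.append(sections[parent_id])
    return parents
```

Search runs over the small, precise children; the LLM receives the larger parents, so it gets enough surrounding context to answer well.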