# ai-rag-patterns

RAG Patterns — Retrieval-Augmented Generation
Acknowledgement: Shared by Peter Bamuhigire, techguypeter.com, +256 784 464178.
## Use When
- Use when building features that answer questions from private data, documents, policies, or time-sensitive information — RAG architecture, chunking strategies, hybrid search, re-ranking, vector databases, evaluation, agentic RAG, multimodal RAG...
- The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.
## Do Not Use When
- The task is unrelated to `ai-rag-patterns` or would be better handled by a more specific companion skill.
- The request only needs a trivial answer and none of this skill's constraints or references materially help.
## Required Inputs
- Gather relevant project context, constraints, and the concrete problem to solve; load `references/` only as needed.
- Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.
## Workflow
- Read this `SKILL.md` first, then load only the referenced deep-dive files that are necessary for the task.
- Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
- Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.
## Quality Standards
- Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
- Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
- Prefer deterministic, reviewable steps over vague advice or tool-specific magic.
## Anti-Patterns
- Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
- Loading every reference file by default instead of using progressive disclosure.
## Outputs
- A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
- Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
- References used, companion skills, or follow-up actions when they materially improve execution.
## Evidence Produced
| Category | Artifact | Format | Example |
|---|---|---|---|
| Correctness | RAG retrieval evaluation report | Markdown doc covering recall / precision / answer-quality on a fixed eval set | docs/ai/rag-eval-2026-04-16.md |
| Data safety | Index ingestion + tenancy isolation note | Markdown doc covering chunking, source filtering, and per-tenant index segregation | docs/ai/rag-tenancy-note.md |
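The tenancy-isolation note above usually documents a hard filter of the kind sketched below: every chunk carries a tenant identifier, and retrieval filters on it before any similarity ranking runs. The schema and field names are illustrative assumptions, not a fixed API, and keyword overlap stands in for real vector similarity.

```python
# Sketch of per-tenant index isolation: filter first, score second,
# so cross-tenant results can never appear in the candidate set.
from dataclasses import dataclass

@dataclass
class Chunk:
    tenant_id: str
    source: str
    text: str

def tenant_search(index: list[Chunk], tenant_id: str, query: str) -> list[Chunk]:
    """Hard-filter to one tenant, then rank by naive keyword overlap."""
    candidates = [c for c in index if c.tenant_id == tenant_id]
    q_words = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda c: len(q_words & set(c.text.lower().split())),
        reverse=True,
    )

index = [
    Chunk("acme", "menu.pdf", "espresso price list"),
    Chunk("globex", "menu.pdf", "espresso price list"),
]
hits = tenant_search(index, "acme", "espresso price")
# Only acme's chunk is returned, even though globex has identical text.
```

In a real vector database the same idea is expressed as a metadata filter applied before (or alongside) the similarity search, never as a post-hoc filter on mixed results.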
## References
- Use the `references/` directory for deep detail after reading the core workflow below.
## Overview
RAG solves the core LLM limitation: models only know what they were trained on. Use RAG to inject private data (invoices, menus, policies, reports) into every AI response.
Core principle: RAG = retrieve matching documents from a store + let the LLM synthesise an answer from them. The LLM never needs to "know" your data.
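The retrieve-then-synthesise loop can be sketched end to end. This is a toy sketch: keyword overlap stands in for embedding similarity, and the final LLM call is left to whichever client you actually use.

```python
# Minimal RAG sketch: rank documents against the query, then inject the
# top-k chunks into a grounded prompt for the LLM to answer from.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared query words (stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble a prompt that forces the LLM to answer from the context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Answer using ONLY the context below. Cite the chunk you used.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "Lunch menu: grilled tilapia costs 25,000 UGX.",
    "Office policy: refunds are processed within 14 days.",
    "Annual report: revenue grew 12% year over year.",
]
prompt = build_prompt("How much does the grilled tilapia cost?", docs)
# `prompt` now contains the menu chunk; pass it to your LLM of choice.
```

The design point: the model sees only what retrieval surfaced, which is what makes answers traceable to source chunks.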
## When to Use RAG
| Condition | Action |
|---|---|
| Knowledge base < 200K tokens (~500 pages) | Include everything in context — no RAG needed |
| Knowledge base > 200K tokens | Use RAG |
| Data changes frequently (menus, prices, stock) | RAG (update documents, not model) |
| Data is private/confidential | RAG (keeps data out of training pipelines) |
| Need source citations | RAG (chunks are traceable to source) |
| Model needs brand voice / domain jargon | Fine-tune instead |
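The 200K-token cutoff in the table can be checked with a rough heuristic. The ~4-characters-per-token ratio below is a common English approximation, not a measurement; use a real tokenizer (e.g. tiktoken) before relying on it near the boundary.

```python
# Rough "RAG or full context?" check using a characters-per-token estimate.

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 English characters per token."""
    return max(1, len(text) // 4)

def needs_rag(corpus: list[str], context_budget: int = 200_000) -> bool:
    """True when the knowledge base will not fit in the model context."""
    return sum(estimate_tokens(doc) for doc in corpus) > context_budget

small_kb = ["refund policy text"] * 10
print(needs_rag(small_kb))  # → False: a tiny corpus fits in context, no RAG needed
```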
## RAG vs Fine-Tuning
| Factor | RAG | Fine-Tuning |
|---|---|---|
| Up-to-date content | ✅ Yes (add docs anytime) | ❌ Stale until retrained |
| Hallucinations | ✅ Lower (document-grounded) | ❌ Higher |
| Source citations | ✅ Yes | ❌ No |
| Brand voice control | ❌ Weak | ✅ Strong |
| Domain jargon | ❌ Weak | ✅ Strong |
| Up-front cost | ✅ Lower | ❌ High |
Default: start with RAG. Fine-tune only when RAG + prompt engineering cannot deliver the required tone or vocabulary.
## Additional Guidance
Guidance is split across two reference files so this entrypoint stays compact.
`references/skill-deep-dive.md` — architecture, chunking, retrieval, schema: Pipeline Architecture, Chunking Strategies, Embedding Model Selection, Vector Database Selection, Retrieval Algorithms, Re-Ranking, Full RAG Query Algorithm, Query Rewriting (Multi-Turn), RAG Schema (Multi-Tenant), Evaluation Framework, Production Patterns, Agentic RAG, Multimodal RAG, Edge Cases, Cost Optimisation, Sources.
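As a concrete taste of the chunking strategies covered in the deep-dive file, fixed-size windows with overlap can be sketched as follows. This is character-based for simplicity; production pipelines usually chunk by tokens and respect sentence or section boundaries.

```python
# Fixed-size chunking with overlap: each window shares `overlap` characters
# with its neighbour so facts spanning a boundary are not lost.

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Slide a window of `size` chars, stepping by `size - overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

chunks = chunk_text("a" * 500, size=200, overlap=50)
# Windows start at 0, 150, 300, 450 → 4 chunks; adjacent chunks share 50 chars.
```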
`references/production-rag.md` — the progression from draft to production and the gates before shipping: RAG Maturity Model (Naive → Advanced → Modular), Query Transformation (HyDE, Multi-Query, Step-Back), Contextual Compression, Self-RAG, RAGAS Evaluation (4 metrics with production thresholds), Embedding Pipeline (batching, upserts, re-embed triggers, $/1M-token table), Cost Management Decision Tree (concrete dollar figures per branch), Failure Mode Playbook (empty, irrelevant, hallucinated, stale), Gates Before Shipping.
Load the production file when building a RAG system that has to pass evaluation gates, survive multi-tenant review, or hit a cost budget under load.
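A minimal sketch of an evaluation gate in the spirit of the RAGAS-style metrics mentioned above. The metric names follow RAGAS conventions, but the thresholds here are illustrative assumptions, not official values; set real ones from your own eval set.

```python
# Pre-ship gate: block the release if any eval metric falls below threshold.

THRESHOLDS = {
    "faithfulness": 0.85,       # answers grounded in retrieved chunks
    "answer_relevancy": 0.80,   # answers actually address the question
    "context_precision": 0.70,  # retrieved chunks are actually useful
    "context_recall": 0.70,     # needed facts were retrieved at all
}

def gate(scores: dict[str, float]) -> list[str]:
    """Return the list of failing metrics; an empty list means ship."""
    return [m for m, t in THRESHOLDS.items() if scores.get(m, 0.0) < t]

failures = gate({
    "faithfulness": 0.90,
    "answer_relevancy": 0.82,
    "context_precision": 0.65,
    "context_recall": 0.75,
})
# context_precision 0.65 < 0.70, so the gate blocks this release.
```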