knowledge-base-rag


Knowledge Base RAG

Part of Agent Skills™ by googleadsagent.ai™

Description

Knowledge Base RAG implements the complete Retrieval-Augmented Generation pipeline: document ingestion, intelligent chunking, embedding generation, vector store indexing, semantic retrieval, and grounded response generation. The agent builds RAG systems that answer questions from private knowledge bases with cited sources and reduced hallucination.

RAG solves the fundamental limitation of large language models: they cannot access information created after their training cutoff or proprietary information they were never trained on. By retrieving relevant documents from a vector store and injecting them into the prompt context, RAG grounds the model's responses in factual, up-to-date, organization-specific knowledge.

The quality of a RAG system depends on chunking strategy more than model choice. This skill encodes production-tested chunking approaches: semantic chunking that preserves paragraph coherence, recursive splitting that respects document structure (headings, code blocks, tables), and overlap windows that maintain context across chunk boundaries. Each strategy is matched to the document type for optimal retrieval quality.

Use When

  • Building question-answering systems over private documents
  • Creating a searchable knowledge base from documentation, wikis, or PDFs
  • Reducing hallucination by grounding LLM responses in retrieved facts
  • Implementing semantic search across large document collections
  • Building customer support bots with product-specific knowledge
  • The user asks about RAG, vector search, or document embedding

How It Works

```mermaid
graph TD
    A[Documents: PDF, MD, HTML] --> B[Ingestion Pipeline]
    B --> C[Extract Text + Metadata]
    C --> D[Intelligent Chunking]
    D --> E[Generate Embeddings]
    E --> F[Index in Vector Store]
    G[User Query] --> H[Embed Query]
    H --> I[Semantic Search: Top-K]
    I --> J[Re-rank Results]
    J --> K[Construct Prompt with Context]
    K --> L[LLM Generation]
    L --> M[Response with Citations]
```

The pipeline has two phases: offline ingestion (documents to vectors) and online retrieval (query to answer). The re-ranking step applies a cross-encoder to refine the initial vector search results, improving precision before the generation step.
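The re-ranking step can be sketched as a standalone function. The `SearchResult` shape and the injected `score_fn` are assumptions for illustration, not part of this skill's API; in practice `score_fn` would be something like the `predict` method of a sentence-transformers `CrossEncoder`, which scores (query, passage) pairs.

```python
from dataclasses import dataclass, field

@dataclass
class SearchResult:
    # Hypothetical minimal shape of a vector-search hit.
    text: str
    metadata: dict = field(default_factory=dict)

def rerank(query: str, results: list[SearchResult], score_fn, top_n: int = 3) -> list[SearchResult]:
    """Re-rank vector-search hits with a cross-encoder-style scorer.

    `score_fn` takes a list of (query, passage) pairs and returns one
    relevance score per pair. It is injected here so the sketch stays
    dependency-free; swap in a real cross-encoder for production use.
    """
    pairs = [(query, r.text) for r in results]
    scores = score_fn(pairs)
    # Sort by score only (stable sort keeps the original order for ties).
    ranked = sorted(zip(scores, results), key=lambda pair: pair[0], reverse=True)
    return [r for _, r in ranked[:top_n]]
```

The point of the cross-encoder pass is precision: the bi-encoder retrieval is fast but coarse, so re-scoring the small top-K candidate set with a model that reads query and passage together is cheap and measurably improves what reaches the prompt.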

Implementation

```python
from dataclasses import dataclass
import hashlib

@dataclass
class Chunk:
    text: str
    metadata: dict
    embedding: list[float] | None = None

    @property
    def id(self) -> str:
        # Content hash doubles as a dedup key: identical text -> identical id.
        return hashlib.sha256(self.text.encode()).hexdigest()[:16]

class RecursiveChunker:
    def __init__(self, max_tokens: int = 512, overlap: int = 64):
        self.max_tokens = max_tokens
        self.overlap = overlap
        # Tried in order: headings first, then paragraphs, lines, sentences, words.
        self.separators = ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]

    def chunk(self, text: str, metadata: dict) -> list[Chunk]:
        chunks = self._split(text, self.separators)
        return [
            Chunk(text=c.strip(), metadata={**metadata, "chunk_index": i})
            for i, c in enumerate(chunks) if c.strip()
        ]

    def _split(self, text: str, separators: list[str]) -> list[str]:
        # Base case: the text fits, or there are no coarser separators left.
        if not separators or self._token_count(text) <= self.max_tokens:
            return [text]

        sep = separators[0]
        parts = text.split(sep)
        chunks, current = [], ""

        for part in parts:
            candidate = current + sep + part if current else part
            if self._token_count(candidate) > self.max_tokens and current:
                chunks.append(current)
                # Carry a tail of the finished chunk forward (~`overlap` tokens,
                # at ~4 chars/token) so context survives the boundary.
                overlap_text = current[-self.overlap * 4:]
                current = overlap_text + sep + part
            else:
                current = candidate

        if current:
            chunks.append(current)

        # Any chunk still over budget is re-split with the next, finer separator.
        result = []
        for chunk in chunks:
            if self._token_count(chunk) > self.max_tokens:
                result.extend(self._split(chunk, separators[1:]))
            else:
                result.append(chunk)
        return result

    def _token_count(self, text: str) -> int:
        # Rough heuristic: ~4 characters per token. Swap in a real tokenizer
        # (e.g. tiktoken) when exact token budgets matter.
        return len(text) // 4
```

```python
class RAGPipeline:
    def __init__(self, embedder, vector_store, llm):
        self.embedder = embedder
        self.store = vector_store
        self.llm = llm
        self.chunker = RecursiveChunker()

    async def ingest(self, documents: list[dict]) -> int:
        """Offline phase: chunk, embed, and index every document."""
        all_chunks = []
        for doc in documents:
            chunks = self.chunker.chunk(doc["text"], doc["metadata"])
            for chunk in chunks:
                chunk.embedding = await self.embedder.embed(chunk.text)
            all_chunks.extend(chunks)

        await self.store.upsert(all_chunks)
        return len(all_chunks)

    async def query(self, question: str, top_k: int = 5) -> dict:
        """Online phase: retrieve the top-k chunks and generate a grounded answer."""
        query_embedding = await self.embedder.embed(question)
        results = await self.store.search(query_embedding, top_k=top_k)

        # Prefix each chunk with its source so the model can cite it.
        context = "\n\n".join(
            f"[Source: {r.metadata.get('source', 'unknown')}]\n{r.text}" for r in results
        )

        prompt = f"""Answer the question based on the provided context. Cite sources.
If the context does not contain the answer, say so explicitly.

Context:
{context}

Question: {question}"""

        response = await self.llm.generate(prompt)
        return {"answer": response, "sources": [r.metadata for r in results]}
```

Best Practices

  • Use recursive chunking that respects document structure (headings, paragraphs, code blocks)
  • Set chunk size to 256-512 tokens with 10-15% overlap for most use cases
  • Re-rank vector search results with a cross-encoder before passing to the LLM
  • Include source metadata in every chunk for citation generation
  • Deduplicate chunks by content hash before indexing to avoid retrieval noise
  • Instruct the LLM to say "I don't know" when the context lacks the answer
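The deduplication point above reduces to a few lines over the same content hash that `Chunk.id` uses; sketched here as a standalone helper over raw texts for clarity:

```python
import hashlib

def dedupe_chunks(texts: list[str]) -> list[str]:
    """Drop exact-duplicate chunk texts before indexing, keeping the first occurrence."""
    seen: set[str] = set()
    unique = []
    for text in texts:
        digest = hashlib.sha256(text.encode()).hexdigest()[:16]
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique
```

Note this catches only exact duplicates (repeated boilerplate, re-ingested files); near-duplicates would need an embedding-similarity threshold instead.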

Platform Compatibility

| Platform | Support | Notes |
|----------|---------|-------|
| Cursor | Full | Pipeline code generation |
| VS Code | Full | Python/TS RAG implementation |
| Windsurf | Full | RAG workflow support |
| Claude Code | Full | End-to-end RAG building |
| Cline | Full | Vector store integration |
| aider | Partial | Code-level support |

Related Skills

Keywords

rag retrieval-augmented-generation vector-search embeddings chunking knowledge-base semantic-search document-qa


© 2026 googleadsagent.ai™ | Agent Skills™ | MIT License
