dspy-retrieval
Retrieval Modules in DSPy
Guide the user through DSPy's retrieval modules for searching documents, computing embeddings, and building RAG (retrieval-augmented generation) pipelines.
What retrieval modules are
DSPy provides retrieval modules that fetch relevant documents or passages given a query. These modules plug into DSPy programs just like dspy.Predict or dspy.ChainOfThought -- declare them in __init__, call them in forward(), and optimizers handle the rest.
There are four key components:
| Component | Purpose | When to use |
|---|---|---|
| dspy.Retrieve | Base retriever class | Wrap any search backend (Elastic, Pinecone, etc.) |
| dspy.ColBERTv2 | ColBERTv2 retrieval client | Query a hosted ColBERTv2 server |
| dspy.Embedder | Compute embeddings | Turn text into vectors using any LiteLLM-supported model |
| dspy.retrievers.Embeddings | Local vector search | Build a retriever from an embedder + corpus, using FAISS |
dspy.Retrieve
The base class for all retrievers. Use it directly with a configured retrieval model (rm), or subclass it to wrap your own search backend.
Using with a configured RM
import dspy
# Configure an LM and a retrieval model globally
lm = dspy.LM("openai/gpt-4o-mini")  # any LiteLLM-supported chat model works here
colbert = dspy.ColBERTv2(url="http://your-server:8893/api/search")
dspy.configure(lm=lm, rm=colbert)
# Use dspy.Retrieve -- it delegates to the configured rm
retriever = dspy.Retrieve(k=5)
result = retriever("What is retrieval-augmented generation?")
print(result.passages) # list[str] of top-k passages
Key parameters
k (int) -- number of passages to retrieve. Can be set at init time or overridden per call.
Return value
dspy.Retrieve returns a dspy.Prediction with a .passages attribute -- a list[str] of the top-k retrieved passages.
Subclassing for custom backends
Wrap any search system by subclassing dspy.Retrieve and implementing forward():
class MyRetriever(dspy.Retrieve):
    def __init__(self, search_client, k=3):
        super().__init__(k=k)
        self.client = search_client

    def forward(self, query, k=None):
        k = k or self.k
        results = self.client.search(query, top_k=k)
        return dspy.Prediction(passages=[r["text"] for r in results])
The forward() method must:
- Accept query (str) and an optional k (int)
- Return a dspy.Prediction with a passages field (a list of strings)
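A quick usage sketch. FakeSearchClient is hypothetical -- it just mimics a backend whose search() returns dicts with a "text" key, which is the shape MyRetriever above expects:

class FakeSearchClient:
    def search(self, query, top_k=3):
        # Stand-in for Elastic, Pinecone, or any other search backend
        return [{"text": f"Placeholder passage {i} about {query}"} for i in range(top_k)]

retriever = MyRetriever(search_client=FakeSearchClient(), k=3)
result = retriever("What is retrieval-augmented generation?")
print(result.passages)  # list[str] with 3 placeholder passages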
dspy.ColBERTv2
A retrieval client that queries a hosted ColBERTv2 server. ColBERTv2 is a late-interaction neural retrieval model known for high-quality passage retrieval.
Constructor
colbert = dspy.ColBERTv2(url="http://your-server:8893/api/search")
Parameters:
url (str) -- URL of the ColBERTv2 server endpoint
Usage
# Direct call
results = colbert("What is DSPy?", k=3)
# Returns list of dicts with 'text', 'score', etc.
# As a configured retrieval model
dspy.configure(lm=lm, rm=colbert)
retriever = dspy.Retrieve(k=5)
passages = retriever("search query").passages
Setting up a ColBERTv2 server
Stanford hosts a public ColBERTv2 server for Wikipedia that you can use for testing:
colbert = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
dspy.configure(lm=lm, rm=colbert)
For your own data, you need to run a ColBERTv2 server. See the ColBERT repository for setup instructions.
dspy.Embedder
Computes embeddings for text using any LiteLLM-supported embedding model. This is not a retriever itself -- it turns text into vectors that you can use with dspy.retrievers.Embeddings or your own vector store.
Constructor
embedder = dspy.Embedder(
    "openai/text-embedding-3-small",  # model identifier (LiteLLM format)
    dimensions=512,                   # optional: output dimensions
)
Parameters:
- model (str) -- embedding model in LiteLLM format (e.g., "openai/text-embedding-3-small", "cohere/embed-english-v3.0")
- dimensions (int, optional) -- output embedding dimensions (if the model supports it)
- batch_size (int, optional) -- batch size for embedding multiple texts
Usage
# Embed a single text
vector = embedder("What is DSPy?")
# Returns a 1D numpy array of floats (one embedding)
# Embed multiple texts
vectors = embedder(["text one", "text two", "text three"])
# Returns a 2D numpy array (one embedding per input text)
Supported providers
Any embedding model supported by LiteLLM works:
# OpenAI
embedder = dspy.Embedder("openai/text-embedding-3-small")
# Cohere
embedder = dspy.Embedder("cohere/embed-english-v3.0")
# Local via Ollama
embedder = dspy.Embedder("ollama/nomic-embed-text")
dspy.retrievers.Embeddings
A local vector search retriever that uses FAISS under the hood. Give it an Embedder and a corpus, and it builds an in-memory index for fast similarity search.
Constructor
import dspy
embedder = dspy.Embedder("openai/text-embedding-3-small", dimensions=512)
search = dspy.retrievers.Embeddings(
    embedder=embedder,
    corpus=corpus,  # list[str] of documents
    k=5,            # number of results to return
)
Parameters:
- embedder -- a dspy.Embedder instance
- corpus (list[str]) -- the documents to index and search over
- k (int) -- default number of results to return
Usage
# Search
result = search("How do I reset my password?")
print(result.passages) # list[str] of top-k matching documents
# Use in a module
class QA(dspy.Module):
    def __init__(self, search):
        super().__init__()
        self.search = search
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.search(question).passages
        return self.answer(context=context, question=question)
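Wiring it up (a brief sketch, reusing the search retriever built in the constructor example above):

qa = QA(search)
print(qa(question="How do I reset my password?").answer)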
When to use Embeddings vs. ColBERTv2
| Scenario | Use |
|---|---|
| Quick prototyping with small-medium corpus | dspy.retrievers.Embeddings |
| Need a hosted, scalable retrieval server | dspy.ColBERTv2 |
| Already have a vector store (Pinecone, Chroma, etc.) | Subclass dspy.Retrieve |
| Need full control over embeddings | dspy.Embedder + your own vector store |
Building RAG pipelines
RAG is the most common use of retrieval in DSPy. The pattern: retrieve relevant passages, then generate an answer grounded in them.
Basic RAG
import dspy
class RAG(dspy.Module):
    def __init__(self, retriever):
        super().__init__()
        self.retrieve = retriever
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)
# With Embeddings retriever
embedder = dspy.Embedder("openai/text-embedding-3-small", dimensions=512)
search = dspy.retrievers.Embeddings(embedder=embedder, corpus=my_docs, k=5)
rag = RAG(retriever=search)
result = rag(question="How do refunds work?")
print(result.answer)
RAG with source grounding
Use assertions to enforce that answers stay grounded in the retrieved context:
class GroundedRAG(dspy.Module):
    def __init__(self, retriever):
        super().__init__()
        self.retrieve = retriever
        self.generate = dspy.ChainOfThought(
            "context, question -> answer, cited_sources: list[int]"
        )

    def forward(self, question):
        passages = self.retrieve(question).passages
        result = self.generate(context=passages, question=question)
        dspy.Suggest(
            len(result.cited_sources) > 0,
            "Answer should cite at least one source passage by index",
        )
        return dspy.Prediction(
            answer=result.answer,
            cited_sources=result.cited_sources,
            passages=passages,
        )
Multi-hop RAG
When a question needs information from multiple documents, chain retrieval steps:
class MultiHopRAG(dspy.Module):
    def __init__(self, retriever, hops=2):
        super().__init__()
        self.retrieve = retriever
        self.generate_query = [
            dspy.ChainOfThought("context, question -> search_query")
            for _ in range(hops)
        ]
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = []
        for hop in self.generate_query:
            query = hop(context=context, question=question).search_query
            new_passages = self.retrieve(query).passages
            context = list(dict.fromkeys(context + new_passages))  # deduplicate, preserving order
        return self.answer(context=context, question=question)
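A usage sketch, assuming my_docs is your own list[str] corpus and reusing the Embeddings retriever pattern from the Basic RAG example (the question is a placeholder that needs two hops to answer):

embedder = dspy.Embedder("openai/text-embedding-3-small", dimensions=512)
search = dspy.retrievers.Embeddings(embedder=embedder, corpus=my_docs, k=3)

multihop = MultiHopRAG(retriever=search, hops=2)
result = multihop(question="Which team does the spouse of the CEO of Acme Corp play for?")
print(result.answer)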
Configuring retrievers
There are two ways to wire up a retriever:
Option 1: Global configuration with dspy.configure
colbert = dspy.ColBERTv2(url="http://your-server:8893/api/search")
dspy.configure(lm=lm, rm=colbert)
# dspy.Retrieve() now uses colbert automatically
retriever = dspy.Retrieve(k=5)
Option 2: Pass the retriever directly
embedder = dspy.Embedder("openai/text-embedding-3-small")
search = dspy.retrievers.Embeddings(embedder=embedder, corpus=docs, k=5)
class MyRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.search = search  # use directly, no global config needed
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.search(question).passages
        return self.answer(context=context, question=question)
Option 2 is more explicit and avoids global state. Prefer it when your program uses a single retriever.
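Constructor injection (as in the RAG class above) takes the same idea one step further: the module receives its retriever as an argument, so one program can run against different corpora without touching global settings. A sketch -- faq_docs and kb_docs are placeholders for your own document lists:

embedder = dspy.Embedder("openai/text-embedding-3-small")
faq_search = dspy.retrievers.Embeddings(embedder=embedder, corpus=faq_docs, k=5)
kb_search = dspy.retrievers.Embeddings(embedder=embedder, corpus=kb_docs, k=5)

faq_rag = RAG(retriever=faq_search)  # answers from the FAQ corpus
kb_rag = RAG(retriever=kb_search)    # answers from the knowledge-base corpus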
The k parameter
The k parameter controls how many passages to retrieve. It can be set at multiple levels:
# At init time
retriever = dspy.Retrieve(k=5)
# Override per call
result = retriever("query", k=10)
# In Embeddings constructor
search = dspy.retrievers.Embeddings(embedder=embedder, corpus=docs, k=3)
Choosing k:
- Start with k=3 to k=5 for most tasks
- Increase k for questions that need broader context
- Decrease k for faster inference and lower token costs
- More passages means more context for the LM, but also more noise and higher cost
- Use evaluation to find the optimal k for your specific task (see the sketch below)
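A minimal sketch of sweeping k with dspy.Evaluate. It assumes you already have a devset of dspy.Example objects with question and answer fields, plus the embedder, my_docs corpus, and RAG module from earlier; the substring-match metric is only an illustration:

def answer_match(example, prediction, trace=None):
    # Toy metric: does the gold answer appear in the predicted answer?
    return example.answer.lower() in prediction.answer.lower()

evaluate = dspy.Evaluate(devset=devset, metric=answer_match, num_threads=4, display_progress=True)

for k in (3, 5, 10):
    search = dspy.retrievers.Embeddings(embedder=embedder, corpus=my_docs, k=k)
    print(f"k={k}:", evaluate(RAG(retriever=search)))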
Cross-references
- Building custom modules to wrap retrieval logic -- see /dspy-modules
- End-to-end document search with vector stores and chunking -- see /ai-searching-docs
- Keeping answers grounded and avoiding hallucination -- see /ai-stopping-hallucinations
- For worked examples, see examples.md