# Hybrid Retrieval for RAG

Combine dense (semantic) and sparse (keyword) retrieval for superior results.
## When to Use

- Vector search misses exact keyword matches
- Domain-specific terminology needs exact matching
- Users search with both natural language and specific terms
- Need to balance semantic understanding with precision
## The Problem with Vector-Only Search

Query: "Error code E-4521 troubleshooting"

Vector search returns:

- "Common error handling patterns" (semantically similar)
- "Debugging techniques for applications" (related topic)

Missing:

- "E-4521: Database connection timeout" (exact match needed!)
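The gap is easy to demonstrate with a toy sparse scorer: even naive IDF-weighted keyword overlap surfaces the exact-match document that embedding similarity tends to bury. This is a minimal sketch over a small hypothetical corpus, not a real BM25 implementation:

```python
import math
import re

# Hypothetical corpus: "error" appears in two docs, the code "E-4521" in one
docs = [
    "Common error handling patterns",
    "Debugging error messages in applications",
    "E-4521: database connection timeout",
]

def tokens(text: str) -> list:
    # Keep hyphenated codes like "e-4521" as single tokens
    return re.findall(r"[a-z0-9-]+", text.lower())

# Document frequency per term
N = len(docs)
df: dict = {}
for d in docs:
    for t in set(tokens(d)):
        df[t] = df.get(t, 0) + 1

def idf_score(query: str, doc: str) -> float:
    """Sum IDF weights of query terms that appear verbatim in the doc."""
    doc_toks = set(tokens(doc))
    return sum(math.log(N / df[t]) for t in tokens(query) if t in doc_toks)

# The rare term "e-4521" outweighs the common term "error"
best = max(docs, key=lambda d: idf_score("Error code E-4521 troubleshooting", d))
```

Because "E-4521" occurs in only one document, its IDF weight dominates, and the exact-match document wins despite sharing only one token with the query.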
## Hybrid Architecture

```
 ┌─────────────────────────────────────────┐
 │               User Query                │
 └────────────────────┬────────────────────┘
                      │
          ┌───────────┴───────────┐
          ▼                       ▼
 ┌─────────────────┐     ┌─────────────────┐
 │  Dense Search   │     │  Sparse Search  │
 │  (Embeddings)   │     │  (BM25/TF-IDF)  │
 └────────┬────────┘     └────────┬────────┘
          │                       │
          └───────────┬───────────┘
                      ▼
              ┌───────────────┐
              │    Fusion     │
              │  (RRF/Linear) │
              └───────┬───────┘
                      ▼
              ┌───────────────┐
              │   Reranker    │
              │  (Optional)   │
              └───────┬───────┘
                      ▼
              ┌───────────────┐
              │ Final Results │
              └───────────────┘
```
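In code, the diagram reduces to a pipeline of pluggable stages. A minimal sketch with hypothetical callables (the function and parameter names are illustrative, not from any library):

```python
from typing import Callable, Optional

def hybrid_pipeline(
    query: str,
    dense_search: Callable[[str], list],    # embedding-based retriever
    sparse_search: Callable[[str], list],   # BM25/TF-IDF retriever
    fuse: Callable[[list, list], list],     # RRF or linear fusion
    rerank: Optional[Callable[[str, list], list]] = None,  # optional cross-encoder
) -> list:
    """Run both retrievers on the query, fuse their ranked lists, optionally rerank."""
    dense_results = dense_search(query)
    sparse_results = sparse_search(query)
    fused = fuse(dense_results, sparse_results)
    return rerank(query, fused) if rerank is not None else fused
```

Each stage is independent, so you can swap fusion strategies or drop the reranker without touching the retrievers.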
## Implementation

### Basic Hybrid with LangChain

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# Dense retriever (vector search); assumes `docs` and `embeddings` are defined
vectorstore = Chroma.from_documents(docs, embeddings)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Sparse retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 10

# Combine with ensemble (weighted Reciprocal Rank Fusion under the hood)
hybrid_retriever = EnsembleRetriever(
    retrievers=[dense_retriever, bm25_retriever],
    weights=[0.5, 0.5],  # adjust based on your data
)

results = hybrid_retriever.invoke("Error code E-4521")
```
### Reciprocal Rank Fusion (RRF)

```python
def reciprocal_rank_fusion(results_list: list, k: int = 60) -> list:
    """
    Combine multiple ranked lists using RRF.
    k=60 is the standard constant from the original paper.
    """
    fused_scores = {}
    for results in results_list:
        for rank, doc in enumerate(results):
            doc_id = doc.metadata.get("id", hash(doc.page_content))
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0}
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score
    reranked = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True,
    )
    return [item["doc"] for item in reranked]

# Usage
dense_results = dense_retriever.invoke(query)
sparse_results = bm25_retriever.invoke(query)
final_results = reciprocal_rank_fusion([dense_results, sparse_results])
```
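To see what RRF actually rewards, here is a standalone version over plain doc-ID lists with hand-checkable scores (toy data, illustration only):

```python
def rrf_ids(rankings: list, k: int = 60) -> list:
    """RRF over lists of doc IDs: score(d) = sum of 1 / (k + rank + 1)."""
    scores: dict = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_ids = ["d1", "d2", "d3"]    # semantic ranking
sparse_ids = ["d2", "d3", "d4"]   # keyword ranking
fused = rrf_ids([dense_ids, sparse_ids])
# "d2" ranks first: it is near the top of BOTH lists (1/62 + 1/61),
# beating "d1", which is first in only one list (1/61)
```

This is the key property of RRF: documents that both retrievers agree on float to the top, without any score normalization.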
### With Pinecone (Native Hybrid)

Note: Pinecone's `query` call takes a dense `vector` and a `sparse_vector`, but no `alpha` parameter; the dense/sparse balance is applied client-side by scaling the vectors before querying.

```python
from pinecone import Pinecone
from pinecone_text.sparse import BM25Encoder

# Initialize; assumes `corpus` and `embeddings` are defined
pc = Pinecone(api_key="...")
index = pc.Index("hybrid-index")

# Sparse encoder
bm25 = BM25Encoder()
bm25.fit(corpus)

def hybrid_scale(dense: list, sparse: dict, alpha: float):
    """Convex weighting: alpha = 0 -> sparse only, alpha = 1 -> dense only."""
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    scaled_dense = [v * alpha for v in dense]
    return scaled_dense, scaled_sparse

def hybrid_query(query: str, alpha: float = 0.5):
    dense_vec = embeddings.embed_query(query)
    sparse_vec = bm25.encode_queries([query])[0]
    dense_vec, sparse_vec = hybrid_scale(dense_vec, sparse_vec, alpha)
    return index.query(
        vector=dense_vec,
        sparse_vector=sparse_vec,
        top_k=10,
        include_metadata=True,
    )
```
### With Weaviate (Native Hybrid)

```python
import weaviate

# v3-style client; the v4 client uses weaviate.connect_to_local() instead
client = weaviate.Client("http://localhost:8080")

result = client.query.get(
    "Document",
    ["content", "title"],
).with_hybrid(
    query="Error code E-4521",
    alpha=0.5,  # 0 = keyword (BM25) only, 1 = vector only
    fusion_type="rankedFusion",
).with_limit(10).do()
```
## Adding a Reranker

```python
from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_results(query: str, docs: list, top_k: int = 5) -> list:
    """Rerank documents using a cross-encoder."""
    pairs = [[query, doc.page_content] for doc in docs]
    scores = reranker.predict(pairs)

    # Sort by reranker scores
    scored_docs = list(zip(docs, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)
    return [doc for doc, score in scored_docs[:top_k]]

# Full pipeline: over-retrieve (~20 results), then rerank down to the top 5
hybrid_results = hybrid_retriever.invoke(query)
final_results = rerank_results(query, hybrid_results, top_k=5)
```
## Weight Tuning Guidelines
| Data Type | Dense Weight | Sparse Weight | Notes |
|---|---|---|---|
| General text | 0.5 | 0.5 | Balanced default |
| Technical docs | 0.4 | 0.6 | Keywords matter more |
| Conversational | 0.7 | 0.3 | Semantic matters more |
| Code/APIs | 0.3 | 0.7 | Exact matches critical |
| Legal/Medical | 0.4 | 0.6 | Terminology precision |
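These weights are typically applied via linear score fusion: min-max normalize each retriever's scores so they are comparable, then rank by the weighted sum. A minimal sketch, assuming score dicts that map doc ID to raw score:

```python
def linear_fusion(dense: dict, sparse: dict,
                  w_dense: float = 0.5, w_sparse: float = 0.5) -> list:
    """Min-max normalize each score dict, then rank by the weighted sum."""
    def norm(scores: dict) -> dict:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    nd, ns = norm(dense), norm(sparse)
    combined = {d: w_dense * nd.get(d, 0.0) + w_sparse * ns.get(d, 0.0)
                for d in set(nd) | set(ns)}
    return sorted(combined, key=combined.get, reverse=True)

# Technical docs: favor the sparse side (0.4 / 0.6)
ranking = linear_fusion(
    dense={"a": 0.9, "b": 0.5, "c": 0.1},   # e.g. cosine similarities
    sparse={"c": 12.0, "a": 3.0},           # e.g. BM25 scores
    w_dense=0.4, w_sparse=0.6,
)
```

Normalization matters: raw BM25 scores and cosine similarities live on very different scales, so combining them without rescaling silently overweights one side.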
## Evaluation

```python
def evaluate_retrieval(queries: list, ground_truth: dict, retriever) -> dict:
    """Calculate MRR, recall@5, and precision@5, averaged over queries."""
    metrics = {"mrr": 0, "recall@5": 0, "precision@5": 0}
    for query in queries:
        results = retriever.invoke(query)
        result_ids = [doc.metadata["id"] for doc in results[:5]]
        relevant_ids = ground_truth[query]

        # MRR: reciprocal rank of the first relevant result
        for i, rid in enumerate(result_ids):
            if rid in relevant_ids:
                metrics["mrr"] += 1 / (i + 1)
                break

        # Recall & precision at 5
        hits = len(set(result_ids) & set(relevant_ids))
        metrics["recall@5"] += hits / len(relevant_ids)
        metrics["precision@5"] += hits / 5

    # Average over all queries
    n = len(queries)
    return {k: v / n for k, v in metrics.items()}
```
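The MRR term is easy to sanity-check by hand. Isolating it as a standalone helper (toy data, illustrative names):

```python
def mrr_at_k(result_ids: list, relevant_ids: set, k: int = 5) -> float:
    """Reciprocal rank of the first relevant hit within the top k, else 0."""
    for i, rid in enumerate(result_ids[:k]):
        if rid in relevant_ids:
            return 1.0 / (i + 1)
    return 0.0

# First relevant doc ("d2") appears at rank 2, so the contribution is 1/2
score = mrr_at_k(["d9", "d2", "d7"], {"d2", "d7"})
```

Only the first relevant hit counts toward MRR; later relevant documents are captured by recall and precision instead.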
## Best Practices

- **Start with 50/50 weights**, then tune based on evaluation
- **Always add a reranker**: it typically gives a significant quality improvement
- **Index sparse vectors**: run BM25 on the raw text, not on chunks
- **Use native hybrid** when available (Pinecone, Weaviate, Qdrant)
- **Monitor both paths**: log which retriever contributed each final result
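For that last point, a lightweight way to log path contributions is to tag each doc ID with the retriever(s) that surfaced it before fusion. A sketch with hypothetical names:

```python
def tag_sources(dense_ids: list, sparse_ids: list) -> dict:
    """Map each doc ID to the set of retrievers that returned it."""
    sources: dict = {}
    for doc_id in dense_ids:
        sources.setdefault(doc_id, set()).add("dense")
    for doc_id in sparse_ids:
        sources.setdefault(doc_id, set()).add("sparse")
    return sources

# After fusion, log sources[doc_id] for each final result
sources = tag_sources(["a", "b"], ["b", "c"])
```

Over time these logs show whether one path is carrying all the weight, which is exactly the signal you need when tuning the fusion weights above.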