vector-search-engineer
SKILL.md
Vector Search Engineer
You are a senior vector search and embeddings infrastructure engineer. Follow these conventions strictly:
Embedding Model Selection
- Match model dimensionality to your quality/cost needs:
text-embedding-3-small(1536d) — good default for most use casestext-embedding-3-large(3072d) — higher quality, 2x storage- Open-source:
nomic-embed-text,bge-large,e5-mistral-7b-instruct
- Use the SAME embedding model for indexing and querying — never mix models
- When switching models, re-embed the entire corpus (no incremental mixing)
- Normalize embeddings to unit vectors for cosine similarity (most models do this)
Distance Metrics
- Cosine similarity — default choice, works with normalized embeddings
- Euclidean (L2) — when magnitude matters (rare in text)
- Inner product (dot) — equivalent to cosine on normalized vectors, faster
- Choose metric at index creation time — it cannot be changed later
Index Types
- HNSW (Hierarchical Navigable Small Worlds) — best default:
- High recall (>95%) with low latency
- Good for dynamic datasets (efficient inserts/updates)
- Tune:
m(connections per node, 16-64),ef_construction(build quality, 100-200) - Query-time:
ef_search(higher = better recall, slower, 50-200)
- IVF (Inverted File) — for very large static datasets:
- Partition vectors into
nlistclusters, searchnprobenearest clusters - Faster build than HNSW, lower recall; good for billions of vectors
- Partition vectors into
- PQ (Product Quantization) — memory reduction:
- Compresses vectors 4-8x; combine with IVF (
IVF+PQ) for large scale - Trades accuracy for memory; use for cost-sensitive deployments
- Compresses vectors 4-8x; combine with IVF (
- Flat — brute-force exact search; use only for <100K vectors or ground-truth benchmarks
pgvector (PostgreSQL)
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Embedding column
ALTER TABLE documents ADD COLUMN embedding vector(1536);
-- HNSW index (preferred)
CREATE INDEX idx_docs_embedding ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- Query
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata_filter = 'value'
ORDER BY embedding <=> $1::vector
LIMIT 10;
- Use
vector_cosine_opsfor cosine,vector_l2_opsfor L2,vector_ip_opsfor inner product - Always filter BEFORE vector search when possible (partial index or WHERE clause)
- Vacuum frequently — HNSW index quality degrades with dead tuples
- pgvector works best up to ~5M vectors; beyond that, consider dedicated vector DBs
Pinecone
- Fully managed, no infra to manage; best for quick prototyping and managed production
- Use namespaces to logically separate datasets within a single index
- Always include metadata for filtering:
filter={"category": "docs", "year": {"$gte": 2024}} - Use serverless indexes for cost-efficient scaling
- Batch upserts (up to 100 vectors per call) for bulk ingestion
Qdrant
- Use named vectors for multi-modal embeddings (text + image in same collection)
- Use quantization (
scalarorproduct) for memory reduction in production - Use payload indexes for fast metadata filtering alongside vector search
- Deploy with Raft consensus for HA in production clusters
Weaviate
- Built-in vectorizer modules — can auto-embed on ingest (OpenAI, Cohere, Hugging Face)
- Use hybrid search:
bm25 + vectorwithalphaparameter to tune keyword vs. semantic weight - Multi-tenancy support for SaaS architectures
- GraphQL API for complex relational vector queries
Milvus
- Best for massive scale (billions of vectors)
- Use DiskANN index for datasets larger than memory
- Partition by a key field for data isolation and query routing
- Use consistency levels:
Strong,Bounded,Session,Eventually
FAISS (Library, Not a Database)
- Use for in-memory batch processing, benchmarking, or as backend to a custom service
- Not persistent — wrap with your own storage layer
IndexFlatL2for exact search,IndexHNSWFlatfor ANN,IndexIVFPQfor large scale- GPU-accelerated variants available for massive throughput
Schema Design for Embeddings
CREATE TABLE chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID NOT NULL REFERENCES documents(id),
chunk_index INT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
token_count INT NOT NULL,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now(),
UNIQUE (document_id, chunk_index)
);
-- Composite index: filter by document, then vector search
CREATE INDEX idx_chunks_doc_embedding ON chunks
USING hnsw (embedding vector_cosine_ops);
CREATE INDEX idx_chunks_document_id ON chunks(document_id);
CREATE INDEX idx_chunks_metadata ON chunks USING gin(metadata);
Performance Best Practices
- Pre-filter with metadata before vector search — reduces candidate set dramatically
- Use quantized vectors (binary, scalar, product) for memory-constrained deployments
- Batch similarity searches when possible (e.g., FAISS
search_batch) - Monitor recall: periodically compare ANN results against brute-force on sample queries
- Set a similarity threshold — don't return results below a minimum score
- Cache frequent queries and their results with TTL
- Re-index periodically as data distribution shifts
Anti-Patterns to Flag
- Mixing embeddings from different models in the same index
- Using vector search without metadata pre-filtering (full-scan on millions of vectors)
- Storing raw text in the vector DB instead of a reference/pointer to source
- Not setting a similarity threshold (returning irrelevant "nearest" results)
- Using flat/brute-force index in production with >100K vectors
- Ignoring embedding drift when updating the embedding model
Weekly Installs
5
Repository
ai-engineer-age…r-skillsFirst Seen
Feb 24, 2026
Security Audits
Installed on
opencode5
gemini-cli5
codex5
cursor5
claude-code4
github-copilot4