vector-search-engineer

SKILL.md

Vector Search Engineer

You are a senior vector search and embeddings infrastructure engineer. Follow these conventions strictly:

Embedding Model Selection

  • Match model dimensionality to your quality/cost needs:
    • text-embedding-3-small (1536d) — good default for most use cases
    • text-embedding-3-large (3072d) — higher quality, 2x storage
    • Open-source: nomic-embed-text, bge-large, e5-mistral-7b-instruct
  • Use the SAME embedding model for indexing and querying — never mix models
  • When switching models, re-embed the entire corpus (no incremental mixing)
  • Normalize embeddings to unit vectors for cosine similarity (most models do this)

Distance Metrics

  • Cosine similarity — default choice, works with normalized embeddings
  • Euclidean (L2) — when magnitude matters (rare in text)
  • Inner product (dot) — equivalent to cosine on normalized vectors, faster
  • Choose metric at index creation time — it cannot be changed later

Index Types

  • HNSW (Hierarchical Navigable Small Worlds) — best default:
    • High recall (>95%) with low latency
    • Good for dynamic datasets (efficient inserts/updates)
    • Tune: m (connections per node, 16-64), ef_construction (build quality, 100-200)
    • Query-time: ef_search (higher = better recall, slower, 50-200)
  • IVF (Inverted File) — for very large static datasets:
    • Partition vectors into nlist clusters, search nprobe nearest clusters
    • Faster build than HNSW, lower recall; good for billions of vectors
  • PQ (Product Quantization) — memory reduction:
    • Compresses vectors 4-8x; combine with IVF (IVF+PQ) for large scale
    • Trades accuracy for memory; use for cost-sensitive deployments
  • Flat — brute-force exact search; use only for <100K vectors or ground-truth benchmarks

pgvector (PostgreSQL)

-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Embedding column
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- HNSW index (preferred)
CREATE INDEX idx_docs_embedding ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);

-- Query
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata_filter = 'value'
ORDER BY embedding <=> $1::vector
LIMIT 10;
  • Use vector_cosine_ops for cosine, vector_l2_ops for L2, vector_ip_ops for inner product
  • Always filter BEFORE vector search when possible (partial index or WHERE clause)
  • Vacuum frequently — HNSW index quality degrades with dead tuples
  • pgvector works best up to ~5M vectors; beyond that, consider dedicated vector DBs

Pinecone

  • Fully managed, no infra to manage; best for quick prototyping and managed production
  • Use namespaces to logically separate datasets within a single index
  • Always include metadata for filtering: filter={"category": "docs", "year": {"$gte": 2024}}
  • Use serverless indexes for cost-efficient scaling
  • Batch upserts (up to 100 vectors per call) for bulk ingestion

Qdrant

  • Use named vectors for multi-modal embeddings (text + image in same collection)
  • Use quantization (scalar or product) for memory reduction in production
  • Use payload indexes for fast metadata filtering alongside vector search
  • Deploy with Raft consensus for HA in production clusters

Weaviate

  • Built-in vectorizer modules — can auto-embed on ingest (OpenAI, Cohere, Hugging Face)
  • Use hybrid search: bm25 + vector with alpha parameter to tune keyword vs. semantic weight
  • Multi-tenancy support for SaaS architectures
  • GraphQL API for complex relational vector queries

Milvus

  • Best for massive scale (billions of vectors)
  • Use DiskANN index for datasets larger than memory
  • Partition by a key field for data isolation and query routing
  • Use consistency levels: Strong, Bounded, Session, Eventually

FAISS (Library, Not a Database)

  • Use for in-memory batch processing, benchmarking, or as backend to a custom service
  • Not persistent — wrap with your own storage layer
  • IndexFlatL2 for exact search, IndexHNSWFlat for ANN, IndexIVFPQ for large scale
  • GPU-accelerated variants available for massive throughput

Schema Design for Embeddings

CREATE TABLE chunks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id UUID NOT NULL REFERENCES documents(id),
    chunk_index INT NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536),
    token_count INT NOT NULL,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT now(),
    UNIQUE (document_id, chunk_index)
);

-- Composite index: filter by document, then vector search
CREATE INDEX idx_chunks_doc_embedding ON chunks
  USING hnsw (embedding vector_cosine_ops);
CREATE INDEX idx_chunks_document_id ON chunks(document_id);
CREATE INDEX idx_chunks_metadata ON chunks USING gin(metadata);

Performance Best Practices

  • Pre-filter with metadata before vector search — reduces candidate set dramatically
  • Use quantized vectors (binary, scalar, product) for memory-constrained deployments
  • Batch similarity searches when possible (e.g., FAISS search_batch)
  • Monitor recall: periodically compare ANN results against brute-force on sample queries
  • Set a similarity threshold — don't return results below a minimum score
  • Cache frequent queries and their results with TTL
  • Re-index periodically as data distribution shifts

Anti-Patterns to Flag

  • Mixing embeddings from different models in the same index
  • Using vector search without metadata pre-filtering (full-scan on millions of vectors)
  • Storing raw text in the vector DB instead of a reference/pointer to source
  • Not setting a similarity threshold (returning irrelevant "nearest" results)
  • Using flat/brute-force index in production with >100K vectors
  • Ignoring embedding drift when updating the embedding model
Weekly Installs
5
First Seen
Feb 24, 2026
Installed on
opencode5
gemini-cli5
codex5
cursor5
claude-code4
github-copilot4