Vector Database Operations
Run production vector databases for AI-powered search, RAG, and recommendation systems.
When to Use This Skill
Use this skill when:
- Setting up a vector database for a RAG or semantic search application
- Choosing between Qdrant, Weaviate, pgvector, or Pinecone
- Managing collections, indexes, and data migrations
- Optimizing query performance and indexing for production loads
- Implementing multi-tenant vector search with namespace isolation
Vector Database Comparison
| Database | Best For | Hosting | Filtering | Scale |
|---|---|---|---|---|
| Qdrant | High-performance, rich filtering, self-hosted | Self / Cloud | Excellent | Very High |
| Weaviate | Schema-first, hybrid search, multi-modal | Self / Cloud | Good | High |
| pgvector | Already on Postgres, simple use cases | Self / managed Postgres | Good | Medium |
| Pinecone | Zero-ops managed, serverless | Managed only | Good | Very High |
| Chroma | Local dev, prototyping | Self only | Basic | Low-Medium |
Qdrant — Production Deployment
# Docker (single node)
docker run -d \
--name qdrant \
-p 6333:6333 \
-p 6334:6334 \
-v $(pwd)/qdrant-data:/qdrant/storage \
qdrant/qdrant:latest
# With custom config
docker run -d \
--name qdrant \
-p 6333:6333 \
-v $(pwd)/qdrant-data:/qdrant/storage \
-v $(pwd)/qdrant-config.yaml:/qdrant/config/production.yaml \
qdrant/qdrant:latest
# qdrant-config.yaml
storage:
  storage_path: /qdrant/storage
  on_disk_payload: true          # store payload on disk (saves RAM)
  hnsw_index:                    # default HNSW settings live under storage
    m: 16                        # graph connections per node
    ef_construct: 100            # accuracy vs build time trade-off
    full_scan_threshold: 10000   # switch to brute force below this
  collection:
    quantization:                # default quantization for new collections
      scalar:
        type: int8
        quantile: 0.99
        always_ram: true         # keep quantized index in RAM
service:
  max_request_size_mb: 32
telemetry_disabled: true
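A rough sizing calculation shows why the config above pairs on-disk storage with an int8 quantized index held in RAM. The sketch below is a back-of-the-envelope model, not Qdrant's actual memory accounting. It assumes float32 originals (4 bytes per dimension), int8 quantized copies (1 byte per dimension), and about 2 × m four-byte links per point for the HNSW graph. The function name and the overhead model are illustrative assumptions.

```python
def estimate_ram_bytes(num_vectors: int, dim: int, m: int = 16,
                       quantized_in_ram: bool = True,
                       originals_in_ram: bool = False) -> dict:
    """Rough per-component RAM estimate in bytes (illustrative model only)."""
    # float32 originals count only when they are kept in RAM (on_disk=False)
    originals = num_vectors * dim * 4 if originals_in_ram else 0
    # int8 scalar quantization stores one byte per dimension
    quantized = num_vectors * dim if quantized_in_ram else 0
    # Assumption: about 2*m neighbour links per point, 4 bytes each
    graph = num_vectors * m * 2 * 4
    return {
        "original_vectors": originals,
        "quantized_vectors": quantized,
        "hnsw_graph": graph,
        "total": originals + quantized + graph,
    }


if __name__ == "__main__":
    on_disk = estimate_ram_bytes(1_000_000, 1536, m=16)
    in_ram = estimate_ram_bytes(1_000_000, 1536, m=16, originals_in_ram=True)
    print(f"originals on disk: {on_disk['total'] / 1e9:.2f} GB")
    print(f"originals in RAM:  {in_ram['total'] / 1e9:.2f} GB")
```

Real usage also includes payload indexes, segment overhead, and OS page cache, so treat the result as a lower bound when sizing a node.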
Qdrant Collection Management
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client = QdrantClient("http://localhost:6333")

# Create optimized collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,  # OpenAI ada-002 / text-embedding-3-small
        distance=Distance.COSINE,
        on_disk=True,  # save RAM: original vectors stored on disk
    ),
    hnsw_config=HnswConfigDiff(
        m=32,  # higher = better recall, more RAM
        ef_construct=200,
        on_disk=False,  # keep HNSW graph in RAM for speed
    ),
    # QuantizationConfig is a union type; ScalarQuantization is the concrete model
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,
            always_ram=True,
        )
    ),
)

# Create payload indexes for fast filtering
client.create_payload_index(
    collection_name="documents",
    field_name="tenant_id",
    field_schema="keyword",
)
client.create_payload_index(
    collection_name="documents",
    field_name="created_at",
    field_schema="datetime",
)

# Collection info (points_count replaces the deprecated vectors_count)
info = client.get_collection("documents")
print(f"Points: {info.points_count}, Status: {info.status}")
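Once the collection exists, data loads and migrations are easier to retry if point IDs are deterministic: Qdrant accepts unsigned integers or UUIDs as point IDs, and upserting an existing ID overwrites that point. The sketch below derives a UUIDv5 per tenant and document and splits a stream into fixed-size batches. The namespace string, `point_id`, and `batched` are hypothetical names for illustration, not part of the Qdrant client.

```python
import uuid
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "documents.example.com")


def point_id(tenant_id: str, doc_id: str) -> str:
    """Same tenant and document always map to the same UUID string."""
    return str(uuid.uuid5(NAMESPACE, f"{tenant_id}/{doc_id}"))


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    print(point_id("acme-corp", "contract-001"))
    for batch in batched(range(5), 2):
        print(batch)
```

Each batch can then be passed to the client's upsert call; rerunning a failed load rewrites the same points instead of creating duplicates.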
Qdrant Filtered Search
from qdrant_client.models import Filter, FieldCondition, MatchValue, DatetimeRange

# Tenant-isolated search (multi-tenant RAG)
# query_embedding: the query vector, produced by the same model as the stored vectors
results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="tenant_id", match=MatchValue(value="acme-corp")),
            FieldCondition(key="doc_type", match=MatchValue(value="contract")),
        ],
        should=[
            # created_at is a datetime field, so use DatetimeRange instead of the numeric Range
            FieldCondition(key="created_at", range=DatetimeRange(gte="2024-01-01T00:00:00Z")),
        ],
    ),
    limit=10,
    with_payload=True,
)
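Tenant isolation only holds if every query carries the tenant condition, so it can help to build filters in one place. The sketch below is a minimal, hypothetical helper (`tenant_filter` is not a Qdrant API) that returns a plain dict in the shape of the REST API's filter JSON and refuses to build a filter without a tenant. Unlike the example above, it places the date bound in `must`, which is an assumption about the intended behaviour.

```python
from typing import Optional


def tenant_filter(tenant_id: str,
                  doc_type: Optional[str] = None,
                  created_after: Optional[str] = None) -> dict:
    """Build a filter dict that always includes the tenant condition."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every query")
    must = [{"key": "tenant_id", "match": {"value": tenant_id}}]
    if doc_type is not None:
        must.append({"key": "doc_type", "match": {"value": doc_type}})
    if created_after is not None:
        # RFC 3339 timestamp string, e.g. "2024-01-01T00:00:00Z"
        must.append({"key": "created_at", "range": {"gte": created_after}})
    return {"must": must}


if __name__ == "__main__":
    print(tenant_filter("acme-corp", "contract", "2024-01-01T00:00:00Z"))
```

The resulting dict can be sent as the `filter` field of a REST query body; with the Python client, the same structure maps onto the `Filter` and `FieldCondition` models.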
pgvector — PostgreSQL Extension
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table with vector column
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding VECTOR(1536),
metadata JSONB DEFAULT '{}',
tenant_id TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create HNSW index (faster queries, more memory)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Create IVFFlat index (less memory, faster build, lower query performance; build it after loading data)
-- CREATE INDEX ON documents
-- USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);
-- Semantic search with metadata filtering
SELECT id, content, metadata,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = 'acme-corp'
AND metadata->>'doc_type' = 'contract'
ORDER BY embedding <=> $1::vector
LIMIT 10;
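In the query above, `<=>` is pgvector's cosine distance operator, so `1 - (embedding <=> $1)` is the cosine similarity. The sketch below reproduces that arithmetic in plain Python and formats a vector in the `[x,y,z]` text form that can be bound to `$1::vector`. Both function names are illustrative assumptions, not part of any driver.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector as pgvector's text representation, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


if __name__ == "__main__":
    query = [1.0, 0.0]
    doc = [0.6, 0.8]
    distance = cosine_distance(query, doc)
    print("distance:", distance, "similarity:", 1.0 - distance)
    print(to_pgvector_literal(query))
```

Ordering by distance ascending, as the SQL does, is equivalent to ordering by similarity descending.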
# Deploy pgvector via Docker
docker run -d \
--name pgvector \
-e POSTGRES_PASSWORD=secret \
-e POSTGRES_DB=vectordb \
-p 5432:5432 \
-v pgvector-data:/var/lib/postgresql/data \
pgvector/pgvector:pg16
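For the commented-out IVFFlat index above, `lists = 100` suits a table of roughly 100k rows. The pgvector README suggests starting with rows / 1000 lists for up to 1M rows and sqrt(rows) beyond that, with about sqrt(lists) probes at query time. The sketch below encodes that starting point; the function name and the rounding choices are assumptions, and the values should be tuned against measured recall.

```python
import math
from typing import Tuple


def ivfflat_starting_params(row_count: int) -> Tuple[int, int]:
    """Return (lists, probes) as a starting point for an IVFFlat index."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = max(1, round(math.sqrt(row_count)))
    # Higher probes improves recall at the cost of speed
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


if __name__ == "__main__":
    for rows in (100_000, 1_000_000, 4_000_000):
        lists, probes = ivfflat_starting_params(rows)
        # probes is applied per session with: SET ivfflat.probes = <probes>;
        print(f"{rows} rows: lists={lists}, probes={probes}")
```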
Weaviate Deployment
# docker-compose for Weaviate
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "false"
      AUTHENTICATION_APIKEY_ENABLED: "true"
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: "${WEAVIATE_API_KEY}"
      AUTHENTICATION_APIKEY_USERS: "admin"
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      ENABLE_MODULES: text2vec-openai,generative-openai
      OPENAI_APIKEY: "${OPENAI_API_KEY}"
      CLUSTER_HOSTNAME: node1
    volumes:
      - weaviate-data:/var/lib/weaviate
    restart: unless-stopped
volumes:
  weaviate-data:
Backup and Restore
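# Qdrant: create a snapshot (the response contains the generated snapshot name)
curl -X POST "http://localhost:6333/collections/documents/snapshots"
# Download snapshot (replace documents-snapshot.snapshot with the name returned above)
curl -O "http://localhost:6333/collections/documents/snapshots/documents-snapshot.snapshot"
# Restore (the recover endpoint uses PUT; location is a file:// URI on the Qdrant host or an http(s) URL)
curl -X PUT "http://localhost:6333/collections/documents/snapshots/recover" \
  -H "Content-Type: application/json" \
  -d '{"location": "file:///qdrant/snapshots/documents-snapshot.snapshot"}'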
# pgvector — standard pg_dump
pg_dump -h localhost -U postgres -d vectordb \
--table=documents --format=custom > documents-backup.dump
# Restore
pg_restore -h localhost -U postgres -d vectordb documents-backup.dump
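Qdrant generates the snapshot file name when the snapshot is created and returns it in the JSON response, so a backup script needs to read the name before downloading. The sketch below is a hedged example of that step: it assumes the response has a `result.name` field, and both the helper name and the sample file name are made up for illustration.

```python
from urllib.parse import quote


def snapshot_download_url(base_url: str, collection: str, create_response: dict) -> str:
    """Build the download URL from the snapshot-creation response."""
    name = create_response["result"]["name"]
    return (
        f"{base_url.rstrip('/')}/collections/{quote(collection, safe='')}"
        f"/snapshots/{quote(name, safe='')}"
    )


if __name__ == "__main__":
    # Sample response with a made-up snapshot name
    response = {
        "result": {"name": "documents-example.snapshot"},
        "status": "ok",
    }
    print(snapshot_download_url("http://localhost:6333", "documents", response))
```

The same name is what the restore call's `location` should point at once the file has been copied to the target host.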
Performance Tuning
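# Qdrant: re-enable indexing after a bulk load
# (indexing_threshold=0 disables indexing, which speeds up the load itself;
# restoring a positive threshold lets the optimizer build the HNSW index)
import time
from qdrant_client.models import CollectionStatus, OptimizersConfigDiff

client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(indexing_threshold=20000),  # Qdrant's default
)

# Wait for optimization to complete
while True:
    info = client.get_collection("documents")
    if info.status == CollectionStatus.GREEN:
        break
    print(f"Optimizing... segments: {info.segments_count}")
    time.sleep(5)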
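The polling loop above runs forever if the collection never turns green, which is risky in a deployment script. A minimal sketch of the same wait with a timeout follows. `wait_until_green` is a hypothetical helper that takes any zero-argument callable returning the status string, so it runs without a live Qdrant instance; the `sleep` and `clock` parameters exist only to make it testable.

```python
import time
from typing import Callable


def wait_until_green(get_status: Callable[[], str],
                     timeout_s: float = 600.0,
                     poll_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> float:
    """Poll until get_status() returns "green"; return the seconds waited."""
    start = clock()
    while True:
        if get_status() == "green":
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"collection not green after {timeout_s} seconds")
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for a real collection
    statuses = iter(["yellow", "yellow", "green"])
    waited = wait_until_green(lambda: next(statuses), poll_s=0.01)
    print(f"green after {waited:.2f}s")
```

With the client from earlier, the callable could be `lambda: client.get_collection("documents").status.value`.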
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| Slow queries | No HNSW index built yet | Wait for indexing; check status == green |
| High RAM usage | Vectors in memory | Enable on_disk=True for vectors |
| Poor recall | Low ef search param | Increase ef at query time (hnsw_ef in Qdrant search params) |
| pgvector slow | Using IVFFlat without vacuum | Run VACUUM ANALYZE documents |
| Weaviate OOM | Too many objects | Enable async indexing; increase heap |
Best Practices
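- For normalized embeddings, cosine and dot product rank results identically, and dot product is slightly cheaper; for unnormalized embeddings, use cosine unless vector magnitude should affect ranking.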
- Always create payload indexes on filter fields (tenant_id, doc_type).
- For datasets >10M vectors, use on_disk vectors + always_ram quantization.
- Benchmark with your actual query patterns before choosing IVFFlat vs HNSW.
- Snapshot before any bulk delete or migration operation.
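Benchmarking index choices, as recommended above, usually means comparing approximate results against exact nearest neighbours for a sample of real queries. The sketch below computes recall@k from two ranked ID lists. How the exact list is obtained is left open (in Qdrant, for instance, a search with exact mode enabled), and the function names are illustrative assumptions.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def recall_at_k(approx_ids: Sequence[Hashable],
                exact_ids: Sequence[Hashable],
                k: int) -> float:
    """Fraction of the exact top-k that the approximate top-k also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(pairs: Iterable[Tuple[Sequence[Hashable], Sequence[Hashable]]],
                k: int) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one per query."""
    scores = [recall_at_k(approx, exact, k) for approx, exact in pairs]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    queries = [
        ([1, 2, 3, 9], [1, 2, 3, 4]),  # one of four true neighbours missed
        ([5, 6, 7, 8], [5, 6, 7, 8]),  # perfect match
    ]
    print(mean_recall(queries, k=4))
```

Tracking this number while varying m, ef_construct, the query-time ef, or IVFFlat probes makes the recall versus latency trade-off concrete for a given workload.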
Related Skills
- rag-infrastructure - Full RAG pipeline
- databases - General database management
- postgresql - pgvector host database ops