
pgvector Similarity Search

Query vector embeddings using pgvector's similarity operators. This skill covers the retrieval phase of RAG pipelines.

When to Apply

Use this skill when:

  • Finding semantically similar documents or chunks
  • Implementing the retrieval step of RAG
  • Building semantic search features
  • Querying embeddings stored in PostgreSQL

Distance Operators

pgvector supports three distance operators:

| Operator | Distance Type | Use Case |
|----------|---------------|----------|
| `<=>` | Cosine distance | Most common; normalized text embeddings |
| `<->` | Euclidean (L2) | When magnitude matters |
| `<#>` | Negative inner product | Dot-product similarity (pgvector negates the value so that smaller means more similar) |

Cosine distance is recommended for text embeddings because it measures the angle between vectors and ignores magnitude.
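What the `<=>` operator computes can be sketched in plain TypeScript; this is a standalone illustration of the math (the helper names here are ours, not pgvector's):

```typescript
// Cosine distance = 1 - cos(angle between the vectors).
// Identical direction -> 0, orthogonal -> 1, opposite -> 2.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

const norm = (a: number[]): number => Math.sqrt(dot(a, a));

const cosineDistance = (a: number[], b: number[]): number =>
  1 - dot(a, b) / (norm(a) * norm(b));

console.log(cosineDistance([1, 2, 3], [1, 2, 3])); // ≈ 0 (same direction)
console.log(cosineDistance([1, 0], [0, 1]));       // 1 (orthogonal)
```

Because the value depends only on direction, two documents of very different lengths can still score as near-identical.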

Basic Similarity Search

Direct Query

SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM intelligence.chunks
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1::vector
LIMIT 5;

The 1 - distance expression converts cosine distance to cosine similarity, which ranges from -1 to 1 in general; for typical text embeddings most scores land between 0 and 1.

With Threshold

Keep only results whose similarity exceeds a threshold:

SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM intelligence.chunks
WHERE embedding IS NOT NULL
  AND 1 - (embedding <=> $1::vector) > 0.7
ORDER BY embedding <=> $1::vector
LIMIT 5;
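Filtering in SQL is usually preferable since it limits how much data crosses the wire, but the same logic can run client-side. A sketch over hypothetical result rows (`aboveThreshold` and the sample data are illustrative):

```typescript
interface ScoredChunk { id: number; content: string; similarity: number }

// Hypothetical rows, as the similarity query might return them
const rows: ScoredChunk[] = [
  { id: 1, content: 'Machine learning basics', similarity: 0.91 },
  { id: 2, content: 'Gardening tips', similarity: 0.42 },
  { id: 3, content: 'Neural network training', similarity: 0.78 },
];

// Keep only rows above the threshold, best match first
const aboveThreshold = (chunks: ScoredChunk[], threshold: number): ScoredChunk[] =>
  chunks
    .filter(c => c.similarity > threshold)
    .sort((a, b) => b.similarity - a.similarity);

console.log(aboveThreshold(rows, 0.7).map(c => c.id)); // [1, 3]
```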

Similarity Search Function

Create a reusable PostgreSQL function:

-- deploy/schemas/intelligence/procedures/find_similar_chunks.sql
CREATE FUNCTION intelligence.find_similar_chunks(
    p_embedding VECTOR(768),
    p_limit INTEGER DEFAULT 5,
    p_similarity_threshold FLOAT DEFAULT 0.7
)
RETURNS TABLE (
    id INTEGER,
    content TEXT,
    similarity FLOAT
) AS $$
BEGIN
    RETURN QUERY
    SELECT 
        c.id,
        c.content,
        1 - (c.embedding <=> p_embedding) AS similarity
    FROM intelligence.chunks c
    WHERE c.embedding IS NOT NULL
      AND 1 - (c.embedding <=> p_embedding) > p_similarity_threshold
    ORDER BY c.embedding <=> p_embedding
    LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;

Revert Script

-- revert/schemas/intelligence/procedures/find_similar_chunks.sql
DROP FUNCTION IF EXISTS intelligence.find_similar_chunks(VECTOR(768), INTEGER, FLOAT);

TypeScript Implementation

import { Pool } from 'pg';
import { OllamaClient } from './utils/ollama';

const formatVector = (embedding: number[]): string => `[${embedding.join(',')}]`;

export class SimilaritySearch {
  private pool: Pool;
  private ollama: OllamaClient;

  constructor(pool: Pool, ollamaBaseUrl?: string) {
    this.pool = pool;
    this.ollama = new OllamaClient(ollamaBaseUrl);
  }

  async findSimilar(
    query: string,
    limit: number = 5,
    threshold: number = 0.7
  ): Promise<Array<{ id: number; content: string; similarity: number }>> {
    // Generate embedding for the query
    const queryEmbedding = await this.ollama.generateEmbedding(query);

    // Search for similar chunks
    const result = await this.pool.query(
      `SELECT id, content, similarity
       FROM intelligence.find_similar_chunks($1::vector, $2, $3)
       ORDER BY similarity DESC`,
      [formatVector(queryEmbedding), limit, threshold]
    );

    return result.rows;
  }

  async getContext(query: string, limit: number = 5): Promise<string> {
    const chunks = await this.findSimilar(query, limit);
    return chunks.map(c => c.content).join('\n\n');
  }
}
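The `formatVector` helper serializes a number array into the bracketed text literal pgvector parses. A quick standalone check, repeating the helper plus a `parseVector` counterpart for reading vectors back (our name; node-postgres returns vector columns as text unless a custom type parser is registered):

```typescript
// Serialize an embedding into pgvector's text literal, e.g. "[0.1,0.2,0.3]"
const formatVector = (embedding: number[]): string => `[${embedding.join(',')}]`;

// Parse pgvector's text representation back into numbers
const parseVector = (literal: string): number[] =>
  literal.slice(1, -1).split(',').map(Number);

console.log(formatVector([0.1, 0.2, 0.3])); // "[0.1,0.2,0.3]"
console.log(parseVector('[1,2,3]'));        // [1, 2, 3]
```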

Aggregating Context

For RAG, combine retrieved chunks into a single context string:

SELECT string_agg(content, E'\n\n' ORDER BY similarity DESC) AS context
FROM intelligence.find_similar_chunks($1::vector, $2, $3);

In TypeScript:

async function getRAGContext(query: string, pool: Pool, ollama: OllamaClient): Promise<string> {
  const queryEmbedding = await ollama.generateEmbedding(query);

  const result = await pool.query(
    `SELECT string_agg(content, E'\n\n' ORDER BY similarity DESC) as context
     FROM intelligence.find_similar_chunks($1::vector, $2)`,
    [formatVector(queryEmbedding), 5]
  );

  return result.rows[0].context || '';
}
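The aggregated context typically feeds a prompt template before the LLM call. A minimal sketch; `buildPrompt` and the template wording are illustrative, not a prescribed format:

```typescript
// Assemble a RAG prompt from retrieved context and the user's question.
const buildPrompt = (context: string, question: string): string =>
  [
    'Answer the question using only the context below.',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');

const prompt = buildPrompt(
  'Machine learning enables systems to learn from data.',
  'How do systems learn?'
);
console.log(prompt);
```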

Testing Similarity Search

import { getConnections, PgTestClient } from 'pgsql-test';
import { OllamaClient } from '../src/utils/ollama';

let pg: PgTestClient;
let teardown: () => Promise<void>;
let ollama: OllamaClient;

const formatVector = (embedding: number[]): string => `[${embedding.join(',')}]`;

beforeAll(async () => {
  ({ pg, teardown } = await getConnections());
  ollama = new OllamaClient();
});

afterAll(() => teardown());

test('should find semantically similar chunks', async () => {
  // Seed document about machine learning
  const mlContent = 'Machine learning enables systems to learn from data.';
  const mlEmbedding = await ollama.generateEmbedding(mlContent);

  const docResult = await pg.client.query(
    `INSERT INTO intelligence.documents (title, content, embedding)
     VALUES ($1, $2, $3::vector)
     RETURNING id`,
    ['ML Basics', mlContent, formatVector(mlEmbedding)]
  );
  const docId = docResult.rows[0].id;

  // Create a chunk carrying the same embedding

  await pg.client.query(
    `INSERT INTO intelligence.chunks (document_id, content, embedding, chunk_index)
     VALUES ($1, $2, $3::vector, 0)`,
    [docId, mlContent, formatVector(mlEmbedding)]
  );

  // Query for similar content
  const query = 'How do systems learn from data?';
  const queryEmbedding = await ollama.generateEmbedding(query);

  const results = await pg.client.query(
    `SELECT content, similarity
     FROM intelligence.find_similar_chunks($1::vector, 5, 0.3)
     ORDER BY similarity DESC`,
    [formatVector(queryEmbedding)]
  );

  expect(results.rows.length).toBeGreaterThan(0);
  expect(results.rows[0].similarity).toBeGreaterThan(0.3);
  expect(results.rows[0].content).toContain('Machine learning');
});

Performance Optimization

Add Indexes

For large datasets, add an index after initial data load:

-- IVFFlat index (good balance)
CREATE INDEX idx_chunks_embedding ON intelligence.chunks 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- HNSW index (better recall, more memory)
CREATE INDEX idx_chunks_embedding_hnsw ON intelligence.chunks 
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Set Search Parameters

For IVFFlat, increase probes for better recall:

SET ivfflat.probes = 10;

For HNSW, adjust ef_search:

SET hnsw.ef_search = 100;
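These `SET` commands apply for the rest of the session. To scope a setting to a single query, standard PostgreSQL `SET LOCAL` inside a transaction works (sketch; the query is the doc's parameterized search):

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- reverts automatically at COMMIT/ROLLBACK
SELECT id, content
FROM intelligence.chunks
ORDER BY embedding <=> $1::vector
LIMIT 5;
COMMIT;
```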

Filtering with Metadata

Combine vector search with metadata filters:

SELECT c.id, c.content, 1 - (c.embedding <=> $1::vector) AS similarity
FROM intelligence.chunks c
JOIN intelligence.documents d ON c.document_id = d.id
WHERE c.embedding IS NOT NULL
  AND d.metadata->>'category' = 'technical'
  AND 1 - (c.embedding <=> $1::vector) > 0.7
ORDER BY c.embedding <=> $1::vector
LIMIT 5;

Similarity Thresholds

Recommended thresholds by use case:

| Use Case | Threshold | Notes |
|----------|-----------|-------|
| Strict matching | 0.8+ | High precision, may miss relevant results |
| General search | 0.6-0.7 | Good balance |
| Exploratory | 0.4-0.5 | High recall, more noise |
| RAG context | 0.5-0.7 | Depends on document quality |

Troubleshooting

| Issue | Solution |
|-------|----------|
| No results returned | Lower the similarity threshold |
| Irrelevant results | Raise the threshold or improve embeddings |
| Slow queries | Add an IVFFlat or HNSW index |
| "Operator does not exist" | Ensure the pgvector extension is installed (`CREATE EXTENSION vector`) |
| Dimension mismatch | Query vector must match the stored vector dimensions |

References

  • Related skill: pgvector-setup for database schema setup
  • Related skill: pgvector-embeddings for generating embeddings
  • Related skill: rag-pipeline for complete RAG implementation
  • pgvector documentation