Hybrid Memory Skill

A self-contained hybrid memory system inspired by ZeroClaw's architecture. Combines vector embeddings (semantic search) with FTS5 keyword search (BM25 scoring) for powerful memory recall.

Purpose

Give agents persistent memory with:

Semantic search - Find memories by meaning, not just keywords
Keyword search - Traditional BM25 text search via SQLite FTS5
Hybrid scoring - Weighted combination of vector and keyword similarity
Zero external dependencies - Pure SQLite + local embeddings

Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Hybrid Memory                          │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │   Markdown   │  │   Chunker    │  │  Embedding API   │  │
│  │    Input     │ ─│>  (preserve  │─>│  (configurable)  │  │
│  │              │  │   headings)  │  │                  │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
│           │                                    │            │
│           ▼                                    ▼            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                 SQLite Database                       │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐ │  │
│  │  │ memories    │  │fts5 memories│  │embedding_cache│ │  │
│  │  │(id,content, │  │(content,    │  │(hash,vector)  │ │  │
│  │  │ vector_blob)│  │  metadata)   │  │               │ │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘ │  │
│  └──────────────────────────────────────────────────────┘  │
│                              │                              │
│                              ▼                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │              Hybrid Search Engine                     │  │
│  │         (cosine similarity + BM25 merge)              │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Setup

cd /job/.pi/skills/hybrid-memory
npm install

Configuration

Set in your environment or .env:

# Required: OpenAI API key for embeddings (or use local embedding model)
OPENAI_API_KEY=sk-...

# Optional: Embedding model (default: text-embedding-3-small)
EMBEDDING_MODEL=text-embedding-3-small

# Optional: Vector weight in hybrid scoring (0-1, default: 0.7)
HYBRID_VECTOR_WEIGHT=0.7

# Optional: Database path (default: /job/data/hybrid-memory.db)
HYBRID_MEMORY_DB_PATH=/job/data/hybrid-memory.db

Commands

Initialize Database

memory-init

Creates SQLite database with:

memories table (id, content, vector_blob, metadata, created_at)
memories_fts FTS5 virtual table for full-text search
embedding_cache table for LRU caching

Store Memory

memory-store "Your memory content here" --tags project,meeting
memory-store -f /path/to/file.md --source "Documentation"

Options:

--tags - Comma-separated tags for filtering
--source - Source attribution
--id - Custom memory ID

Search Memories

# Semantic search (vector only)
memory-search "authentication middleware"

# Keyword search (BM25)
memory-search "authentication middleware" --mode keyword

# Hybrid search (default)
memory-search "authentication middleware" --mode hybrid

# With filters
memory-search "deployment" --tags production --limit 10

Recall (Semantic + Contextual)

# Find most relevant memories for current context
memory-recall "How do I configure the database?"

# Top-K with threshold
memory-recall "error handling" --top-k 5 --threshold 0.7

Memory Management

# List recent memories
memory-list --limit 20

# Delete memory
memory-delete <memory-id>

# Export memories
memory-export --format json > memories.json
memory-export --format markdown > memories.md

# Get stats
memory-stats

Tools Added

When this skill is active, the agent gains access to:

`memory_recall`

Recall relevant memories based on query context.

// Use in agent prompt for contextual memory
memory_recall({
  query: "How did I implement authentication?",
  top_k: 5,
  threshold: 0.6
})

`memory_store`

Store new memories with automatic embedding.

memory_store({
  content: "User prefers TypeScript over JavaScript",
  tags: ["preferences", "user-profile"],
  source: "conversation"
})

`memory_search`

Search stored memories.

memory_search({
  query: "database configuration",
  mode: "hybrid",  // "vector", "keyword", "hybrid"
  limit: 10
})

Usage in Agent Prompt

When this skill is active, include this context:

## Memory System

You have access to a hybrid memory system (vector + keyword search) via:

- `memory_recall(query, top_k?, threshold?)` - Recall relevant context
- `memory_store(content, tags?, source?)` - Store new memories
- `memory_search(query, mode?, limit?)` - Search memories

Use memory_recall() to:
- Remember previous conversations
- Find relevant code patterns
- Access stored documentation
- Maintain context across sessions

Use memory_store() to:
- Save user preferences
- Store discovered patterns
- Remember decisions made
- Document important findings

The hybrid search combines:
- Semantic similarity (70% weight): Finds by meaning
- BM25 keyword scoring (30% weight): Finds by exact words

Technical Details

Embedding Strategy

Chunking: Paragraph-level with heading preservation
Model: OpenAI text-embedding-3-small (1536 dimensions)
Normalization: L2-normalized for cosine similarity
Caching: Hash-based embedding cache (SHA-256 of content)

Hybrid Scoring

final_score = (vector_weight * vector_score) + 
              (keyword_weight * bm25_score_normalized)

Example weights:
- vector_weight = 0.7 (semantic meaning)
- keyword_weight = 0.3 (exact word matches)

Similarity Search

Vector similarity uses cosine distance on normalized embeddings:

SELECT id, content, 
       (vector_dot_product(embedding, :query_vec)) as similarity
FROM memories
ORDER BY similarity DESC
LIMIT :limit

FTS5 BM25 Scoring

SELECT rowid, bm25(memories_fts, 1.2, 0.75) as score
FROM memories_fts
WHERE memories_fts MATCH :query
ORDER BY score DESC

File Structure

.pi/skills/hybrid-memory/
├── SKILL.md                 # This file
├── package.json
├── lib/
│   ├── db.js               # SQLite connection + schema
│   ├── embeddings.js       # OpenAI embedding client
│   ├── chunker.js          # Text chunking logic
│   ├── search.js           # Vector + keyword + hybrid search
│   └── cache.js            # Embedding cache management
├── bin/
│   ├── memory-init.js      # Initialize database
│   ├── memory-store.js     # Store memories CLI
│   ├── memory-search.js    # Search memories CLI
│   ├── memory-recall.js    # Recall with context
│   ├── memory-list.js      # List memories
│   ├── memory-delete.js    # Delete memory
│   ├── memory-export.js    # Export to various formats
│   └── memory-stats.js     # Database statistics
└── tests/
    └── memory.test.js      # Test suite

Performance Characteristics

Metric	Expected
Embedding latency	~100-300ms (OpenAI API)
Vector search	<10ms for 10k memories
Keyword search	<50ms for 10k memories
Storage per memory	~6KB (1536 dims * 4 bytes)
Database size	~60MB for 10k memories

When to Use

Long-running conversations - Maintain context across sessions
Code knowledge bases - Remember patterns and decisions
Documentation search - Semantic doc retrieval
User preferences - Remember how users like things done
Research accumulation - Build knowledge over time

Integration Example

// In agent workflow:

// 1. Before answering, recall relevant context
const relevant = memory_recall({
  query: userQuestion,
  top_k: 3
});

// 2. Include context in LLM prompt
const prompt = `
  Previous relevant context:
  ${relevant.map(m =>m.content).join('\n---\n')}
  
  User question: ${userQuestion}
`;

// 3. After answering, store the exchange
memory_store({
  content: `Q: ${userQuestion}\nA: ${answer}`,
  tags: ["conversation", "q-and-a"],
  source: "agent-session"
});

Inspiration

This skill is inspired by ZeroClaw's memory architecture:

"Full-Stack Memory System" - Zero overhead, custom implementation
"Hybrid Merge" - Weighted combination of vector + keyword
"Safe Reindex" - Atomic rebuilds with no downtime

hybrid-memory