# Memory Systems Skill

`grey-haven-memory-systems` - Design and implement long-term memory systems for AI agents.
## The Context-Memory Spectrum

Memory exists on a spectrum from ephemeral to permanent:

```
Ephemeral ◄────────────────────────────────────► Permanent

Context Window    Short-term    Long-term     Knowledge
(disappears)      Cache         Memory        Base
                  (session)     (weeks)       (forever)
```
## When to Use What

| Memory Type | Duration | Use Case |
|---|---|---|
| Context window | Single turn | Immediate task context |
| Short-term cache | Session | Conversation history |
| Long-term memory | Weeks/months | User preferences, learnings |
| Knowledge base | Permanent | Facts, documentation, procedures |
## Memory Architecture Options

### 1. Vector RAG (Retrieval-Augmented Generation)
Store embeddings, retrieve by semantic similarity.
**Pros:**

- Simple to implement
- Works well for document retrieval
- Scales to millions of documents

**Cons:**

- No relationships between items
- Recency bias (older memories fade)
- Can retrieve irrelevant but similar content

**Best for:** Document search, FAQ systems, code search
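The retrieval loop can be sketched without any real embedding model. In this illustration a bag-of-words vector stands in for a learned embedding, and `ToyVectorStore` is an invented name for demonstration, not a real library:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """Minimal in-memory vector store: embed on add, rank by cosine on search.
    A sketch of the pattern only - real systems use an embedding model and an
    approximate-nearest-neighbor index."""
    def __init__(self, embed):
        self.embed = embed
        self.items = []  # list of (vector, text)

    def add(self, text):
        self.items.append((self.embed(text), text))

    def search(self, query, k=3):
        qv = self.embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Toy "embedding": word counts over a tiny fixed vocabulary.
VOCAB = ["memory", "vector", "graph", "agent", "cache"]
def toy_embed(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

store = ToyVectorStore(toy_embed)
store.add("vector memory for agent recall")
store.add("graph of entities and relations")
store.add("cache eviction policy notes")
print(store.search("vector memory", k=1))  # most similar document first
```

Swapping `toy_embed` for a real embedding model and the linear scan for an ANN index turns this shape into a production vector RAG store.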
### 2. Knowledge Graphs
Store entities and relationships explicitly.
**Pros:**

- Captures relationships explicitly
- Supports reasoning over structure
- Retrieval is precise (no similarity-based false matches)

**Cons:**

- Complex to build and maintain
- Requires structured data
- More expensive queries

**Best for:** Domain modeling, reasoning tasks, complex queries
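The core idea fits in a few lines: store (subject, relation, object) triples and match them against patterns. `ToyKnowledgeGraph` is an illustrative sketch, not a real graph database:

```python
class ToyKnowledgeGraph:
    """Minimal triple store with a one-pattern wildcard query.
    Real systems use RDF stores or property graphs (e.g. SPARQL, Cypher)."""
    def __init__(self):
        self.triples = []

    def add(self, subject, relation, obj):
        self.triples.append((subject, relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        # None acts as a wildcard in each position.
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (relation is None or t[1] == relation)
                and (obj is None or t[2] == obj)]

kg = ToyKnowledgeGraph()
kg.add("Zep", "is_a", "memory_system")
kg.add("Zep", "supports", "entity_extraction")
kg.add("MemGPT", "is_a", "memory_system")

print(kg.query(relation="is_a", obj="memory_system"))  # both systems match
```

Because matches are exact on entities and relations, there is no "similar but wrong" retrieval, which is the trade-off against vector search.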
### 3. Temporal Knowledge Graphs
Knowledge graphs with time-based relationships.
**Pros:**

- Tracks how knowledge evolves
- Supports "as of" queries
- Captures causality

**Cons:**

- Most complex option
- Storage grows over time
- Queries are harder to write and optimize

**Best for:** Historical analysis, change tracking, audit trails
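An "as of" query can be sketched by attaching a validity interval to each fact. All names here are illustrative assumptions, not a real temporal-graph API:

```python
from datetime import date

class ToyTemporalGraph:
    """Triples with validity intervals: each fact is valid from
    `valid_from` (inclusive) until `valid_to` (exclusive, None = still valid)."""
    def __init__(self):
        self.facts = []  # (subject, relation, obj, valid_from, valid_to)

    def add(self, s, r, o, valid_from, valid_to=None):
        self.facts.append((s, r, o, valid_from, valid_to))

    def as_of(self, when):
        # Return the facts that were valid on the given date.
        return [(s, r, o) for s, r, o, start, end in self.facts
                if start <= when and (end is None or when < end)]

g = ToyTemporalGraph()
g.add("alice", "role", "engineer", date(2023, 1, 1), date(2024, 6, 1))
g.add("alice", "role", "manager", date(2024, 6, 1))

print(g.as_of(date(2023, 12, 1)))  # the role as it was in 2023
print(g.as_of(date(2025, 1, 1)))   # the role after the change
```

Note that storage grows with every change, since superseded facts are closed (given an end date) rather than deleted - exactly the cost and the benefit listed above.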
### 4. Hybrid Approaches

Combine vector + graph for the best of both:

```
Query ──▶ Vector Search ──▶ Top-K candidates
                                 │
                                 ▼
                          Graph Traversal ──▶ Related entities
                                 │
                                 ▼
                           Re-ranking ──▶ Final results
```
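The three-stage pipeline above can be sketched end to end. Term overlap stands in for vector scoring and a plain dict stands in for graph adjacency; every name here is illustrative:

```python
def hybrid_retrieve(query_terms, documents, related, k=2):
    """Sketch of the vector -> graph -> re-rank pipeline:
    1. score documents against the query ("vector search" stand-in),
    2. expand the top-K candidates with their graph neighbours,
    3. re-rank the merged candidate set."""
    def score(doc):
        return len(set(doc.lower().split()) & set(query_terms))

    # Stage 1: top-K candidates by score.
    candidates = sorted(documents, key=score, reverse=True)[:k]
    # Stage 2: pull in documents linked to each candidate.
    expanded = set(candidates)
    for doc in candidates:
        expanded.update(related.get(doc, []))
    # Stage 3: re-rank the merged set.
    return sorted(expanded, key=score, reverse=True)

docs = ["agent memory design", "vector index tuning", "billing faq"]
graph = {"agent memory design": ["memory eviction policy"]}
results = hybrid_retrieve(["agent", "memory"], docs, graph)
print(results[0])  # best-scoring document after expansion
```

The graph stage is what vector search alone cannot do: "memory eviction policy" shares few terms with the query but enters the candidate set because it is linked to a strong hit.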
## Performance Benchmarks
Research benchmarks for memory systems (2024 data):
| System | Recall@10 | Latency (P50) | Cost/Query |
|---|---|---|---|
| Zep | 94.8% | 45ms | $0.0001 |
| MemGPT | 93.4% | 120ms | $0.0003 |
| LangChain Memory | 87.2% | 80ms | $0.0002 |
| Simple RAG | 78.5% | 30ms | $0.00005 |
### Key Insights
- Zep excels at conversation memory with entity extraction
- MemGPT best for complex reasoning over memory
- Simple RAG sufficient for most document retrieval
- Hybrid approaches win for complex queries
## What's Included

### Examples (`examples/`)

- **Conversation memory** - Storing and retrieving chat history
- **Entity memory** - Tracking entities mentioned in conversations
- **Knowledge base integration** - Connecting to Grey Haven KB

### Reference Guides (`reference/`)

- **Architecture patterns** - When to use each memory type
- **Embedding strategies** - Chunking, models, dimensions
- **Grey Haven integration** - Using with knowledge-base agents

### Checklists (`checklists/`)

- **Memory system selection** - Choose the right architecture
- **Implementation checklist** - Before deploying memory
## Grey Haven Knowledge Base Agents

This skill complements the knowledge-base agents:

| Agent | Purpose |
|---|---|
| `memory-architect` | Design memory storage, semantic search |
| `knowledge-curator` | Create and organize knowledge entries |
| `ontology-builder` | Map relationships between entries |
| `kb-search-analyzer` | Search and synthesize from KB |
| `kb-entry-creator` | Create structured KB entries |
| `kb-validator` | Validate KB integrity |
| `kb-manifest-generator` | Generate KB indexes |
| `kb-ontology-mapper` | Visualize knowledge structure |
## Implementation Patterns

### Pattern 1: Conversation Memory
```python
class ConversationMemory:
    """VectorStore, EntityStore, and extract_entities are placeholders
    for your embedding store, entity store, and NER step."""
    def __init__(self):
        self.short_term = []            # Last N messages
        self.long_term = VectorStore()  # Semantic search
        self.entities = EntityStore()   # Mentioned entities

    def add_message(self, message: str, role: str):
        # Short-term: sliding window of the last 20 messages
        self.short_term.append({"role": role, "content": message})
        if len(self.short_term) > 20:
            self.short_term.pop(0)
        # Long-term: embed and store
        self.long_term.add(message, metadata={"role": role})
        # Entity extraction
        entities = extract_entities(message)
        self.entities.update(entities)

    def retrieve(self, query: str, k: int = 5) -> dict:
        # Combine recent short-term context with relevant long-term memory
        recent = self.short_term[-5:]
        similar = self.long_term.search(query, k=k)
        entities = self.entities.get_relevant(query)
        return {
            "recent": recent,
            "similar": similar,
            "entities": entities,
        }
```
### Pattern 2: Entity Memory
```python
from datetime import datetime

class EntityMemory:
    """EntityRecord is a placeholder for your per-entity record type."""
    def __init__(self):
        self.entities = {}       # entity_name -> EntityRecord
        self.relationships = []  # (entity1, relation, entity2) triples

    def update(self, entity: str, info: dict):
        if entity not in self.entities:
            self.entities[entity] = EntityRecord(entity)
        self.entities[entity].update(info)
        self.entities[entity].last_mentioned = datetime.now()

    def get_relationships(self, entity: str) -> list:
        # All triples in which the entity appears as subject or object
        return [r for r in self.relationships if entity in (r[0], r[2])]

    def get_context(self, entity: str) -> str:
        if entity not in self.entities:
            return ""
        record = self.entities[entity]
        related = self.get_relationships(entity)
        return f"""
Entity: {entity}
Type: {record.type}
Properties: {record.properties}
Related: {related}
Last mentioned: {record.last_mentioned}
"""
```
### Pattern 3: Tiered Memory
```python
class TieredMemory:
    """LRUCache, VectorStore, and PersistentStore are placeholders
    for your cache, embedding store, and durable storage backends."""
    def __init__(self):
        self.hot = LRUCache(100)       # Frequent access
        self.warm = VectorStore()      # Semantic search
        self.cold = PersistentStore()  # Rarely accessed

    def get(self, key: str):
        # Check hot first
        if key in self.hot:
            return self.hot[key]
        # Then warm
        result = self.warm.get(key)
        if result:
            self.hot[key] = result  # Promote to hot
            return result
        # Finally cold
        result = self.cold.get(key)
        if result:
            self.warm.add(key, result)  # Promote to warm
            return result
        return None
```
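The pattern leaves the three stores abstract; a runnable stand-in with plain dicts makes the promote-on-read behavior visible. `DictTieredMemory` is a demonstration name, not part of this skill:

```python
class DictTieredMemory:
    """Three dicts play the roles of hot cache, warm store, and cold store,
    so promotion on read can be observed directly."""
    def __init__(self, hot_capacity=2):
        self.hot, self.warm, self.cold = {}, {}, {}
        self.hot_capacity = hot_capacity

    def get(self, key):
        if key in self.hot:
            return self.hot[key]
        if key in self.warm:
            self._promote_hot(key, self.warm[key])
            return self.warm[key]
        if key in self.cold:
            self.warm[key] = self.cold[key]  # cold -> warm promotion
            return self.cold[key]
        return None

    def _promote_hot(self, key, value):
        # Evict in insertion order when the hot tier is full (FIFO,
        # a simplification of a true LRU policy).
        if len(self.hot) >= self.hot_capacity:
            self.hot.pop(next(iter(self.hot)))
        self.hot[key] = value

mem = DictTieredMemory()
mem.cold["user:42"] = {"theme": "dark"}
mem.get("user:42")           # first read promotes cold -> warm
mem.get("user:42")           # second read promotes warm -> hot
print("user:42" in mem.hot)  # True after two reads
```

The one-tier-per-read promotion means only repeatedly accessed keys climb into the hot tier, which is the point of the design.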
## Use This Skill When
- Designing persistent memory for AI agents
- Implementing RAG systems
- Building knowledge management systems
- Choosing between vector vs graph approaches
- Optimizing memory retrieval performance
- Integrating with Grey Haven knowledge base
## Related Skills

- **context-management** - Managing context in workflows
- **data-modeling** - Designing memory data structures
- **llm-project-development** - Building LLM applications
## Quick Start

```bash
# Understand architecture options
cat reference/architecture-patterns.md

# See implementation examples
cat examples/conversation-memory.md

# Use selection checklist
cat checklists/memory-selection-checklist.md
```
---

**Skill Version:** 1.0 | **Key Benchmark:** Zep 94.8% recall, 45ms latency | **Related Agents:** 8 knowledge-base agents | **Last Updated:** 2025-01-15