memory-design-patterns
Memory Design Patterns
Production-ready memory architecture patterns for AI applications using Mem0. This skill provides comprehensive guidance on designing scalable, performant memory systems with proper isolation, retention strategies, and optimization techniques.
Instructions
Phase 1: Understand Memory Types
Mem0 provides three distinct memory scopes, each serving different purposes:
1. User Memory (Persistent Preferences & Profile)
Purpose: Long-term personal preferences, profile data, and user characteristics that persist across all interactions.
Use Cases:
- User preferences (dietary restrictions, communication style, language preferences)
- Personal information (location, occupation, family details)
- Long-term goals and interests
- Historical context that should persist indefinitely
Implementation:
# Add user-level memory
memory.add(
    "User prefers concise responses without technical jargon",
    user_id="customer_bob",
)
# Search user memories
user_context = memory.search(
    "communication style",
    user_id="customer_bob",
)
Key Characteristics:
- Persists indefinitely (or until explicitly deleted)
- Shared across all agents interacting with this user
- Should contain stable, long-term information
- Typically 10-50 memories per user
2. Agent Memory (Agent-Specific Context)
Purpose: Agent-specific knowledge, behaviors, and learned patterns that apply across all users interacting with this agent.
Use Cases:
- Agent capabilities and limitations
- Domain-specific knowledge
- Learned behaviors and patterns
- Agent-specific instructions and protocols
Implementation:
# Add agent-level memory
memory.add(
    "When handling refund requests, always check order date first",
    agent_id="support_agent_v2",
)
# Search agent memories
agent_context = memory.search(
    "refund process",
    agent_id="support_agent_v2",
)
Key Characteristics:
- Shared across all users interacting with this agent
- Contains agent-specific procedures and knowledge
- Moderate retention (days to months)
- Typically 50-200 memories per agent
3. Session/Run Memory (Temporary Conversation Context)
Purpose: Ephemeral context specific to a single conversation or task session.
Use Cases:
- Current conversation topic
- Temporary task context
- Session-specific state
- Short-term working memory
Implementation:
# Add session-level memory
memory.add(
    "Current issue: payment failed with error code 402",
    run_id="session_12345_20250115",
)
# Search session memories
session_context = memory.search(
    "current issue",
    run_id="session_12345_20250115",
)
Key Characteristics:
- Short-lived (minutes to hours)
- Isolated to specific conversation or task
- Should be cleaned up after session ends
- Typically 5-20 memories per session
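The cleanup and promotion steps for session memory can be sketched as a single end-of-session hook. This is a minimal sketch, not Mem0's own lifecycle API: `end_session` and `MemoryLike` are hypothetical names, and the `delete_all(run_id=...)` call is an assumption about the client that should be checked against the installed Mem0 version.

```python
from typing import Iterable, Protocol


class MemoryLike(Protocol):
    """The two calls this sketch relies on (assumed to match the Mem0 client)."""

    def add(self, messages, **kwargs): ...

    def delete_all(self, **kwargs): ...


def end_session(memory: MemoryLike, run_id: str, user_id: str,
                promote: Iterable[str] = ()) -> int:
    """Promote selected facts to user memory, then drop the session scope.

    Returns the number of promoted memories.
    """
    promoted = 0
    for text in promote:
        # Re-add the fact under the long-lived user scope.
        memory.add(text, user_id=user_id)
        promoted += 1
    # Assumed call: remove everything stored under this run_id.
    memory.delete_all(run_id=run_id)
    return promoted
```

Promotion runs before deletion, so a failure while promoting leaves the session memories in place for a retry.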
Phase 2: Choose Storage Backend (Vector vs Graph)
Vector Memory (Default)
How It Works: Embeddings stored in vector database, semantic similarity search using cosine distance.
Strengths:
- Fast semantic search
- Excellent for unstructured data
- Low setup complexity
- Works out-of-the-box with Mem0
Weaknesses:
- Cannot query relationships
- No explicit entity connections
- Limited reasoning about connections
Best For:
- Simple preference storage
- Document/chunk retrieval
- Semantic search use cases
- Quick prototyping
Configuration:
from mem0 import Memory
# Default vector-only configuration
memory = Memory()
Graph Memory (Advanced)
How It Works: Entities and relationships stored in graph database (Neo4j/Memgraph), enables relationship traversal and complex queries.
Strengths:
- Explicit entity relationships
- Complex query capabilities
- Relationship reasoning
- Multi-hop traversal
Weaknesses:
- Requires graph database setup
- Higher infrastructure complexity
- Slower for pure semantic search
- More storage overhead
Best For:
- Multi-entity systems
- Relationship-heavy domains
- Complex reasoning requirements
- Enterprise knowledge graphs
Configuration:
from mem0 import Memory
from mem0.configs.base import MemoryConfig
config = MemoryConfig(
    graph_store={
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password",
        },
    }
)
memory = Memory(config)
Decision Matrix:
| Use Case | Vector | Graph |
|---|---|---|
| User preferences | ✅ Best | ⚠️ Overkill |
| Product recommendations | ✅ Best | ⚠️ Overkill |
| Customer support | ✅ Good | ✅ Better |
| Knowledge management | ⚠️ Limited | ✅ Best |
| Multi-tenant systems | ✅ Good | ✅ Best |
| Team collaboration | ⚠️ Limited | ✅ Best |
Phase 3: Design Retention Strategy
Use the retention strategy template:
bash scripts/generate-retention-policy.sh <memory-type> <retention-days>
Retention Guidelines
User Memory:
- Retention: Indefinite (with user control)
- Cleanup: User-initiated deletion only
- Archival: After 1 year of inactivity
- GDPR: Must support right to deletion
Agent Memory:
- Retention: 90-180 days typical
- Cleanup: Automatic based on relevance score
- Versioning: Keep agent version history
- Deprecation: Clear old agent memories on major updates
Session Memory:
- Retention: 1-24 hours
- Cleanup: Automatic after session end
- Conversion: Promote important memories to user/agent level
- Storage: Consider in-memory for very short sessions
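The guidelines above can be encoded as a small policy table plus an expiry check. This is a sketch under stated assumptions: `RetentionPolicy`, `POLICIES`, and `is_expired` are hypothetical names, and the limits use the upper bounds of the ranges listed above (180 days for agent memory, 24 hours for session memory).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    scope: str
    max_age: Optional[timedelta]  # None means "keep until explicitly deleted"


# Upper bounds taken from the retention guidelines.
POLICIES = {
    "user": RetentionPolicy("user", None),
    "agent": RetentionPolicy("agent", timedelta(days=180)),
    "session": RetentionPolicy("session", timedelta(hours=24)),
}


def is_expired(scope: str, created_at: datetime,
               now: Optional[datetime] = None) -> bool:
    """Return True when a memory in the given scope has outlived its policy."""
    policy = POLICIES[scope]
    if policy.max_age is None:
        return False  # user memory is only removed on request
    now = now or datetime.now(timezone.utc)
    return now - created_at > policy.max_age
```

Passing `now` explicitly keeps the check deterministic in tests. Timestamps are expected to be timezone-aware.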
Retention Implementation
Run the retention analyzer:
bash scripts/analyze-retention.sh <user_id_or_agent_id>
This script:
- Analyzes memory age and access patterns
- Identifies stale memories
- Suggests cleanup actions
- Generates retention reports
Phase 4: Implement Multi-Level Memory Pattern
Pattern: Combine all three memory types for comprehensive context.
Template: Use templates/multi-level-memory-pattern.py
Architecture:
Query Processing Flow:
1. Retrieve session context (immediate)
2. Retrieve user context (preferences)
3. Retrieve agent context (capabilities)
4. Merge contexts with priority weighting
5. Generate response with full context
Priority Weighting:
- Session: 40% weight (most relevant to current task)
- User: 35% weight (personalizes response)
- Agent: 25% weight (ensures consistent behavior)
Implementation:
# Retrieve all context levels
session_memories = memory.search(query, run_id=run_id)
user_memories = memory.search(query, user_id=user_id)
agent_memories = memory.search(query, agent_id=agent_id)
# Weighted merge (merge_contexts is an application-defined helper, not part of Mem0)
context = merge_contexts(
    session=session_memories,
    user=user_memories,
    agent=agent_memories,
    weights={"session": 0.4, "user": 0.35, "agent": 0.25},
)
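One possible shape for the `merge_contexts` helper is sketched below. It is not part of Mem0. The sketch assumes each search result is a plain list of dicts with a `"memory"` text and a `"score"` where higher means more relevant. Depending on the Mem0 version, `search` may instead return a dict wrapping the results, in which case unwrap it first. The `limit` parameter is also an addition for illustration.

```python
from typing import Dict, List


def merge_contexts(session: List[dict], user: List[dict], agent: List[dict],
                   weights: Dict[str, float], limit: int = 5) -> List[dict]:
    """Scale each result's score by its level weight and keep the top entries."""
    ranked = []
    for level, memories in (("session", session), ("user", user), ("agent", agent)):
        for item in memories:
            ranked.append({
                "memory": item["memory"],
                "level": level,
                # Missing scores count as 0.0 so they sort last.
                "weighted_score": item.get("score", 0.0) * weights[level],
            })
    ranked.sort(key=lambda r: r["weighted_score"], reverse=True)
    return ranked[:limit]
```

Multiplying scores by the level weight is only one interpretation of "priority weighting". An alternative is to reserve a fixed share of the context budget for each level.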
Phase 5: Optimize Performance
Vector Search Optimization
Run the performance analyzer:
bash scripts/analyze-memory-performance.sh <project_name>
Optimization Techniques:
- Limit Search Results:
  memories = memory.search(query, user_id=user_id, limit=5)
  - Default: 10 results
  - Recommended: 3-5 for chat, 10-20 for RAG
- Use Filters to Reduce Search Space:
  memories = memory.search(
      query,
      filters={
          "AND": [
              {"user_id": "alex"},
              {"agent_id": "support_agent"},
          ]
      },
  )
- Cache Frequently Accessed Memories:
  - Cache user preferences (rarely change)
  - Refresh cache every 5-10 minutes
  - Invalidate on explicit memory updates
- Batch Operations:
  # Add multiple memories in one call
  memory.add(messages, user_id=user_id)
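The caching advice above (refresh every 5-10 minutes, invalidate on explicit updates) can be sketched with a small time-to-live cache. `TTLCache` is a hypothetical helper, not a Mem0 feature. The loader callback stands in for whatever call fetches the memories, for example a `memory.search(...)` wrapped in a lambda.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Caches loader results per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh
        value = loader()  # cache miss or expired entry: reload
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call this after any explicit memory update for the key."""
        self._entries.pop(key, None)
```

The injectable `clock` exists only to make expiry testable without sleeping. This sketch is single-process and not thread-safe.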
Graph Query Optimization
For graph memory:
- Limit Traversal Depth: Max 2-3 hops
- Index Key Properties: user_id, agent_id, timestamps
- Use Relationship Filters: Reduce unnecessary traversals
- Monitor Query Performance: Track slow queries > 100ms
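Tracking slow queries against the 100 ms threshold can be sketched with a timing decorator. The decorator name, logger name, and constant below are hypothetical. Wrap whichever function issues the graph or memory query.

```python
import functools
import logging
import time

logger = logging.getLogger("memory.perf")
SLOW_QUERY_MS = 100.0  # threshold from the guideline above


def track_slow_queries(func):
    """Log a warning whenever the wrapped call exceeds SLOW_QUERY_MS."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on error, so failed queries are timed too.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if elapsed_ms > SLOW_QUERY_MS:
                logger.warning("slow query %s took %.1f ms",
                               func.__name__, elapsed_ms)

    return wrapper
```

In production the warning would more likely feed a metrics system than a log line, but the timing logic stays the same.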
Phase 6: Implement Cost Optimization
Run the cost analyzer:
bash scripts/analyze-memory-costs.sh <user_id> <date_range>
Cost Optimization Strategies:
- Deduplication: Remove similar/redundant memories
  bash scripts/deduplicate-memories.sh <user_id>
- Archival: Move old memories to cold storage
  - Active: Last 30 days (vector DB)
  - Archive: 30-180 days (compressed JSON)
  - Long-term: > 180 days (S3/cold storage)
- Compression: Use shorter embeddings for less critical memories
  - Critical: 1536 dimensions (OpenAI text-embedding-3-small default; text-embedding-3-large defaults to 3072)
  - Standard: 768 dimensions (e.g., a text-embedding-3 model shortened via the dimensions parameter)
  - Archival: 384 dimensions (lightweight model)
- Smart Pruning: Remove low-value memories
  - Score-based: Keep only high relevance scores
  - Access-based: Remove never-accessed memories
  - Importance-based: User/agent priority tagging
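The archival tiers listed above map directly to an age-based classifier. This is a minimal sketch: `storage_tier` and the tier labels are hypothetical. The boundary choice (day 30 is still active, day 180 is still archive) is an assumption, since the ranges above overlap at their edges.

```python
def storage_tier(age_days: float) -> str:
    """Map a memory's age in days to the storage tier it belongs in."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= 30:
        return "active"      # stays in the vector DB
    if age_days <= 180:
        return "archive"     # compressed JSON
    return "long_term"       # S3 / cold storage
```

A scheduled job could call this per memory and move any record whose tier differs from where it currently lives.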
Phase 7: Security and Isolation
Multi-Tenant Isolation
Pattern: Ensure complete data isolation between users/organizations.
Implementation:
# Validate access before retrieval (user_has_access is an application-defined check)
if not user_has_access(user_id, requested_user_id):
    raise PermissionError("Access denied")
# Always scope by user_id or org_id
memories = memory.search(
    query,
    filters={"user_id": current_user_id},
)
Security Checklist:
- ✅ Never allow cross-user memory access
- ✅ Validate all user_id parameters
- ✅ Implement org-level isolation for multi-tenant apps
- ✅ Audit memory access logs
- ✅ Encrypt sensitive memory content
- ✅ Support GDPR right to deletion
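One way to make the first two checklist items hard to bypass is to hand application code a wrapper that is already pinned to a single user. `ScopedMemory` below is a hypothetical helper, not part of Mem0. It assumes only that the wrapped client exposes `search(query, user_id=..., **kwargs)` as in the earlier examples.

```python
class ScopedMemory:
    """Wraps a memory client so every search is pinned to one user."""

    def __init__(self, memory, user_id: str):
        if not user_id:
            raise ValueError("user_id is required")
        self._memory = memory
        self._user_id = user_id

    def search(self, query: str, **kwargs):
        # Reject any attempt to override the pinned user.
        requested = kwargs.pop("user_id", self._user_id)
        if requested != self._user_id:
            raise PermissionError("Access denied")
        return self._memory.search(query, user_id=self._user_id, **kwargs)
```

Creating one wrapper per authenticated request means handlers never pass a raw `user_id` themselves, which removes a common source of cross-user leaks. The same pattern would extend to `add` and delete calls.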
Run the security audit:
bash scripts/audit-memory-security.sh
Decision Trees
When to Use Each Memory Type
Use the decision helper:
bash scripts/suggest-memory-type.sh "<use_case_description>"
Quick Reference:
- User dietary preferences → User Memory
- Agent's SOP for task X → Agent Memory
- Current conversation topic → Session Memory
- Customer support ticket details → Session Memory (promote to User if resolved)
- System capabilities → Agent Memory
- User's birthday → User Memory
Vector vs Graph Decision
Use the architecture advisor:
bash scripts/suggest-storage-architecture.sh "<project_description>"
Decision Criteria:
- Need relationship traversal? → Graph
- Pure semantic search? → Vector
- < 10,000 memories total? → Vector
- Complex entity relationships? → Graph
- Team/org hierarchies? → Graph
- Simple preference storage? → Vector
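The criteria above reduce to a simple rule: any relationship-oriented requirement points to graph, otherwise vector. The sketch below encodes that. `suggest_backend` is a hypothetical function unrelated to the advisor script, and treating relationship needs as overriding the size criterion is an assumption.

```python
from typing import List, Tuple


def suggest_backend(needs_traversal: bool = False,
                    complex_entities: bool = False,
                    org_hierarchy: bool = False) -> Tuple[str, List[str]]:
    """Return the suggested backend and the reasons behind the suggestion."""
    reasons = []
    if needs_traversal:
        reasons.append("relationship traversal")
    if complex_entities:
        reasons.append("complex entity relationships")
    if org_hierarchy:
        reasons.append("team/org hierarchies")
    if reasons:
        return "graph", reasons
    # No relationship requirements: vector is the simpler default.
    return "vector", ["no relationship requirements"]
```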
Key Files
Scripts (all functional, not placeholders):
- scripts/generate-retention-policy.sh - Create retention policy configs
- scripts/analyze-retention.sh - Analyze memory age and access patterns
- scripts/analyze-memory-performance.sh - Performance profiling
- scripts/analyze-memory-costs.sh - Cost analysis and optimization suggestions
- scripts/deduplicate-memories.sh - Find and remove duplicate memories
- scripts/audit-memory-security.sh - Security compliance checking
- scripts/suggest-memory-type.sh - Interactive memory type advisor
- scripts/suggest-storage-architecture.sh - Architecture recommendation tool
Templates:
- templates/multi-level-memory-pattern.py - Complete implementation
- templates/retention-policy.yaml - Retention configuration
- templates/vector-only-config.py - Vector memory setup
- templates/graph-memory-config.py - Graph memory setup
- templates/hybrid-architecture.py - Vector + Graph combined
- templates/cost-optimization-config.yaml - Cost optimization settings
Examples:
- examples/customer-support-memory-architecture.md - Full implementation guide
- examples/multi-agent-collaboration.md - Shared memory patterns
- examples/e-commerce-personalization.md - Product recommendation memory
- examples/healthcare-assistant.md - HIPAA-compliant memory architecture
Best Practices
- Start Simple: Use vector-only with user + session memories
- Add Complexity as Needed: Only introduce graph when relationships matter
- Monitor Performance: Track memory retrieval times and costs
- Implement Retention Early: Don't let memory grow unbounded
- Test Isolation: Verify cross-user memory access is impossible
- Document Memory Schema: Track what memories mean and when they're used
- Version Agent Memories: Clear separation between agent versions
- Promote Important Memories: Session → User when patterns emerge
- Use Metadata: Tag memories with categories for better filtering
- Regular Audits: Monthly review of memory growth and costs
Troubleshooting
Slow Memory Retrieval:
- Reduce search limit
- Add more specific filters
- Check vector index performance
- Consider caching
High Costs:
- Run cost analyzer script
- Implement deduplication
- Review retention policy
- Archive old memories
Poor Search Results:
- Check embedding model quality
- Verify memory content is descriptive
- Use hybrid search (keyword + semantic)
- Add metadata for filtering
Memory Leakage Between Users:
- Run the security audit script immediately (scripts/audit-memory-security.sh)
- Review all memory queries for user_id filtering
- Check RLS policies if using custom backends
- Implement access logging
Plugin: mem0 Version: 1.0.0 Last Updated: 2025-10-27