letta-development-guide
Letta Development Guide
Comprehensive guide for designing and building effective Letta agents with appropriate architectures, memory configurations, model selection, and tool setups.
When to Use This Skill
Use this skill when:
- Starting a new Letta agent project
- Choosing between agent architectures (letta_v1_agent vs memgpt_v2_agent)
- Designing memory block structure and architecture
- Selecting appropriate models for your use case
- Planning tool configurations
- Optimizing memory management and performance
- Implementing shared memory between agents
- Debugging memory-related issues
Quick Start Guide
Minimal Working Example
from letta_client import Letta

client = Letta()

agent = client.agents.create(
    name="my-assistant",
    model="openai/gpt-4o",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "You are a helpful assistant."},
        {"label": "human", "value": "The user's name and preferences."},
    ],
)

# Send a message
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.messages[-1].content)
1. Architecture Selection
Use letta_v1_agent when:
- Building new agents (recommended default)
- Need compatibility with reasoning models (GPT-4o, Claude Sonnet 4)
- Want simpler system prompts and direct message generation
Use memgpt_v2_agent when:
- Maintaining legacy agents
- Require specific tool patterns not yet supported in v1
For detailed comparison, see references/architectures.md.
2. Memory Architecture Design
Memory is the foundation of effective agents. Letta provides three memory types:
Core Memory (in-context):
- Always accessible in agent's context window
- Use for: current state, active context, frequently referenced information
- Limit: Keep total core memory under 80% of context window
Archival Memory (out-of-context):
- Semantic search over vector database
- Use for: historical records, large knowledge bases, past interactions
- Access: Agent must explicitly call archival_memory_search
- Note: NOT automatically populated from context overflow
Conversation History:
- Past messages from current conversation
- Retrieved via conversation_search tool
- Use for: referencing earlier discussion, tracking conversation flow
See references/memory-architecture.md for detailed guidance.
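The 80% guideline for core memory can be estimated before an agent is created. The sketch below is a rough check, not part of the Letta SDK: the `core_memory_usage` helper, the 4-characters-per-token ratio, and the 8,000-token window are all illustrative assumptions.

```python
# Rough core-memory budget check. Not a Letta API: the helper name and the
# characters-per-token ratio are assumptions made for illustration.
CHARS_PER_TOKEN = 4  # crude heuristic; real tokenizers vary by model


def core_memory_usage(blocks, context_window_tokens):
    """Return the estimated fraction of the context window used by core memory."""
    total_chars = sum(len(block["value"]) for block in blocks)
    estimated_tokens = total_chars / CHARS_PER_TOKEN
    return estimated_tokens / context_window_tokens


blocks = [
    {"label": "persona", "value": "You are a helpful assistant." * 100},
    {"label": "human", "value": "The user's name and preferences." * 100},
]

usage = core_memory_usage(blocks, context_window_tokens=8000)
within_budget = usage < 0.80  # guideline from this section
print(f"core memory uses about {usage:.0%} of the window; ok={within_budget}")
```

A character-based estimate errs in both directions, so treat it as an early warning and confirm with the model's own tokenizer when the result is close to the limit.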
3. Memory Block Design
Core principle: One block per distinct functional unit.
Essential blocks:
- persona: Agent identity, behavioral guidelines, capabilities
- human: User information, preferences, context
Add domain-specific blocks based on use case:
- Customer support: company_policies, product_knowledge, customer
- Coding assistant: project_context, coding_standards, current_task
- Personal assistant: schedule, preferences, contacts
Memory block guidelines:
- Keep blocks focused and purpose-specific
- Use clear, instructional descriptions
- Monitor size limits (typically 2000-5000 characters per block)
- Design for append operations when sharing memory between agents
See references/memory-patterns.md for domain examples and references/description-patterns.md for writing effective descriptions.
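As an illustration of the guidelines above, the following sketch lays out the three coding-assistant blocks as plain dictionaries and checks them against a size limit. The descriptions, values, and the 5000-character limit are made-up examples, the `oversized_blocks` helper is hypothetical, and the `description` and `limit` field names are assumed rather than confirmed against the SDK.

```python
# Illustrative block definitions for a coding assistant. Labels come from the
# list above; everything else is example data, and the field names are assumed.
coding_blocks = [
    {
        "label": "project_context",
        "description": "Architecture and key decisions for the active repo. "
                       "Update when the user describes a design change.",
        "value": "Monorepo with a Python API and a TypeScript frontend.",
        "limit": 5000,
    },
    {
        "label": "coding_standards",
        "description": "Style and review rules to follow when writing code. "
                       "Append new rules as the user states them.",
        "value": "Use type hints. Keep functions under 50 lines.",
        "limit": 5000,
    },
    {
        "label": "current_task",
        "description": "The task in progress. Rewrite when the task changes.",
        "value": "Add pagination to the /orders endpoint.",
        "limit": 5000,
    },
]


def oversized_blocks(blocks):
    """Return labels of blocks whose value exceeds their character limit."""
    return [b["label"] for b in blocks if len(b["value"]) > b["limit"]]


labels = [b["label"] for b in coding_blocks]
print(labels, oversized_blocks(coding_blocks))
```

Each description says what the block holds and when to write to it, which is the kind of instruction the agent needs to decide which block to update.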
4. Model Selection
Match model capabilities to agent requirements:
For production agents:
- GPT-4o or Claude Sonnet 4 for complex reasoning
- GPT-4o-mini for cost-efficient general tasks
- Claude Haiku 3.5 for fast, lightweight operations
- Gemini 2.0 Flash for balanced speed/capability
Avoid for production:
- Small Ollama models (<7B parameters) - poor tool calling
- Models without reliable function calling support
See references/model-recommendations.md for detailed guidance.
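The recommendations above amount to a small lookup table. A minimal sketch follows; the requirement keys and the `recommend_models` helper are invented for this example, and the values are the model names from this section rather than verified API handles.

```python
# Hypothetical lookup table built from the recommendations above. The keys are
# invented for this sketch; the values are display names, not API handles.
RECOMMENDED_MODELS = {
    "complex_reasoning": ["GPT-4o", "Claude Sonnet 4"],
    "cost_efficient": ["GPT-4o-mini"],
    "low_latency": ["Claude Haiku 3.5"],
    "balanced": ["Gemini 2.0 Flash"],
}


def recommend_models(requirement):
    """Return candidate models for a requirement, or raise on unknown keys."""
    try:
        return RECOMMENDED_MODELS[requirement]
    except KeyError:
        raise ValueError(f"unknown requirement: {requirement!r}") from None


print(recommend_models("cost_efficient"))
```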
5. Tool Configuration
Start minimal: Attach only tools the agent will actively use.
Common starting points:
- Memory tools (memory_insert, memory_replace, memory_rethink): Core for most agents
- File system tools: Auto-attached when folders are connected
- Custom tools: For domain-specific operations (databases, APIs, etc.)
Tool Rules: Use these to enforce call sequencing when needed (e.g., "always call search before answer").
Consult references/tool-patterns.md for common configurations.
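To make the sequencing idea concrete, here is a plain-Python model of a rule such as "always call search before answer". It is not Letta's tool-rule API: the `violates_order` function and the tool names `search` and `answer` are hypothetical and only show what such a rule enforces.

```python
# Plain-Python model of an ordering rule. This does not call Letta; it only
# checks a recorded sequence of tool names against the rule.
def violates_order(tool_calls, first, then):
    """Return True if `then` is called before any call to `first`."""
    seen_first = False
    for name in tool_calls:
        if name == first:
            seen_first = True
        elif name == then and not seen_first:
            return True
    return False


good_run = ["search", "memory_insert", "answer"]
bad_run = ["answer", "search"]

print(violates_order(good_run, first="search", then="answer"))
print(violates_order(bad_run, first="search", then="answer"))
```

A check like this can run over logged tool calls during testing to confirm that a configured rule has the intended effect.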
Advanced Topics
Memory Size Management
When approaching character limits:
- Split by topic: customer_profile → customer_business, customer_preferences
- Split by time: interaction_history → recent_interactions, archive older to archival memory
- Archive historical data: Move old information to archival memory
- Consolidate with memory_rethink: Summarize and rewrite block
See references/size-management.md for strategies.
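The "split by time" strategy can be sketched as a function that keeps the newest entries within a character limit and returns the rest as candidates for archival memory. The `split_by_recency` helper and the one-entry-per-line layout are assumptions; writing the older entries into archival memory is left to the caller because that call depends on the SDK in use.

```python
# Sketch of the "split by time" strategy. Assumes one entry per line, with
# the newest entry last.
def split_by_recency(history, limit):
    """Keep the newest lines that fit within `limit` characters.

    Returns (recent_text, older_lines); older_lines are candidates for archival.
    """
    lines = history.splitlines()
    recent = []
    used = 0
    # Walk from newest to oldest, keeping lines while they still fit.
    for line in reversed(lines):
        cost = len(line) + (1 if recent else 0)  # +1 for the joining newline
        if used + cost > limit:
            break
        recent.append(line)
        used += cost
    recent.reverse()
    older = lines[: len(lines) - len(recent)]
    return "\n".join(recent), older


history = "\n".join(f"2024-01-{day:02d}: call summary" for day in range(1, 11))
recent_text, to_archive = split_by_recency(history, limit=100)
print(len(recent_text), len(to_archive))
```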
Concurrency Patterns
When multiple agents share memory blocks or an agent processes concurrent requests:
Safest operations:
- memory_insert: Append-only, minimal race conditions
- Database uses PostgreSQL row-level locking
Risk of race conditions:
- memory_replace: Target string may change before write
- memory_rethink: Last-writer-wins, no merge
Best practices:
- Design for append operations when possible
- Use memory_insert for concurrent writes
- Reserve memory_rethink for single-agent exclusive access
Consult references/concurrency.md for detailed patterns.
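The difference between the three operations is easier to see in a small simulation. The `SharedBlock` class below is hypothetical and runs entirely in memory; it imitates the behavior described in this section and does not call Letta or model database locking.

```python
# In-memory simulation of two agents writing to one shared block.
class SharedBlock:
    def __init__(self, value=""):
        self.value = value

    def insert(self, text):
        """Append-only write: earlier content is never overwritten."""
        self.value = f"{self.value}\n{text}" if self.value else text

    def replace(self, old, new):
        """Fails if the target string changed since the caller last read it."""
        if old not in self.value:
            raise ValueError(f"target not found: {old!r}")
        self.value = self.value.replace(old, new, 1)

    def rethink(self, new_value):
        """Full rewrite: whoever writes last wins."""
        self.value = new_value


block = SharedBlock("status: open")

# Both agents read "status: open"; agent A writes first.
block.replace("status: open", "status: in progress")
try:
    block.replace("status: open", "status: closed")  # agent B's stale target
    replace_failed = False
except ValueError:
    replace_failed = True

# Appends from both agents survive regardless of order.
block.insert("agent A: contacted customer")
block.insert("agent B: issued refund")
after_inserts = block.value

# A rethink based on an older snapshot silently drops both appended notes.
block.rethink("status: closed")
print(replace_failed, "agent A" in after_inserts, "agent A" in block.value)
```

The failed replace is at least visible to the caller, whereas the rethink loses data without any error, which is why it is best reserved for blocks a single agent owns.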
Validation Checklist
Before finalizing your agent design:
Architecture:
- Does the architecture match the model's capabilities?
- Is the model appropriate for expected workload and latency requirements?
Memory:
- Is core memory total under 80% of context window?
- Is each block focused on one functional area?
- Are descriptions clear about when to read/write?
- Have you planned for size growth and overflow?
- If multi-agent, are concurrency patterns considered?
Tools:
- Are tools necessary and properly configured?
- Are memory blocks granular enough for effective updates?
Common Antipatterns
Too few memory blocks:
# Bad: Everything in one block
agent_memory: "Agent is helpful. User is John..."
Split into focused blocks instead.
Too many memory blocks: Creating 10+ blocks when 3-4 would suffice. Start minimal, expand as needed.
Poor descriptions:
# Bad
data: "Contains data"
Provide actionable guidance instead. See references/description-patterns.md.
Ignoring size limits: Letting blocks grow indefinitely until they hit limits. Monitor and manage proactively.
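Several of these antipatterns can be caught with a simple lint pass over a planned block layout. The sketch below is a heuristic only: the block-count thresholds follow this section, while the five-word description check and the `lint_blocks` name are assumptions chosen for illustration.

```python
# Heuristic lint for the antipatterns above. Not a Letta feature.
def lint_blocks(blocks):
    """Return a list of human-readable warnings for a block layout."""
    warnings = []
    if len(blocks) == 1:
        warnings.append("single catch-all block: split into focused blocks")
    if len(blocks) >= 10:
        warnings.append("10+ blocks: start with 3-4 and expand as needed")
    for block in blocks:
        description = block.get("description", "")
        if len(description.split()) < 5:  # arbitrary cutoff for "too vague"
            warnings.append(f"{block['label']}: description is too vague")
    return warnings


bad_layout = [{"label": "data", "description": "Contains data", "value": "..."}]
good_layout = [
    {
        "label": "persona",
        "description": "Agent identity and tone. Read on every turn.",
        "value": "...",
    },
    {
        "label": "human",
        "description": "Facts about the user. Append when the user shares new details.",
        "value": "...",
    },
    {
        "label": "current_task",
        "description": "The task in progress. Rewrite when the task changes.",
        "value": "...",
    },
]

for warning in lint_blocks(bad_layout):
    print(warning)
print(lint_blocks(good_layout))
```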
Implementation Steps
1. Design Phase
- Choose architecture based on requirements
- Design memory block structure
- Select appropriate model
- Plan tool configuration
2. Creation Phase (SDK)
Python:
from letta_client import Letta

client = Letta()  # Uses LETTA_API_KEY env var

# Create agent with custom memory blocks
agent = client.agents.create(
    name="my-agent",
    model="openai/gpt-4o",  # or "anthropic/claude-sonnet-4-20250514"
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "You are a helpful assistant..."},
        {"label": "human", "value": "User preferences and context..."},
        {"label": "project", "value": "Current project details..."},
    ],
    description="Agent for helping with X",
)
print(f"Created agent: {agent.id}")
TypeScript:
import Letta from "@letta-ai/letta-client";

const client = new Letta();

const agent = await client.agents.create({
  name: "my-agent",
  model: "openai/gpt-4o",
  embedding: "openai/text-embedding-3-small",
  // Request fields are snake_case in the v1 SDK; older releases used memoryBlocks
  memory_blocks: [
    { label: "persona", value: "You are a helpful assistant..." },
    { label: "human", value: "User preferences and context..." },
    { label: "project", value: "Current project details..." },
  ],
  description: "Agent for helping with X",
});
console.log(`Created agent: ${agent.id}`);
Note: Letta Code CLI (letta command) creates agents interactively. Use letta --new-agent to start fresh, then /rename and /description to configure.
3. Testing Phase
- Test with representative queries
- Monitor memory tool usage patterns
- Verify tool calling behavior
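One way to monitor memory tool usage during testing is to tally tool calls across the messages returned by a test run. The sketch below works on plain dictionaries; the message shape (a `message_type` field and a nested `tool_call` name) is an assumed stand-in for whatever the SDK returns, and `count_tool_calls` is a hypothetical helper.

```python
from collections import Counter


# Tally tool calls from a test run. Adapt the field access to the real
# message objects; the dict layout here is an assumption.
def count_tool_calls(messages):
    """Count how often each tool was called in a list of messages."""
    return Counter(
        m["tool_call"]["name"]
        for m in messages
        if m.get("message_type") == "tool_call_message"
    )


sample_messages = [
    {"message_type": "tool_call_message", "tool_call": {"name": "memory_insert"}},
    {"message_type": "assistant_message", "content": "Noted."},
    {"message_type": "tool_call_message", "tool_call": {"name": "memory_insert"}},
    {"message_type": "tool_call_message", "tool_call": {"name": "memory_replace"}},
]

tool_usage = count_tool_calls(sample_messages)
print(tool_usage.most_common())
```

A tool that never appears in the tally across representative queries is a candidate for removal, in line with the advice to start with a minimal tool set.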
4. Iteration Phase
- Refine memory block structure based on actual usage
- Optimize system instructions
- Adjust tool configurations
References
For detailed information on specific topics, consult the reference materials:
- references/architectures.md - Architecture comparison and selection
- references/memory-architecture.md - Memory types and when to use them
- references/memory-patterns.md - Domain-specific memory block examples
- references/description-patterns.md - Writing effective block descriptions
- references/size-management.md - Managing memory block size limits
- references/concurrency.md - Multi-agent memory sharing patterns
- references/model-recommendations.md - Model selection guidance
- references/tool-patterns.md - Common tool configurations