NYC
skills/letta-ai/skills/letta-development-guide

letta-development-guide

SKILL.md

Letta Development Guide

Comprehensive guide for designing and building effective Letta agents with appropriate architectures, memory configurations, model selection, and tool setups.

When to Use This Skill

Use this skill when:

  • Starting a new Letta agent project
  • Choosing between agent architectures (letta_v1_agent vs memgpt_v2_agent)
  • Designing memory block structure and architecture
  • Selecting appropriate models for your use case
  • Planning tool configurations
  • Optimizing memory management and performance
  • Implementing shared memory between agents
  • Debugging memory-related issues

Quick Start Guide

Minimal Working Example

from letta_client import Letta

client = Letta()
agent = client.agents.create(
    name="my-assistant",
    model="openai/gpt-4o",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "You are a helpful assistant."},
        {"label": "human", "value": "The user's name and preferences."},
    ],
)

# Send a message
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.messages[-1].content)

1. Architecture Selection

Use letta_v1_agent when:

  • Building new agents (recommended default)
  • Need compatibility with reasoning models (GPT-4o, Claude Sonnet 4)
  • Want simpler system prompts and direct message generation

Use memgpt_v2_agent when:

  • Maintaining legacy agents
  • Require specific tool patterns not yet supported in v1

For detailed comparison, see references/architectures.md.

2. Memory Architecture Design

Memory is the foundation of effective agents. Letta provides three memory types:

Core Memory (in-context):

  • Always accessible in agent's context window
  • Use for: current state, active context, frequently referenced information
  • Limit: Keep total core memory under 80% of context window

Archival Memory (out-of-context):

  • Semantic search over vector database
  • Use for: historical records, large knowledge bases, past interactions
  • Access: Agent must explicitly call archival_memory_search
  • Note: NOT automatically populated from context overflow

Conversation History:

  • Past messages from current conversation
  • Retrieved via conversation_search tool
  • Use for: referencing earlier discussion, tracking conversation flow

See references/memory-architecture.md for detailed guidance.

3. Memory Block Design

Core principle: One block per distinct functional unit.

Essential blocks:

  • persona: Agent identity, behavioral guidelines, capabilities
  • human: User information, preferences, context

Add domain-specific blocks based on use case:

  • Customer support: company_policies, product_knowledge, customer
  • Coding assistant: project_context, coding_standards, current_task
  • Personal assistant: schedule, preferences, contacts

Memory block guidelines:

  • Keep blocks focused and purpose-specific
  • Use clear, instructional descriptions
  • Monitor size limits (typically 2000-5000 characters per block)
  • Design for append operations when sharing memory between agents

See references/memory-patterns.md for domain examples and references/description-patterns.md for writing effective descriptions.

4. Model Selection

Match model capabilities to agent requirements:

For production agents:

  • GPT-4o or Claude Sonnet 4 for complex reasoning
  • GPT-4o-mini for cost-efficient general tasks
  • Claude Haiku 3.5 for fast, lightweight operations
  • Gemini 2.0 Flash for balanced speed/capability

Avoid for production:

  • Small Ollama models (<7B parameters) - poor tool calling
  • Models without reliable function calling support

See references/model-recommendations.md for detailed guidance.

5. Tool Configuration

Start minimal: Attach only tools the agent will actively use.

Common starting points:

  • Memory tools (memory_insert, memory_replace, memory_rethink): Core for most agents
  • File system tools: Auto-attached when folders are connected
  • Custom tools: For domain-specific operations (databases, APIs, etc.)

Tool Rules: Use to enforce sequencing when needed (e.g., "always call search before answer")

Consult references/tool-patterns.md for common configurations.

Advanced Topics

Memory Size Management

When approaching character limits:

  1. Split by topic: customer_profilecustomer_business, customer_preferences
  2. Split by time: interaction_historyrecent_interactions, archive older to archival memory
  3. Archive historical data: Move old information to archival memory
  4. Consolidate with memory_rethink: Summarize and rewrite block

See references/size-management.md for strategies.

Concurrency Patterns

When multiple agents share memory blocks or an agent processes concurrent requests:

Safest operations:

  • memory_insert: Append-only, minimal race conditions
  • Database uses PostgreSQL row-level locking

Risk of race conditions:

  • memory_replace: Target string may change before write
  • memory_rethink: Last-writer-wins, no merge

Best practices:

  • Design for append operations when possible
  • Use memory_insert for concurrent writes
  • Reserve memory_rethink for single-agent exclusive access

Consult references/concurrency.md for detailed patterns.

Validation Checklist

Before finalizing your agent design:

Architecture:

  • Does the architecture match the model's capabilities?
  • Is the model appropriate for expected workload and latency requirements?

Memory:

  • Is core memory total under 80% of context window?
  • Is each block focused on one functional area?
  • Are descriptions clear about when to read/write?
  • Have you planned for size growth and overflow?
  • If multi-agent, are concurrency patterns considered?

Tools:

  • Are tools necessary and properly configured?
  • Are memory blocks granular enough for effective updates?

Common Antipatterns

Too few memory blocks:

# Bad: Everything in one block
agent_memory: "Agent is helpful. User is John..."

Split into focused blocks instead.

Too many memory blocks: Creating 10+ blocks when 3-4 would suffice. Start minimal, expand as needed.

Poor descriptions:

# Bad
data: "Contains data"

Provide actionable guidance instead. See references/description-patterns.md.

Ignoring size limits: Letting blocks grow indefinitely until they hit limits. Monitor and manage proactively.

Implementation Steps

1. Design Phase

  • Choose architecture based on requirements
  • Design memory block structure
  • Select appropriate model
  • Plan tool configuration

2. Creation Phase (SDK)

Python:

from letta_client import Letta

client = Letta()  # Uses LETTA_API_KEY env var

# Create agent with custom memory blocks
agent = client.agents.create(
    name="my-agent",
    model="openai/gpt-4o",  # or "anthropic/claude-sonnet-4-20250514"
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "You are a helpful assistant..."},
        {"label": "human", "value": "User preferences and context..."},
        {"label": "project", "value": "Current project details..."},
    ],
    description="Agent for helping with X",
)
print(f"Created agent: {agent.id}")

TypeScript:

import Letta from "letta-client";

const client = new Letta();

const agent = await client.agents.create({
  name: "my-agent",
  model: "openai/gpt-4o",
  embedding: "openai/text-embedding-3-small",
  memoryBlocks: [
    { label: "persona", value: "You are a helpful assistant..." },
    { label: "human", value: "User preferences and context..." },
    { label: "project", value: "Current project details..." },
  ],
  description: "Agent for helping with X",
});
console.log(`Created agent: ${agent.id}`);

Note: Letta Code CLI (letta command) creates agents interactively. Use letta --new-agent to start fresh, then /rename and /description to configure.

3. Testing Phase

  • Test with representative queries
  • Monitor memory tool usage patterns
  • Verify tool calling behavior

4. Iteration Phase

  • Refine memory block structure based on actual usage
  • Optimize system instructions
  • Adjust tool configurations

References

For detailed information on specific topics, consult the reference materials:

  • references/architectures.md - Architecture comparison and selection
  • references/memory-architecture.md - Memory types and when to use them
  • references/memory-patterns.md - Domain-specific memory block examples
  • references/description-patterns.md - Writing effective block descriptions
  • references/size-management.md - Managing memory block size limits
  • references/concurrency.md - Multi-agent memory sharing patterns
  • references/model-recommendations.md - Model selection guidance
  • references/tool-patterns.md - Common tool configurations
Weekly Installs
23
Repository
letta-ai/skills
First Seen
Jan 19, 2026
Installed on
opencode15
claude-code15
codex14
gemini-cli13
antigravity12
github-copilot10