# Memory Bank
An adaptive memory system that gives Claude Code persistent, intelligent context across sessions — while cutting token waste so your sessions last 3-5x longer. Not a flat file — a layered architecture that compresses, branches, diffs, self-heals, and loads only what matters.
## Core Architecture
Memory Bank operates on three layers:
```
┌─────────────────────────────────────────────┐
│  Layer 2: GLOBAL MEMORY                     │
│  ~/.claude/GLOBAL-MEMORY.md                 │
│  Cross-project patterns, user preferences,  │
│  reusable decisions. Permanent.             │
├─────────────────────────────────────────────┤
│  Layer 1: PROJECT MEMORY                    │
│  ./MEMORY.md (+ branch overlays)            │
│  Architecture, decisions, active work.      │
│  Lives as long as the project.              │
├─────────────────────────────────────────────┤
│  Layer 0: SESSION CONTEXT                   │
│  In-conversation only.                      │
│  Current task focus, scratch notes.         │
│  Dies when session ends (persisted to L1).  │
└─────────────────────────────────────────────┘
```
**Layer 0 (Session)** — Ephemeral. Tracks what you're doing right now. Automatically flushed to Layer 1 at session end.

**Layer 1 (Project)** — The primary memory file. Tracks project state, decisions, active work, blockers. Branch-aware: each git branch can have its own overlay that merges with the base memory.

**Layer 2 (Global)** — Cross-project knowledge. Your coding preferences, tool choices, patterns you always use. Lives in `~/.claude/GLOBAL-MEMORY.md`. Loaded alongside Layer 1 at session start.
> See `references/memory-layers.md` for full architecture details.
## When to Activate
| Trigger | Action |
|---------|--------|
| Session starts, MEMORY.md exists | Full load sequence |
| "remember this", "don't forget" | Mid-session update |
| "wrap up", "save progress", "done for now" | Full session write |
| "pick up where we left off", "what were we doing" | Load + summarize |
| "switch to [branch]", "context for [feature]" | Branch-aware load |
| "memory health", "is memory stale" | Health check |
| "hand off", "onboard someone" | Generate handoff doc |
| "compress memory", "clean up memory" | Run compression |
| "rebuild memory" | Recovery mode |
| "save state", "continue this later" | Session continuation protocol |
| "context budget", "how much context left" | Context budget check |
| "running out of context", "session is long" | Emergency save + continuation file |
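The trigger table above amounts to a phrase-to-action mapping. As a purely illustrative sketch (the skill itself relies on the model's own intent detection; the `TRIGGERS` dict and substring matching below are assumptions, not the real mechanism):

```python
# Hypothetical trigger matcher: map a user message to a memory action.
# Phrase list abbreviated; substring matching is a simplification.
TRIGGERS = {
    "remember this": "mid_session_update",
    "don't forget": "mid_session_update",
    "wrap up": "full_session_write",
    "save progress": "full_session_write",
    "memory health": "health_check",
    "compress memory": "run_compression",
    "rebuild memory": "recovery_mode",
    "save state": "session_continuation",
}

def match_trigger(message: str):
    """Return the first matching memory action, or None."""
    msg = message.lower()
    for phrase, action in TRIGGERS.items():
        if phrase in msg:
            return action
    return None
```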
## Workflow
### 1. Session Start — The Load Sequence
Execute this sequence before doing anything else:
```
Step 1: Detect memory files
└─ Check for MEMORY.md in project root
└─ Check for ~/.claude/GLOBAL-MEMORY.md
└─ Check for MEMORY-ARCHIVE.md (has history been archived?)

Step 2: Detect git context
└─ Current branch name
└─ Check for .memory/branches/<branch>.md overlay
└─ Days since last session (from "Last updated" field)

Step 3: Session diff (if git available)
└─ Commits since last memory update
└─ Files changed since last session
└─ Any conflicts between memory and current code state

Step 4: Health check
└─ Score memory freshness (see Health Scoring below)
└─ Flag stale entries
└─ Flag referenced files that no longer exist

Step 5: Context-aware greeting
└─ Summarize where we left off (2-3 sentences, specific)
└─ Report any drift detected (code changed, memory stale)
└─ State the next immediate action
└─ Ask: "Ready to continue, or has the plan changed?"
```
**Example greeting (fresh memory, same branch):**

> "Welcome back! Last session you finished the Stripe webhook handler in `src/api/webhooks/stripe.ts` and were about to write integration tests. The `handlePaymentSuccess()` function is complete but `handleRefund()` is stubbed out. 3 commits have landed since — all yours, no surprises. Ready to pick up with the integration tests?"
**Example greeting (stale memory, branch switched):**

> "Welcome back! Your memory is from 5 days ago on `main`, but you're now on `feature/user-profiles`. I found a branch overlay from 3 days ago with context about the profile avatar upload. However, `src/components/Avatar.tsx` referenced in memory was renamed to `ProfileImage.tsx`. Want me to update memory with the current state before we continue?"
**If no MEMORY.md exists:**
- Proceed normally
- After first meaningful work, offer: "Want me to start tracking our progress? I'll create a memory file so next session picks up instantly."
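Steps 1 and 2 of the load sequence are plain filesystem and git checks. A minimal Python sketch, assuming the file layout described above (the helper names `detect_memory_context` and `days_since` are hypothetical, not part of the skill):

```python
import os
import subprocess
from datetime import date, datetime

def detect_memory_context(project_root="."):
    """Report which memory layers exist and the current git context."""
    ctx = {
        "project_memory": os.path.exists(os.path.join(project_root, "MEMORY.md")),
        "global_memory": os.path.exists(os.path.expanduser("~/.claude/GLOBAL-MEMORY.md")),
        "archive": os.path.exists(os.path.join(project_root, "MEMORY-ARCHIVE.md")),
        "branch": None,
        "overlay": None,
    }
    try:
        branch = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True, cwd=project_root, check=True,
        ).stdout.strip()
        ctx["branch"] = branch
        # Branch overlays use a slug: feature/auth -> feature-auth.md
        slug = branch.replace("/", "-")
        overlay = os.path.join(project_root, ".memory", "branches", f"{slug}.md")
        ctx["overlay"] = overlay if os.path.exists(overlay) else None
    except (subprocess.CalledProcessError, FileNotFoundError):
        pass  # not a git repo, or git not installed
    return ctx

def days_since(last_updated: str) -> int:
    """Days since the 'Last updated' field, e.g. '2025-04-01'."""
    return (date.today() - datetime.strptime(last_updated, "%Y-%m-%d").date()).days
```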
### 2. Mid-Session Updates
When the user says "remember this" or you complete a significant milestone:
1. Read current `MEMORY.md`
2. Determine what changed:
   - New decision made? → Update **Key Decisions**
   - Task completed? → Move from **Active Work** to **Completed**, update **Where We Left Off**
   - New blocker? → Add to **Blockers**
   - Important context? → Add to **Notes**
3. Write the updated file
4. Confirm with specifics: "Saved — added the Zod migration decision and marked the user model as complete."
Do NOT rewrite the entire file on mid-session updates. Only modify the sections that changed. This preserves context from session start.
### 3. Session End — The Write Sequence
When wrapping up, execute a full memory write:
```
Step 1: Audit the session
└─ What was accomplished? (be specific: files, functions, lines)
└─ What decisions were made and why?
└─ What's blocked or unresolved?
└─ What should happen next? (crystal clear next step)

Step 2: Compress completed work
└─ Move finished items to Completed with one-line summaries
└─ Remove resolved blockers
└─ Archive stale notes

Step 3: Update memory health metadata
└─ Update "Last updated" timestamp
└─ Increment session counter
└─ Update file reference table (verify paths still exist)

Step 4: Write MEMORY.md
└─ Full overwrite with current state
└─ Verify the file was written successfully

Step 5: Check compression threshold
└─ If > 150 lines, suggest compression
└─ If > 200 lines, auto-compress (see Smart Compression)

Step 6: Prompt for global memory
└─ Any cross-project learnings worth saving to Layer 2?
└─ New user preferences discovered?
```
## MEMORY.md Template
```markdown
# Project Memory
Last updated: [DATE] | Session [N] | Branch: [BRANCH]
Memory health: [SCORE]/10

## Project Overview
[1-2 sentences. What this is, what stack, what stage.]

## Where We Left Off
- **Current task:** [specific task with file/function reference]
- **Status:** [done | in progress | blocked]
- **Next immediate step:** [so clear Claude can start without asking anything]
- **Open question:** [decision pending, if any]

## Completed
- [DATE] [one-line summary with key files touched]
- [DATE] [one-line summary]

## Active Work
- [ ] [task — specific file, function, or component]
- [ ] [task]
- [x] [recently completed, will archive on next compression]

## Blockers
- [blocker with context on what's needed to unblock]

## Key Decisions
| Date | Decision | Reasoning | Affects |
|------|----------|-----------|---------|
| [DATE] | [what was decided] | [why] | [files/areas impacted] |

## Key Files
| File | Purpose | Last Modified |
|------|---------|---------------|
| [path] | [what it does] | [session N] |

## Architecture Notes
[Non-obvious design choices, data flow, system boundaries]

## Known Issues
- [issue, severity, and workaround if any]

## Session Log
| Session | Date | Summary |
|---------|------|---------|
| [N] | [DATE] | [one-line summary of what happened] |

## User Preferences
[How the user likes to work — discovered across sessions]

## External Context
[APIs, services, env setup — NO secrets, NO credentials, NEVER]
```
## Branch-Aware Memory
When working across multiple git branches, memory adapts:
```
MEMORY.md                        <- Base project memory (main/trunk)
.memory/
  branches/
    feature-auth.md              <- Overlay for feature/auth branch
    feature-payments.md          <- Overlay for feature/payments branch
    bugfix-race-condition.md     <- Overlay for bugfix branch
```
**How it works:**

1. At session start, detect the current git branch
2. Load base `MEMORY.md` first
3. Check `.memory/branches/<branch-slug>.md` for an overlay
4. Merge the overlay on top of the base (overlay sections take priority)
5. At session end, write changes back to the correct layer:
   - Architecture decisions → base `MEMORY.md` (shared across branches)
   - Branch-specific work → `.memory/branches/<branch>.md`
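One possible overlay-merge strategy, sketched in Python. It assumes a memory file is plain Markdown and that a "section" is a `## ` heading plus its body; `split_sections` and `merge_overlay` are hypothetical helpers, not the skill's actual implementation:

```python
import re

def split_sections(md: str) -> dict:
    """Map each heading line to its full section text (heading included)."""
    parts = re.split(r"(?m)^(?=## )", md)
    return {p.splitlines()[0]: p.rstrip() for p in parts if p.strip()}

def merge_overlay(base_md: str, overlay_md: str) -> str:
    """Sections present in the overlay replace same-named base sections."""
    base = split_sections(base_md)
    base.update(split_sections(overlay_md))  # overlay sections take priority
    return "\n\n".join(base.values()) + "\n"
```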
**On branch merge:** When a feature branch merges to main, prompt:

> "The `feature/auth` branch just merged. Want me to fold its memory overlay into the base MEMORY.md and clean up the branch file?"
> See `references/branch-aware-memory.md` for merge strategies.
## Smart Compression
Memory files grow. Smart Compression keeps them useful:
**Auto-compress triggers:**

- MEMORY.md exceeds 150 lines → suggest compression
- MEMORY.md exceeds 200 lines → auto-compress
- Entries older than 5 sessions → candidates for archival

**Compression rules:**

- Completed tasks older than 3 sessions → collapse to one-liner in Session Log
- Resolved blockers → remove entirely
- Stale "Active Work" items (no progress in 3+ sessions) → flag for user
- Decision Log entries → NEVER compress (permanent record)
- Architecture Notes → NEVER compress (permanent record)
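The compression rules above can be sketched as a filter over memory entries. A minimal illustration, assuming each entry records its section and the session it was last touched in (the entry format is an assumption for the sketch):

```python
def compress(entries, current_session, keep_after=3):
    """Apply the compression rules: keep, drop, or archive each entry."""
    kept, archived = [], []
    for e in entries:
        age = current_session - e["session"]
        if e["section"] in ("Key Decisions", "Architecture Notes"):
            kept.append(e)                      # never compress: permanent record
        elif e["section"] == "Blockers" and e.get("resolved"):
            continue                            # resolved blockers: remove entirely
        elif e["section"] == "Completed" and age > keep_after:
            archived.append(e)                  # collapse into Session Log
        else:
            kept.append(e)
    return kept, archived
```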
**Archival:** When session count exceeds 10, create MEMORY-ARCHIVE.md:

```markdown
# Memory Archive
Archived sessions from Project Memory.

## Sessions 1-8 Summary
[Paragraph summary of early project work]

## Key Milestones
- Session 2: Initial project scaffolding complete
- Session 5: Auth system shipped
- Session 8: Database migration to Prisma complete
```
> See `references/smart-compression.md` for the full compression algorithm.
## Session Diffing
At session start, detect what changed since memory was last written:
```shell
# Get the date from MEMORY.md "Last updated" field,
# then check what happened since.
git log --oneline --since="[last-updated-date]"
git diff --stat HEAD~[commits-since]
```
**Report format:**

> "Since your last session (3 days ago), there have been 7 commits: 4 by you, 3 by @teammate. Key changes: `src/api/users.ts` was refactored, `package.json` has 2 new dependencies (zod, @tanstack/query). Your memory references `src/api/users.ts` — I'll verify it's still accurate."
**Conflict detection:** When the session diff reveals changes that contradict memory:

- Memory says "using Express" but `package.json` now has Fastify → flag
- Memory references `src/auth/login.ts` but the file was deleted → flag
- Memory says "blocked on API key" but `.env` now has it → update
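The second check, verifying that files referenced in memory still exist, is the simplest to mechanize. A sketch (the helper name `stale_file_refs` is hypothetical):

```python
import os

def stale_file_refs(memory_paths, project_root="."):
    """Return referenced paths that no longer exist on disk."""
    return [p for p in memory_paths
            if not os.path.exists(os.path.join(project_root, p))]
```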
> See `references/session-diffing.md` for conflict resolution strategies.
## Memory Health Scoring
Rate memory on a 1-10 scale across four dimensions:
| Dimension | Weight | Score 10 | Score 1 |
|---|---|---|---|
| Freshness | 30% | Updated today | > 14 days old |
| Relevance | 30% | All referenced files exist | Most files missing/renamed |
| Completeness | 20% | All sections filled, next step clear | Missing key sections |
| Actionability | 20% | Can start working immediately | Need to ask 3+ questions |
**Display at session start:**

```
Memory health: 8/10
  Freshness: 9/10 (updated yesterday)
  Relevance: 7/10 (2 file paths changed)
  Completeness: 8/10 (all sections present)
  Actionability: 9/10 (next step is crystal clear)
```
If health < 5: Trigger recovery mode or suggest a memory rebuild.
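The overall score is a weighted sum of the four dimensions, using the weights from the table above. A minimal sketch, assuming each dimension has already been scored 1-10:

```python
# Weights from the health-scoring table: 30/30/20/20.
WEIGHTS = {"freshness": 0.30, "relevance": 0.30,
           "completeness": 0.20, "actionability": 0.20}

def health_score(scores: dict) -> float:
    """Combine per-dimension scores (1-10) into a weighted 1-10 total."""
    return round(sum(scores[d] * w for d, w in WEIGHTS.items()), 1)
```

With the example values shown above (9, 7, 8, 9) this yields 8.2, consistent with the displayed "8/10".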
## Recovery Mode
When memory is severely stale, corrupted, or missing critical context:
```
Step 1: Scan the project
└─ Read package.json / pyproject.toml / go.mod (detect stack)
└─ Read README.md and CLAUDE.md (project context)
└─ List key directories and recent files

Step 2: Read git history
└─ Last 20 commits (who, what, when)
└─ Current branch and recent branches
└─ Any open/recent PRs

Step 3: Reconstruct memory
└─ Build Project Overview from package.json + README
└─ Build Key Files from most-modified files in git log
└─ Build Key Decisions from commit messages and code patterns
└─ Set "Where We Left Off" from most recent commits
└─ Flag confidence level: "Reconstructed from code — verify with user"

Step 4: Present and confirm
└─ Show reconstructed memory to user
└─ Ask for corrections
└─ Write verified MEMORY.md
```
## Handoff Protocol
Generate a developer handoff document that's optimized for humans (not Claude):
```markdown
# Project Handoff: [Project Name]
Generated: [DATE] | By: [user] via Claude Code

## Quick Start
1. Clone: `git clone [repo]`
2. Install: `[package manager] install`
3. Setup: [env vars, database, etc.]
4. Run: `[dev command]`

## Current State
[Where the project is right now — what works, what doesn't]

## Architecture
[System diagram, key components, data flow]

## Active Work
[What's in progress, what's next, what's blocked]

## Key Decisions & Why
[Decisions that a new developer would question — with the reasoning]

## Gotchas
[Things that will bite you if you don't know about them]

## Who to Ask
[People, channels, or docs for domain-specific questions]
```
**Trigger with:** "generate a handoff", "onboard someone to this project", "write a handoff doc"
## Context Efficiency Engine
The #1 complaint with Claude Code: sessions hit context limits too fast. You spend half your tokens re-explaining context, and the other half doing actual work. Memory Bank flips this ratio.
### The Token Problem (Without Memory Bank)
```
Session start WITHOUT memory-bank:

User:   "Let's continue working on the app"
Claude: "What app? What stack? What were we doing?"
User:   "It's a Next.js e-commerce app with Prisma and Stripe..."
        [400+ tokens explaining the project]
User:   "We were building the checkout flow..."
        [300+ tokens explaining current state]
User:   "The key files are..."
        [200+ tokens listing files]
User:   "We decided to use X because..."
        [300+ tokens re-explaining decisions]
```
Total wasted: ~1,200+ tokens EVERY SESSION just to get back to baseline.
Over 10 sessions: ~12,000 tokens wasted on re-explanation alone.
### The Token Solution (With Memory Bank)
```
Session start WITH memory-bank:

Claude reads MEMORY.md:             ~800 tokens (compact, structured, complete)
Claude greets with full context:    ~150 tokens
User: "Let's go"
```
Total: ~950 tokens. Savings: 60-80% per session start.
Over 10 sessions: ~9,000+ tokens saved on context alone.
But session-start savings are just the beginning.
### Progressive Loading
Don't dump everything into context. Load in tiers:
```
Tier 1: ALWAYS load (costs ~200 tokens)
└─ Project Overview (1-2 sentences)
└─ Where We Left Off (current task, status, next step)
└─ Active Blockers

Tier 2: Load on DEMAND (costs ~300 tokens when needed)
└─ Key Decisions (only when a decision comes up)
└─ Key Files (only when working with files not in Tier 1)
└─ Architecture Notes (only when touching architecture)

Tier 3: Load ONLY when asked (costs ~200 tokens when needed)
└─ Session Log (only for velocity/history questions)
└─ User Preferences (only on first session or when relevant)
└─ External Context (only when working with APIs/services)
```
Result: Instead of loading 800 tokens of memory at once, load 200 tokens immediately and the rest only when actually needed. Most sessions never need Tier 3 at all.
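The tier scheme above is just a mapping from tier numbers to section names plus a cutoff. A sketch, assuming section names match the MEMORY.md template (the `TIERS` data and selector function are illustrative):

```python
# Tier assignments from the list above.
TIERS = {
    1: ["Project Overview", "Where We Left Off", "Blockers"],
    2: ["Key Decisions", "Key Files", "Architecture Notes"],
    3: ["Session Log", "User Preferences", "External Context"],
}

def sections_to_load(max_tier: int = 1) -> list:
    """Return the memory sections to load, up to and including max_tier."""
    return [s for tier, names in TIERS.items()
            if tier <= max_tier
            for s in names]
```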
### Compact Encoding Rules
Every line in MEMORY.md is optimized for maximum information per token:
**Use structured shorthand, not prose:**

```
BAD (38 tokens):
"We made the decision to use Prisma as our ORM instead of Drizzle
because it provides better TypeScript type inference and the team
is already familiar with it from previous projects."

GOOD (14 tokens):
| 2025-04-01 | Prisma over Drizzle | Type inference, team familiarity | All DB |
```
**Use tables for structured data (they compress well):**

```
BAD (scattered prose — 120 tokens for 5 files):
The main checkout route is in src/app/api/checkout/route.ts. The Stripe
client is configured in src/lib/stripe.ts. Cart state management is in...

GOOD (table — 60 tokens for 5 files):
| File | Purpose |
| src/app/api/checkout/route.ts | Stripe session creation |
| src/lib/stripe.ts | Stripe client singleton |
| src/stores/cart.ts | Zustand cart + persistence |
```
**Use checklists for active work (scannable, dense):**

```
BAD (prose):
We are currently working on the webhook handler, which is partially
complete. We also need to write tests and haven't started yet.

GOOD (checklist):
- [x] Stripe webhook handler — handlePaymentSuccess()
- [ ] handleRefund() — stubbed, needs implementation
- [ ] Integration tests for webhook endpoints
```
**One line, one fact. No filler words:**

```
BAD:  "The project is essentially a web application that was built for..."
GOOD: "Bakery e-commerce. Next.js 14, Prisma, Stripe. Launching April."
```
### Context Budget Tracking
Monitor token usage and warn before hitting limits:
At session start, estimate the context budget:

```
Available context:   ~200,000 tokens (Claude's window)
Memory load:         ~800 tokens (Tier 1 + loaded Tiers)
System prompt:       ~2,000 tokens
Remaining for work:  ~197,200 tokens

At 40% usage (~80,000 tokens consumed):
→ Suggest: "We're at 40% context. Consider compacting soon."

At 60% usage (~120,000 tokens consumed):
→ Save a session checkpoint automatically
→ Suggest: "Context at 60%. Good time to /compact or start fresh."

At 80% usage (~160,000 tokens consumed):
→ Auto-save full state to MEMORY.md
→ Alert: "Context is at 80%. Saving state now — you can continue
  in a new session with zero loss. Say 'wrap up' or keep going."
```
### Session Continuation Protocol
When a session hits context limits or user wants to start fresh:
```
Step 1: EMERGENCY SAVE (before context dies)
└─ Write MEMORY.md with EVERYTHING from current session
└─ Include exact cursor position: file, function, line number
└─ Include any uncommitted mental model (what Claude was thinking)
└─ Include partial work state: what's done, what's half-done, what's next

Step 2: Write CONTINUATION.md (a one-shot warm-up file)
└─ Ultra-compact: under 50 lines, under 500 tokens
└─ Contains ONLY what the next session needs to start immediately
└─ Format:
```
```markdown
# Continue: [task name]
Resume from: `src/auth/refresh.ts:47` — writing rotateToken()
## State
- handlePaymentSuccess(): DONE ✓
- handleRefund(): stubbed at line 89, needs Stripe refund.created event
- Tests: NOT STARTED
## Context
- Stripe webhook sig verified in middleware (line 12)
- Using stripe.webhooks.constructEvent() not manual HMAC
- Refund handler follows same pattern as payment handler
## Immediate Next Action
Implement handleRefund() in src/api/webhooks/stripe/route.ts:89
using the stripe.refund.created event payload. Pattern:
extract refund.payment_intent → find order → update status to "refunded"
```

```
Step 3: GREET AND GO (next session)
└─ Read CONTINUATION.md first (it's the fast-path)
└─ Read MEMORY.md for full context only if needed
└─ Delete CONTINUATION.md after loading
└─ Start working immediately — no questions, no warm-up
```
**Trigger phrases:** "save state", "I'm running out of context",
"continue this later", "session is getting long"
### Token Savings By Feature
| Feature | Tokens Saved Per Session | How |
|---------|------------------------|-----|
| Structured memory vs re-explaining | 800-1,500 | Compact format replaces verbal explanation |
| Progressive loading (Tier 1 only) | 300-600 | Don't load what you don't need |
| Compact encoding (tables > prose) | 200-400 | Same info, fewer tokens |
| Session continuation protocol | 500-1,000 | Zero warm-up in new sessions |
| Smart compression | 200-500 | Smaller file = fewer tokens to read |
| Branch-aware selective loading | 100-300 | Skip irrelevant branch context |
| **Total per session** | **2,100-4,300** | |
| **Over 10 sessions** | **21,000-43,000** | |
### Anti-Patterns That Waste Tokens
**Never do these in memory files:**
- ✗ Verbose prose where a table works
- ✗ Repeating the same information in multiple sections
- ✗ Storing code snippets in memory (reference file:line instead)
- ✗ Long descriptions of completed work (one-line summaries only)
- ✗ Keeping resolved blockers (delete them)
- ✗ Storing information that's in README.md or CLAUDE.md already
- ✗ Using memory for things Git tracks (commit history, diffs, blame)
**Always do these:**
- ✓ Tables for structured data (decisions, files, tasks)
- ✓ Checklists for active work
- ✓ One sentence for Project Overview (not a paragraph)
- ✓ File:line references instead of describing code
- ✓ Delete resolved items (they're in git history)
- ✓ Reference other files instead of duplicating content
> See `references/context-efficiency.md` for the full token optimization guide.
---
## Rules for Excellent Memory
**Be surgical, not vague.**
Bad: "Working on auth"
Good: "Implementing JWT refresh token rotation in `src/auth/refresh.ts` —
`rotateToken()` is complete, needs Redis TTL logic in `src/cache/tokens.ts:47`"
**The "Next immediate step" is the single most important line.**
It should be so precise that Claude can start coding the instant a session
begins, with zero clarifying questions.
**Capture the "why" behind every decision.**
Future Claude will encounter the same trade-offs and re-litigate them
unless the reasoning is recorded.
**Never store secrets.** No API keys, passwords, tokens, or credentials.
Ever. Not even "temporarily". Reference `.env` or a secrets manager instead.
**Overwrite on session end, surgical update mid-session.**
Session end = full rewrite for consistency. Mid-session = targeted section
updates to avoid losing context.
**Keep it under 150 lines.** Compress aggressively. Stale information is
actively harmful — it misleads more than it helps.
---
## Auto-Setup via CLAUDE.md
For fully automatic memory with all features, add to project `CLAUDE.md`
(or `~/.claude/CLAUDE.md` for all projects):
```markdown
## Memory
At the start of every session:
1. Check for MEMORY.md in the project root
2. Check for ~/.claude/GLOBAL-MEMORY.md
3. Check current git branch and look for .memory/branches/<branch>.md
4. Run session diff — what changed since last memory update
5. Score memory health and flag any issues
6. Greet me with a specific summary and the next immediate step
During sessions:
- Update memory when I say "remember this" or complete a milestone
- Track key decisions with reasoning in the decision table
At session end (when I say "wrap up", "save", "done for now"):
1. Write comprehensive MEMORY.md with full current state
2. Ensure "Next immediate step" is crystal clear
3. Run compression if over 150 lines
4. Confirm what was saved
```

> See `references/claude-md-integration.md` for the full integration guide.
## Reference Files
- `references/memory-layers.md` — Full architecture of the 3-tier memory system with promotion rules and cross-layer interactions
- `references/branch-aware-memory.md` — Git branch integration, overlay merging, and cleanup strategies
- `references/smart-compression.md` — Compression algorithm, archival thresholds, and what to never compress
- `references/session-diffing.md` — Cross-session change detection, conflict resolution, and drift correction
- `references/advanced-patterns.md` — Team workflows, velocity tracking, handoff protocol, and enterprise patterns
- `references/context-efficiency.md` — Token optimization guide, progressive loading details, compact encoding reference
- `references/claude-md-integration.md` — Complete setup guide for automatic triggering across all projects
## Examples
- `examples/solo-fullstack.md` — Memory for a solo developer on a Next.js app
- `examples/team-backend.md` — Team-shared memory for a backend service
- `examples/monorepo.md` — Multi-domain memory for a monorepo
- `examples/minimal.md` — 5-line memory for quick prototypes