🧩 Agent Memory Systems
Memory gives an AI agent continuity and learning capacity. Without memory, every interaction is stateless.
To build systems that adapt to user preferences and retain context, you must design a structured memory system.
🗂️ The Four Types of Agent Memory
Agent architectures split memory into four distinct conceptual systems:
| Memory Category | Type | Storage Mechanism | Use Case / Description |
|---|---|---|---|
| Short-Term | 📝 In-Context Memory | Prompt / LLM Context Window | Maintaining active conversation history and instructions. |
| Long-Term | 🗄️ External Memory | Vector Databases / SQL Stores | Storing and querying large corpora of external knowledge. |
| Long-Term | 🎞️ Episodic Memory | Vector DB / Structured Logs | Retaining past execution traces and specific task interactions. |
| Long-Term | 🧠 Semantic Memory | Graph DBs / User Profiles | Retaining generalized facts, rules, and user preferences. |
1. 📝 In-Context Memory (Working Memory)
Information currently in the LLM’s active prompt window, including chat history, system instructions, and retrieved documents.
- Pros: Fast, high-fidelity, immediately accessible to the model.
- Cons: Limited by context window size and latency (larger context = slower inference).
- Use Case: Current chat turn-by-turn history.
2. 🗄️ External Memory (Long-Term Memory)
Stored in databases or vector search indices outside of the model. The agent queries this memory during execution.
- Implementation: Querying index representations in Vector DBs.
- Use Case: Accessing large custom document catalogs.
3. 🎞️ Episodic Memory
Records specific instances of past experiences or actions.
- Mechanism: Storing execution traces, user inputs, and final outcomes as structured objects or semantic embeddings.
- Use Case: Repeating a successful complex workflow based on history.
4. 🧠 Semantic Memory
Stores generalized facts, rules, profiles, and concepts.
- Mechanism: Running offline background tasks to extract facts from episodic history and update a user profile profile or knowledge graph.
- Use Case: Customizing recommendations based on user preferences.
🔄 Memory Architecture Patterns
Pattern 1: RAG-based Memory Retrieval
Embed the user query, search a vector store of historical conversations, and pull in the top-K relevant messages.
[New Input] ──► Generate Embeddings ──► Search Vector DB ──► Top-K Results ──► Inject ContextPattern 2: Sliding Window Memory
Maintain a sliding window of the last N messages or tokens, discarding older messages.
def trim_conversation_history(messages: list, max_tokens: int, token_counter_fn) -> list:
"""
Trims history to stay within token bounds, preserving the system prompt.
"""
system_prompt = messages[0] if messages and messages[0]["role"] == "system" else None
active_history = messages[1:] if system_prompt else messages
while active_history and token_counter_fn(active_history) > max_tokens:
active_history.pop(0) # Remove oldest message
return [system_prompt] + active_history if system_prompt else active_historyPattern 3: Hierarchical Summarization Memory
Summarize old parts of the conversation dynamically. The active context contains a rolling summary of the past + full details of recent messages.
[System Prompt] + [Rolling Summary of turns 1-10] + [Raw Messages of turns 11-15]🏆 Memory System Best Practices
- Be Selective: Do not write every interaction to long-term memory. Noise pollutes semantic retrieval.
- Utilize Metadata: Tag memories with timestamps, user IDs, and topic categories to allow SQL pre-filtering.
- Implement Decay (TTL): Implement a decay factor or Time-To-Live (TTL) for stale episodic memories.
- Privacy: Encrypt personal identifiers (PII) before committing conversations to shared vector indexes.