AI Engineering🤖 AI Agents🧩 Memory Systems
🛡️
Running production systems? Exemplar brings SRE, uptime monitoring, and incident management together so your team resolves outages faster and proves reliability to the business. Visit exemplar.dev →

🧩 Agent Memory Systems

Memory gives an AI agent continuity and learning capacity. Without memory, every interaction is stateless.

To build systems that adapt to user preferences and retain context, you must design a structured memory system.


🗂️ The Four Types of Agent Memory

Agent architectures split memory into four distinct conceptual systems:

Memory CategoryTypeStorage MechanismUse Case / Description
Short-Term📝 In-Context MemoryPrompt / LLM Context WindowMaintaining active conversation history and instructions.
Long-Term🗄️ External MemoryVector Databases / SQL StoresStoring and querying large corpora of external knowledge.
Long-Term🎞️ Episodic MemoryVector DB / Structured LogsRetaining past execution traces and specific task interactions.
Long-Term🧠 Semantic MemoryGraph DBs / User ProfilesRetaining generalized facts, rules, and user preferences.

1. 📝 In-Context Memory (Working Memory)

Information currently in the LLM’s active prompt window, including chat history, system instructions, and retrieved documents.

  • Pros: Fast, high-fidelity, immediately accessible to the model.
  • Cons: Limited by context window size and latency (larger context = slower inference).
  • Use Case: Current chat turn-by-turn history.

2. 🗄️ External Memory (Long-Term Memory)

Stored in databases or vector search indices outside of the model. The agent queries this memory during execution.

  • Implementation: Querying index representations in Vector DBs.
  • Use Case: Accessing large custom document catalogs.

3. 🎞️ Episodic Memory

Records specific instances of past experiences or actions.

  • Mechanism: Storing execution traces, user inputs, and final outcomes as structured objects or semantic embeddings.
  • Use Case: Repeating a successful complex workflow based on history.

4. 🧠 Semantic Memory

Stores generalized facts, rules, profiles, and concepts.

  • Mechanism: Running offline background tasks to extract facts from episodic history and update a user profile profile or knowledge graph.
  • Use Case: Customizing recommendations based on user preferences.

🔄 Memory Architecture Patterns

Pattern 1: RAG-based Memory Retrieval

Embed the user query, search a vector store of historical conversations, and pull in the top-K relevant messages.

[New Input] ──► Generate Embeddings ──► Search Vector DB ──► Top-K Results ──► Inject Context

Pattern 2: Sliding Window Memory

Maintain a sliding window of the last N messages or tokens, discarding older messages.

def trim_conversation_history(messages: list, max_tokens: int, token_counter_fn) -> list:
    """
    Trims history to stay within token bounds, preserving the system prompt.
    """
    system_prompt = messages[0] if messages and messages[0]["role"] == "system" else None
    active_history = messages[1:] if system_prompt else messages
 
    while active_history and token_counter_fn(active_history) > max_tokens:
        active_history.pop(0)  # Remove oldest message
 
    return [system_prompt] + active_history if system_prompt else active_history

Pattern 3: Hierarchical Summarization Memory

Summarize old parts of the conversation dynamically. The active context contains a rolling summary of the past + full details of recent messages.

[System Prompt] + [Rolling Summary of turns 1-10] + [Raw Messages of turns 11-15]

🏆 Memory System Best Practices

  1. Be Selective: Do not write every interaction to long-term memory. Noise pollutes semantic retrieval.
  2. Utilize Metadata: Tag memories with timestamps, user IDs, and topic categories to allow SQL pre-filtering.
  3. Implement Decay (TTL): Implement a decay factor or Time-To-Live (TTL) for stale episodic memories.
  4. Privacy: Encrypt personal identifiers (PII) before committing conversations to shared vector indexes.

🚀 10K+ page views in last 7 days
Developer Handbook 2026 © Exemplar.