Agents: Memory

Implementing Long-Term Memory in AI Agents (Semantic, Episodic, Procedural) with LangMem
AI agents powered by large language models (LLMs) can appear more intelligent and personalized when they remember information over time. By equipping agents with long-term memory, developers enable them to recall facts, past interactions, and learned skills beyond a single chat session. In this article, we’ll conduct a deep dive into the role of memory in AI agents, focusing on the three key types of memory – semantic, episodic, and procedural – and how to implement each.
We’ll explore conceptual differences between these memory types, practical strategies for integrating memory into AI systems, and specific techniques using the LangMem framework (a toolkit for long-term memory in LangChain). We’ll also discuss optimization techniques like delayed memory processing, dynamic namespaces, efficient retrieval, and the performance trade-offs involved in giving your AI a long-term memory.
Conceptual Understanding: Memory Types in AI Agents
| Memory Type | Purpose | Agent Example | Human Example | Typical Storage Pattern |
| --- | --- | --- | --- | --- |
| Semantic | Facts & Knowledge | User preferences; knowledge triplets | Knowing Python is a programming language | Profile or Collection |
| Episodic | Past Experiences | Few-shot examples; summaries of past conversations | Remembering your first day at work | Collection |
| Procedural | System Behavior | Core personality and response patterns | Knowing how to ride a bicycle | Prompt rules or Collection |
Source: https://blog.langchain.dev/langmem-sdk-launch/
Just as humans have multiple forms of memory, AI agents can benefit from different memory types for different purposes. In cognitive terms, we can draw analogies to human memory when designing AI agent memory:
• Semantic Memory (Facts & Knowledge): Semantic memory is about storing factual information or general knowledge an agent has learned. In humans, this is like remembering that Paris is the capital of France or that Python is a programming language. For AI, semantic memory might include facts about the user or world that were learned during interactions or provided as data (e.g. a user’s name, preferences, key domain facts). This memory enables the agent to ground its responses with correct details and personalization. Example: A virtual assistant’s semantic memory could store that the user’s favorite cuisine is Italian and their birthday is July 20th, so it can later recommend Italian restaurants or send birthday wishes.
• Episodic Memory (Events & Experiences): Episodic memory records specific experiences or past events. For humans, episodic memories might be recollections of your first day at work or a memorable trip. In AI agents, episodic memory means remembering past dialogues or problem-solving episodes – essentially the agent’s own experiences in dealing with certain situations. This could include summaries of previous conversations, successful task outcomes, or mistakes made, along with the context in which they occurred.
Episodic memory helps the agent recall “how did I handle this before?” and apply that experience to guide future behavior. Example: A customer support chatbot’s episodic memory might include a summary of the last support session with a user, so if the user returns, the bot remembers what was tried before and what the result was.
• Procedural Memory (Skills & Behaviors): Procedural memory captures the know-how for performing tasks, encompassing rules, skills, or policies the agent follows. In humans, this is like the ingrained skill of riding a bicycle or playing the piano – you might not recall a specific event, but you have internalized how to do it. For AI agents, procedural memory manifests in the agent’s core behavior: it can be encoded in the model’s weights, in the agent’s code, or importantly in the system prompts and instructions that guide its responses. By updating its procedural memory, an agent can learn new behaviors or refine its style over time without changing its underlying model weights. Example: An AI coding assistant might learn over time to adopt a more detailed code commenting style after it consistently gets user feedback asking for more explanation. This learned behavior is stored as an adjustment to its system prompt (procedural memory), so future code outputs include better comments by default.
• Semantic memory gives the agent a knowledge base of facts (the “what”).
• Episodic memory provides it with personal experiences (the “when and how” of past events).
• Procedural memory governs its inherent skills or behaviors (the “how to do” rules).
In practice, an AI agent will typically use a combination of all three to achieve more intelligent and personalized interactions. For example, a sophisticated personal assistant might use semantic memory to recall a user’s preferences, episodic memory to remember the context of previous conversations with that user, and procedural memory to adapt its tone or strategy based on what has been effective in the past.
Implementation Strategies for AI Agent Memory
How can developers equip AI agents with these forms of memory? Implementing memory in AI systems involves deciding what information to store, how to store it, and when to retrieve or update it during conversations. Below, we outline strategies for integrating each type of memory into an AI agent, along with real-world use cases and best practices. There are example implementations using LangMem; check out some of them here: https://github.com/linux-devil/llm_learning/tree/main/agent_memory
Semantic Memory Implementation (Facts & Knowledge)
To give an AI agent semantic memory, you need a mechanism to capture facts or details and store them in a retrievable format. A common strategy is to use a knowledge base or database (often vector databases for semantic search) to store facts extracted from interactions:
• Extracting Facts: After or during a conversation, you can run a process to extract key facts or data points that emerged. This could be done by calling an LLM to summarize the conversation or pull out structured facts (e.g. in JSON). For example, if a user mentions their birthday or a new preference, the agent should record that fact. Many developers implement this by writing a prompt like “List any new facts the user stated about themselves” and having the LLM output those facts for storage.
To implement this with LangMem, you create a memory manager, specifying the LLM and optionally a schema for the information you want to extract. For example, if you want to store user preferences as facts, you might define a Pydantic model for a UserPreference or UserProfile and pass that schema to the manager. Below is an example of setting up a memory manager to extract a user’s profile information (name, preferred name, style, skills, etc.) from a conversation:
from pydantic import BaseModel
from langmem import create_memory_manager

class UserProfile(BaseModel):
    """Save the user's preferences and traits."""
    name: str
    preferred_name: str
    response_style_preference: str
    special_skills: list[str]
    other_preferences: list[str]

manager = create_memory_manager(
    "anthropic:claude-3-5-sonnet-latest",  # LLM to use for extraction
    schemas=[UserProfile],
    instructions="Extract user preferences and settings",
    enable_inserts=False,  # We'll update one profile document
)

# Assume we have a conversation list of messages (user and assistant turns)
profile_memories = manager.invoke({"messages": conversation})
profile = profile_memories[0].content  # This is a UserProfile object
print(profile)
In this snippet, the memory manager is instructed to extract the user’s preferences and settings from a conversation. When invoke is called with the conversation, the LLM processes it according to the schema and instructions, returning a UserProfile object filled with information gleaned from the dialogue. For example, if the user said “Hi! I’m Alex but please call me Lex. I’m a wizard at Python and love making AI systems that don’t sound like boring corporate robots…” and so on, the memory manager might produce a profile like:
name="Alex",
preferred_name="Lex",
response_style_preference="casual and witty with appropriate emojis",
special_skills=["Python programming", "AI development", "competitive speedcubing"],
other_preferences=["prefers informal communication", "dislikes corporate-style interactions"]
• Retrieving and Using Facts: When the agent is generating a response, relevant facts from semantic memory should be retrieved and injected into the prompt (often as part of the system message or additional context). This retrieval is typically done via semantic search: comparing the current query or conversation context with stored memory embeddings to find related info. For instance, if the user asks “Can you recommend a restaurant for tonight?”, the agent might query the memory store for the user’s known food preferences or past restaurant conversations, then use those facts (e.g. “user likes Italian”) to tailor its answer. Efficient retrieval may involve filtering by user or category and then ranking by similarity to ensure only the most relevant facts are included; a minimal sketch follows.
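To make this concrete, here is a minimal retrieval sketch using the InMemoryStore from LangGraph (which LangMem builds on). The namespace, the "user-123" identifier, and the stored fact layout are illustrative assumptions, not fixed conventions:

from langgraph.store.memory import InMemoryStore

# A store with an embedding index enables semantic search over saved facts
store = InMemoryStore(
    index={"dims": 1536, "embed": "openai:text-embedding-3-small"}
)

# Hypothetical namespace: one partition of facts per user
namespace = ("memories", "user-123")
store.put(namespace, "cuisine", {"fact": "The user's favorite cuisine is Italian"})

# At response time, fetch the facts most similar to the current query
hits = store.search(namespace, query="restaurant recommendation for tonight", limit=3)
facts = "\n".join(item.value["fact"] for item in hits)
system_prompt = f"You are a helpful assistant. Known facts about the user:\n{facts}"

Scoping the search to a per-user namespace keeps memories isolated between users, and the limit caps how much retrieved context gets injected into the prompt.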
Episodic Memory Implementation (Past Experiences)
Episodic memory in AI agents refers to preserving the context and outcomes of past interactions – essentially, remembering stories or episodes from the agent’s experience. Implementing episodic memory often involves capturing conversation transcripts or distilled “experience logs” that the agent can refer to in the future:
• Capturing Episodes: Not every interaction needs to become an episodic memory. A typical approach is to store notable interactions, such as successful problem-solving sessions, important user interactions, or failures that the agent should learn from. One way to do this is by summarizing entire conversations or critical segments into a concise narrative. For instance, after a support chat that ends with a satisfied user, the system can summarize: “User had issue X, agent walked them through steps Y, issue resolved and user was happy” – this summary becomes an episodic memory. Some frameworks treat this like an “experience replay” concept, similar to reinforcement learning, where the agent writes down the key situation, the actions it took, and the result. The code below defines an Episode schema for such records and configures memory managers to extract them:
from pydantic import BaseModel, Field
from langmem import create_memory_manager, create_memory_store_manager

class Episode(BaseModel):
    """Write the episode from the perspective of the agent within it. Use the
    benefit of hindsight to record the memory, saving the agent's key internal
    thought process so it can learn over time."""

    observation: str = Field(..., description="The context and setup - what happened")
    thoughts: str = Field(
        ...,
        description="Internal reasoning process and observations of the agent in the episode that let it arrive"
        ' at the correct action and result. "I ..."',
    )
    action: str = Field(
        ...,
        description="What was done, how, and in what format. (Include whatever is salient to the success of the action). I ..",
    )
    result: str = Field(
        ...,
        description="Outcome and retrospective. What did you do well? What could you do better next time? I ...",
    )

# The Episode schema becomes part of the memory manager's prompt,
# helping it extract complete reasoning chains that guide future responses
manager = create_memory_manager(
    "openai:gpt-4o-mini",
    schemas=[Episode],
    instructions="Extract examples of successful explanations, capturing the full chain of reasoning. Be concise in your explanations and precise in the logic of your reasoning.",
    enable_inserts=True,
)

# Alternatively, configure a store-backed manager that persists episodes
store_manager = create_memory_store_manager(
    "openai:gpt-4o-mini",
    namespace=("memories", "episodes"),
    schemas=[Episode],
    instructions="Extract exceptional examples of noteworthy problem-solving scenarios, including what made them effective.",
    enable_inserts=True,
)
• Structured Experience Logs: It can help to structure episodic memories for consistency. For example, you might define an Episode record with fields such as situation/context, action_taken (or chain of thought), and outcome. This is akin to a journal entry that explains how the agent approached a problem and what happened. By storing the agent’s internal reasoning along with the outcome, the agent can later analyze what strategies work well. This technique was demonstrated by the LangMem framework, which allows defining a Pydantic schema for an Episode and using an LLM to fill it out after an interaction. The episodic memory entry essentially captures “how did I get to a good answer and what was the result?” in a given scenario.
• Using Episodes for Learning: When faced with a new task, an agent can retrieve relevant episodes to guide its behavior. This often takes the form of few-shot prompting: supplying the model with examples from past episodes that are similar to the current context. Typically, episodic memories are retrieved by similarity (e.g., via an embedding search on the episode descriptions) or by tagging (metadata indicating the type of scenario). Including a highly relevant past example in the prompt can significantly improve performance on tasks that are similar to what the agent has seen before; see the sketch after this list.
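Here is a rough sketch of that few-shot pattern, assuming episodes extracted by the store-backed manager above have been saved into a LangGraph store (such as the InMemoryStore from the earlier sketch) under the ("memories", "episodes") namespace. The exact layout of each stored item's value can vary by LangMem version, so the code reads it defensively:

# Retrieve the stored episode most similar to the new task
similar = store.search(("memories", "episodes"), query="student stuck on recursion", limit=1)

# Format retrieved episodes as few-shot examples for the prompt
examples = []
for item in similar:
    ep = item.value.get("content", item.value)  # stored layout varies by version
    examples.append(
        f"Observation: {ep['observation']}\n"
        f"Thoughts: {ep['thoughts']}\n"
        f"Action: {ep['action']}\n"
        f"Result: {ep['result']}"
    )
system_prompt = "You are a tutoring agent. A similar past episode:\n" + "\n\n".join(examples)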
Real-world use case: Consider an AI tutoring system that helps students solve math problems. It can maintain episodic memories of each tutoring session: what problems were attempted, which hints helped, and where the student struggled. Later, if the same student (or even another student) faces a similar problem, the agent can recall the episode to avoid ineffective hints and apply strategies that proved successful. Another example: in AI game agents, episodic memory could be used to remember sequences of moves that led to victory or defeat, informing future decision-making.
Procedural Memory Implementation (Skills & Behaviors)
Procedural memory is about how an agent does things – its ingrained skills or policies. Implementing procedural memory in AI agents often means enabling the agent to learn from feedback and change its core behavior (usually its system prompt or internal rules) over time. Unlike semantic or episodic memory which store content to inject, procedural memory updates how the agent formulates responses. Here are strategies to implement it:
• Prompt Refinement (“Self-Improvement”): A straightforward way for an agent to adjust its behavior is to refine its system prompt or instructions based on experience. For example, suppose an agent consistently gets feedback that its answers are too verbose. The developer could manually edit the system prompt to say “be concise”, but with procedural memory, the agent can learn this itself. One technique is reflective prompt optimization – after some interactions, run a process where the agent (or a separate LLM) reviews conversation transcripts and feedback to propose an updated prompt or new rules for itself. This is sometimes called meta-prompting or the “reflect, then improve” approach. The agent essentially asks “How can I do better next time?” and writes an improved instruction set (a sketch of this with LangMem’s prompt optimizer follows this list).
• Feedback Loops: To know how to change its behavior, the agent needs feedback or a measure of success. This can come from explicit user feedback (thumbs-up/down, corrections) or an automated evaluation of the agent’s responses (a reward score). Developers can implement a loop where after each interaction (or batch of interactions), if the outcome was poor, the agent’s procedural memory is updated. For instance, an agent that failed a task might add a rule “If the user asks X, remember to do Y first” to avoid repeating the mistake. If it succeeded, it might strengthen the behavior that led to success (like “Always double-check the user’s question for ambiguities”). This is analogous to Reinforcement Learning from Human Feedback (RLHF), but can be done on the fly with prompt engineering rather than weight tuning – by adding or adjusting instructions.
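Here is a minimal sketch of the reflective approach using LangMem's prompt optimizer. The "metaprompt" strategy and the feedback format shown are assumptions based on LangMem's documented API, and conversation stands in for a prior message list:

from langmem import create_prompt_optimizer

# Optimizer that reflects on past conversations plus feedback
# and proposes an improved system prompt
optimizer = create_prompt_optimizer(
    "anthropic:claude-3-5-sonnet-latest",
    kind="metaprompt",  # reflect-then-improve strategy
)

current_prompt = "You are a helpful coding assistant."
# Each trajectory pairs a conversation with feedback about how it went
trajectories = [(conversation, {"feedback": "Answers were far too verbose"})]

updated_prompt = optimizer.invoke(
    {"trajectories": trajectories, "prompt": current_prompt}
)
print(updated_prompt)  # e.g. now instructs the agent to keep answers concise

The updated prompt can then be stored and used as the agent's system prompt going forward, closing the feedback loop without touching model weights.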
Real-world use case: An AI content generator might notice that its user prefers outputs in a certain style (e.g. with more humor). Procedural memory allows it to internalize this style guideline. Over time and with subtle feedback (the user tends to regenerate content until a humorous one appears, etc.), the agent can adapt its core writing style to be more humorous by default, without being explicitly told on each request. Another scenario: a multi-step task-solving agent (like those using ReAct or tool use) can refine its approach to using tools. If it observes that a particular sequence of tool uses leads to better outcomes (say, always do a web search before answering a question about current events), it can incorporate that as a new step in its policy through procedural memory.
Using the LangMem Framework for AI Agent Memory
Manually implementing the above memory mechanisms can be complex, but there are frameworks designed to simplify this. LangMem is a recently introduced SDK specifically for managing long-term memory in AI agents. It is part of the LangChain ecosystem and provides out-of-the-box tools to handle semantic, episodic, and procedural memory for agents. Let’s explore how LangMem supports these memory types and how developers can use it in practice.
Memory Tools and Integration
Beyond the core APIs, LangMem also provides memory tools that integrate into an agent’s reasoning loop. If you’re using a LangChain agent (like a ReAct agent), you can give the agent tools such as ManageMemory and SearchMemory so that the agent can decide when to store or retrieve memories during conversation (see the sketch after this list). For example:
• create_manage_memory_tool(namespace=...) creates a tool that the agent can invoke to save new information. The agent would invoke this when it deems something worth remembering (perhaps guided by its prompt or chain logic).
• create_search_memory_tool(namespace=...) allows the agent to query its long-term memory. The agent might use this tool when it faces a question and wants to see if it “knows” something from before.
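Putting these together, here is a minimal sketch along the lines of the LangMem quickstart: a LangGraph ReAct agent equipped with both tools and an in-memory store (the model string and embedding settings are illustrative):

from langgraph.prebuilt import create_react_agent
from langgraph.store.memory import InMemoryStore
from langmem import create_manage_memory_tool, create_search_memory_tool

# Store that indexes saved memories for semantic search
store = InMemoryStore(
    index={"dims": 1536, "embed": "openai:text-embedding-3-small"}
)

agent = create_react_agent(
    "anthropic:claude-3-5-sonnet-latest",
    tools=[
        create_manage_memory_tool(namespace=("memories",)),
        create_search_memory_tool(namespace=("memories",)),
    ],
    store=store,
)

# The agent decides on its own when to save a memory...
agent.invoke(
    {"messages": [{"role": "user", "content": "Remember that I prefer dark mode."}]}
)
# ...and when to search for one
response = agent.invoke(
    {"messages": [{"role": "user", "content": "What are my display preferences?"}]}
)

Because these are regular LangChain tools, the same pattern works with custom agents: the model sees the tool descriptions and chooses when remembering or recalling is worthwhile.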
Summary
In summary, adding long-term memory to AI agents is extremely promising for creating more capable and personalized systems, but it requires careful thought to get right. By understanding the types of memory (semantic for facts, episodic for experiences, procedural for skills), using frameworks like LangMem to implement them, and applying optimization techniques, developers can build agents that learn and improve over time. The trade-offs can be managed by smart design: isolate and target what to remember, update memories at the right moments, and retrieve information efficiently. With these practices, you can give your AI agent a robust memory that significantly enhances its performance and user experience, while keeping the system scalable and cost-effective.