Understanding Memory in Humans, LLMs, and How mem0 Helps

Sharad Singh

Let’s begin with a simple question:

What is memory?

At its core, memory is the ability to retain information and recall it when required. For humans, this means remembering past experiences or facts. For Large Language Models (LLMs), “memory” works differently: it’s essentially the ability to use previous messages in a conversation as context to generate meaningful responses.

How Memory Works in LLMs

Unlike humans, LLMs cannot “remember” things across different sessions. Here’s why:

  • LLMs don’t store memory natively: Every time you send a query, you must also provide the conversation history (previous messages). The LLM uses this context to generate an answer.

  • More messages = slower responses: As the conversation grows longer, more tokens are needed, making the model slower and costlier to run.

  • No persistence across sessions: If you start a new chat, the LLM won’t recall previous discussions unless you manually provide context.

  • Token limits exist: Each session has a token cap (typically in the order of thousands to a few hundred thousand). Once exceeded, older context must be trimmed.
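
To make the first point concrete, here is a minimal sketch of what “providing the history yourself” looks like with the OpenAI Python SDK (the model name is just an example):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
history = []       # we must keep the conversation ourselves; the API is stateless

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    # The entire history is resent on every call, so tokens (and cost) grow each turn.
    response = client.chat.completions.create(model="gpt-4.1", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer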

Clearly, we need a more optimized memory system for LLMs. To understand this, let’s first look at how human memory works.

Human Memory Types and Their AI Parallel

The human brain has different types of memory. Interestingly, these concepts can be mapped to LLM memory as well.

  1. Short-Term Memory

    • Humans: Retains information for minutes, hours, or a few days (e.g., remembering the price of coffee you just bought).

    • LLMs: Session-based memory, only remembers messages within a single session.

  2. Long-Term Memory

    • Humans: Stores knowledge for years (e.g., your best friend’s birthday).

    • LLMs: Can be designed to remember user history across multiple sessions, improving personalization and user experience.

  3. Episodic Memory

    • Humans: Memory based on personal experiences or events (e.g., your last summer trip).

    • LLMs: Agents can retain access to logs, events, or past interactions for more contextual responses.

  4. Semantic Memory

    • Humans: General facts and knowledge (e.g., 2+2=4, the capital of India is New Delhi).

    • LLMs: AI assistants use structured knowledge bases (like legal databases or encyclopedias) to provide accurate answers.

This parallel shows how AI agents can simulate different types of memory by combining databases, embeddings, and structured knowledge.
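
As a rough sketch (this mapping is illustrative, not a fixed standard), each memory type suggests a different backing store:

# Illustrative mapping of memory types to typical AI building blocks.
MEMORY_BACKENDS = {
    "short_term": "in-process message list (current session only)",
    "long_term":  "vector store of embedded user facts (e.g., Qdrant)",
    "episodic":   "event/interaction logs keyed by user and time",
    "semantic":   "structured knowledge base or graph (e.g., Neo4j)",
}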

Facts and Relations in AI Memory

Instead of storing entire conversation transcripts, we can extract facts and relationships from messages and store them efficiently.

Example Conversation:

Message:
“My name is Elon Musk. I founded SpaceX in 2002 in California. The company launched its first rocket Falcon 1 in 2008. Later, NASA signed a contract with SpaceX in 2010 to deliver cargo to the ISS. In 2020, SpaceX became the first private company to send astronauts to space aboard the Crew Dragon capsule.”

Extracted Facts:

  • Person → Elon Musk

  • Company → SpaceX

  • Founded → 2002

  • Location → California

  • Rocket → Falcon 1

  • First Launch → 2008

  • Organization → NASA

  • Contract Year → 2010

  • Mission → Deliver cargo to ISS

  • Event → First private company to send astronauts

  • Year → 2020

  • Spacecraft → Crew Dragon

Instead of storing the full paragraph, we save only the facts.
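
Here is a minimal sketch of how such fact extraction could be done with an LLM. The prompt and output format are my own illustrative choices, not mem0’s internal pipeline:

import json
from openai import OpenAI

client = OpenAI()

def extract_facts(message: str) -> list[str]:
    # Ask the model to compress the message into discrete facts.
    response = client.chat.completions.create(
        model="gpt-4.1",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": 'Extract the key facts from the user message. Reply in JSON: {"facts": ["..."]}'},
            {"role": "user", "content": message},
        ],
    )
    return json.loads(response.choices[0].message.content)["facts"]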

Now if the user asks:
“What is my name?”

The system retrieves from stored facts:
Answer: “Your name is Elon Musk.”
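
Under the hood, such a lookup is usually a semantic search: every stored fact is embedded as a vector, the query is embedded the same way, and the closest facts win. A bare-bones sketch without any vector database, just cosine similarity in memory (the stored facts are from the example above):

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

facts = ["Person: Elon Musk", "Company: SpaceX", "Founded: 2002"]
fact_vectors = [embed(f) for f in facts]

def search(query: str, top_k: int = 2) -> list[str]:
    q = embed(query)
    # Rank facts by cosine similarity to the query.
    scores = [float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in fact_vectors]
    ranked = sorted(zip(scores, facts), reverse=True)
    return [fact for _, fact in ranked[:top_k]]

print(search("What is my name?"))  # the Elon Musk fact should rank highest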

Extracted Relations:

We can also capture relationships between entities:

  • Elon Musk → Founded → SpaceX (2002, California)

  • SpaceX → Launched → Falcon 1 (2008)

  • NASA → Signed Contract With → SpaceX (2010, Cargo to ISS)

  • SpaceX → Sent Astronauts → Crew Dragon (2020)

These can be stored in a graph database. Now, if asked:
“What is my name and am I a founding member of any company?”

Using facts + relations, the LLM can answer:

  • “Your name is Elon Musk.”

  • “Yes, you are the founder of SpaceX.”

This approach enables efficient context management: instead of overloading the LLM with raw text, we provide structured facts and relationships.
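
As an illustration, those triples could be written to Neo4j with the official Python driver. The node label and relationship shape below are my own choices, not mem0’s schema:

from neo4j import GraphDatabase

# Match these connection details to your own Neo4j setup.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "sharad123"))

def add_relation(tx, subject, relation, obj):
    # MERGE avoids duplicating nodes when the same entity appears again.
    tx.run(
        "MERGE (a:Entity {name: $subject}) "
        "MERGE (b:Entity {name: $obj}) "
        "MERGE (a)-[:REL {type: $relation}]->(b)",
        subject=subject, relation=relation, obj=obj,
    )

with driver.session() as session:
    session.execute_write(add_relation, "Elon Musk", "FOUNDED", "SpaceX")
    session.execute_write(add_relation, "SpaceX", "LAUNCHED", "Falcon 1")
driver.close()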

Why Not Build This From Scratch?

Implementing this manually is complex:

  • Setting up a vector database to store embeddings.

  • Maintaining a graph database for relationships.

  • Running Named Entity Recognition (NER) + Relation Extraction.

  • Constantly syncing facts and relations across sessions.

It’s doable, but tedious, and building GenAI apps shouldn’t mean reinventing the wheel every time.

mem0: Memory for LLMs

mem0 is a specialized tool designed to manage LLM memory seamlessly. It abstracts away the heavy lifting:

  • Stores facts, embeddings, and relationships automatically.

  • Integrates with vector and graph databases without custom setup.

  • Optimizes memory retrieval so the LLM only gets the most relevant context.

  • Supports multi-session memory, enabling true long-term conversations.

With mem0, you don’t need to reinvent complex infrastructure. Instead, you focus on building better AI assistants while mem0 ensures your LLM has a reliable memory system.

Let’s learn it with the code:

If you have at least basic knowledge of Docker and Python (including virtual environments), you should be able to run the code below easily. However, if you need any assistance, feel free to comment with your doubts, and I’ll be happy to help.

Docker Containers:

  • We are going to use Qdrant as the vector database and Neo4j as the graph database.

  • Set up both containers, giving Neo4j the username and password that the mem0 config below expects.

docker pull qdrant/qdrant
docker run -d -p 6333:6333 qdrant/qdrant  # Runs a Qdrant container on localhost:6333

docker pull neo4j:5.26.11-ubi9
docker run -d \
    --publish=7474:7474 --publish=7687:7687 \
    --volume=$HOME/neo4j/data:/data \
    --env NEO4J_AUTH=neo4j/sharad123 \
    neo4j:5.26.11-ubi9  # NEO4J_AUTH must match the credentials in mem0_conf.py

Qdrant UI: once the container is up, the dashboard is available at http://localhost:6333/dashboard.

Neo4j UI: the Neo4j Browser is available at http://localhost:7474 (log in with the credentials set above).

.env:

OPENAI_API_KEY=<your openai key>

Python Packages:

python-dotenv
mem0ai  # the mem0 library is published on PyPI as mem0ai
neo4j
qdrant-client
openai
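
With your virtual environment active, they can all be installed in one go:

pip install python-dotenv mem0ai neo4j qdrant-client openai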

mem0_conf.py:

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_KEY = os.getenv("OPENAI_API_KEY")

memory_config = {
    "version": "v1.1",
    "embedder": {
        "provider": "openai",
        "config": {
            "api_key": OPENAI_KEY,
            "model": "text-embedding-3-small"
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "api_key": OPENAI_KEY,
            "model": "gpt-4.1"
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333  # The port is an integer, not a string
        }
    },
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "neo4j://localhost:7687",
            "username": "neo4j",
            "password": "sharad123"  # Must match the NEO4J_AUTH set on the container
        }
    },
}

main.py:

from dotenv import load_dotenv
from mem0 import Memory
from openai import OpenAI
import json
from mem0_conf import memory_config

load_dotenv()

client = OpenAI()

mem_client = Memory.from_config(memory_config)

def chat_loop():
    while True:
        query = input("Ask anything...\n")
        # user_id is the unique key under which we store our embeddings and relation graph.
        our_memories = mem_client.search(query=query, user_id="testing-user-1")

        stored_memory = [
            f"ID: {mem.get('id')}, Memory: {mem.get('memory')}"
            for mem in our_memories.get("results", [])
        ]

        SYSTEM_PROMPT = f"""
            Hi, you are a memory assistance expert who captures the context of memory in the form of facts and relations,
            and based on that, you provide accurate answers to queries.
            Memory of the user:
            {json.dumps(stored_memory)}
        """

        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": query}
            ]
        )

        print(response.choices[0].message.content)

        # Save the new facts and relations under the same user_id, so we can extract them later.
        mem_client.add([
            {"role": "user", "content": query},
            {"role": "assistant", "content": response.choices[0].message.content}
        ], user_id="testing-user-1")


chat_loop()

Output: Session 1

Output: Session 2

As you can see, embeddings and relations are added under testing-user-1. Using this ID, information is retrieved from both the vector database and the graph database and combined into the response returned to us.
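
To see exactly what mem0 has stored for a user (handy for debugging), you can list every memory under that user_id. A small sketch, assuming the same config as above:

from mem0 import Memory
from mem0_conf import memory_config

mem_client = Memory.from_config(memory_config)

# List every memory stored under this user_id.
all_memories = mem_client.get_all(user_id="testing-user-1")
for mem in all_memories.get("results", []):
    print(mem.get("id"), "->", mem.get("memory"))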

Embedding details in the vector database:

Relation Graph:

Conclusion

Human memory has multiple layers: short-term, long-term, episodic, and semantic. These concepts inspire how we design memory for AI agents. By storing facts and relationships efficiently, we can make LLMs smarter, faster, and more context-aware.

Instead of building memory management from scratch, tools like mem0 make it simple to integrate structured memory into your AI systems, bringing us one step closer to AI that truly “remembers.”

Please check out my other learnings and posts on GenAI concepts.
