Exploring AI Memory Architectures (Part 3): From Prototype to Blueprint

1. Connecting Blueprints and Prototypes
In the first two posts, we saw an idea evolve from a micro-architecture to a macro-system. Memory³ was like a detailed chip schematic, defining an efficient "memory circuit." MemOS was like a grand urban plan, envisioning a "memory governance framework" for an entire AI system. Together, they form a forward-looking architectural blueprint.
Meanwhile, in the world of engineering, developers have built something else entirely. Open-source libraries like mem0 and LangMem are engineering prototypes. They are pragmatic, usable, and provide plug-and-play memory for countless AI apps today. The goal of this final post is to connect the blueprint and the prototype. We'll look at the limits of today's prototypes, see what the blueprint can teach us, and map a practical path from here to there.
2. Today's Prototypes: The Case of mem0
The core idea behind today's memory libraries can be summarized as an intelligent RAG layer. They solve one key problem: giving stateless LLMs a persistent memory that lasts across sessions.
The workflow usually follows three steps (sketched in code after the list):
Capture: Listen to the conversation between the user and the agent.
Filter & Store: Use a model (often an LLM itself) to decide what's important, then vectorize it and save it to a vector database.
Retrieve & Inject: When a new interaction occurs, search the database for the most relevant memories and inject them back into the prompt's context.
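To make that loop concrete, here is a minimal sketch. `embed`, `llm_is_important`, and `vector_store` are hypothetical placeholders standing in for your embedding model, an LLM-based importance filter, and your vector database; they are not mem0's actual API.

```python
# Hypothetical placeholders: embed(), llm_is_important(), and vector_store
# stand in for an embedding model, an LLM filter, and a vector database.

def capture_turn(user_msg: str, agent_msg: str) -> None:
    """Capture + Filter & Store: keep only what the LLM judges important."""
    fact = f"User: {user_msg}\nAgent: {agent_msg}"
    if llm_is_important(fact):               # LLM decides what's worth keeping
        vector_store.add(embed(fact), fact)  # vectorize and persist

def build_prompt(query: str, k: int = 3) -> str:
    """Retrieve & Inject: pull the k most relevant memories into the context."""
    memories = vector_store.search(embed(query), top_k=k)
    context = "\n".join(m.text for m in memories)
    return f"Relevant memories:\n{context}\n\nUser: {query}"
```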
This approach is simple, effective, and easy to integrate. But when we look at it through the lens of the MemOS blueprint, we see its limitations.
3. Two Core Limitations of the Prototypes
Limitation 1: Monolithic Memory
The MemOS blueprint divides memory into three tiers: plaintext, activation, and parametric. Today's prototypes essentially only manage vectorized plaintext memory. Every piece of information, regardless of its nature, is processed into a vector representation of a text chunk.
This causes two problems:
No "Activation Memory": They don't use Memory³'s core insight—pre-compiling frequently used knowledge into a KV cache the model can use directly. This means that even if the exact same fact is retrieved 100 times, the model still has to spend cycles "reading" the injected plaintext context every single time. It's suboptimal in both performance and cost.
No "Parametric Memory": They have no way to "harden" core skills or knowledge into a part of the model itself (like a LoRA weight). A user's core preferences, which should become part of the agent's "instincts," remain just an external record that has to be retrieved over and over.
Limitation 2: Primitive Governance
The core of the MemOS blueprint is governance, enabled by the standardized MemCube container and its rich metadata. By comparison, the governance in today's prototypes is rudimentary.
Incomplete Metadata: They might have a timestamp, but they generally lack the systemic metadata framework proposed by MemOS, such as `importance_score`, `access_control_list`, or `version`. This makes advanced policies like intelligent forgetting, permissioning, or memory attribution difficult to implement.
No Lifecycle Management: The dynamic flow of memory between states (caching, hardening, archiving) is the essence of MemOS. In current tools, once a memory is in the vector DB, its form is fixed. There's no mechanism for it to be automatically "promoted" or "demoted" based on usage.
4. The Path Forward: From Memory "Database" to "Runtime"
With these limitations in mind, we can sketch a path for developers: start from today's prototypes and gradually absorb ideas from the architectural blueprint. The goal is to evolve the AI memory system from an external dependency into a true agent state runtime.
Step 1: Implement Your Own Multi-Level Cache
This is the most direct lesson from Memory³. Add an in-memory cache layer on top of your existing RAG flow.
Level 1 (Hot Data): In memory, use a simple dictionary or LRU cache to store frequently retrieved, already-digested structured knowledge (e.g., cache the info "my API key is xyz" as `{ "api_key": "xyz" }`). When a request hits the L1 cache, you inject the structured result directly, saving the LLM from re-processing the natural language.
Level 2 (Warm Data): For L1 misses, fall back to retrieving plaintext chunks from the vector database. This is how `mem0` works today. A sketch of the two-tier lookup follows this list.
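Here is a minimal sketch of that lookup, reusing the hypothetical `embed` and `vector_store` placeholders from earlier; cache eviction (true LRU behavior) is left out for brevity:

```python
l1_cache: dict[str, dict] = {}  # L1: digested, structured knowledge (hot)

def lookup(query: str) -> dict | str:
    if query in l1_cache:
        # L1 hit: inject the structured result directly, no re-reading needed
        return l1_cache[query]
    # L1 miss: fall back to the vector DB (L2), as mem0-style tools do today
    memories = vector_store.search(embed(query), top_k=3)
    return "\n".join(m.text for m in memories)

def promote(query: str, digested: dict) -> None:
    """Once a memory has been digested into structure, cache it as hot data."""
    l1_cache[query] = digested
```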
Step 2: Build a Standardized "Memory Object"
This is the lesson from MemOS's MemCube. Stop treating memory as a simple string and wrap it in a standard Memory Object.
```python
import dataclasses
from typing import Any

@dataclasses.dataclass
class MemoryObject:
    content: Any        # Can be str, structured data, or a path to LoRA weights
    timestamp: float    # When the memory was created or last updated
    source: str         # Where it came from: conversation, document, tool call...
    importance: float   # Drives forgetting and promotion policies
    access_count: int   # Usage signal for cache promotion/demotion
    # ... and other metadata
```
Once you start organizing memory this way, you open the door to advanced governance, like forgetting policies based on importance and access frequency.
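For instance, here is one possible sketch of such a policy. The scoring weights and thresholds are illustrative assumptions, not values from MemOS:

```python
import time

def retention_score(mem: MemoryObject, now: float | None = None) -> float:
    """Illustrative scoring: recent, important, frequently used memories score high."""
    now = now or time.time()
    age_days = (now - mem.timestamp) / 86400
    recency = 1.0 / (1.0 + age_days)           # decays smoothly with age
    usage = min(mem.access_count / 10.0, 1.0)  # saturates after 10 accesses
    return 0.5 * mem.importance + 0.3 * recency + 0.2 * usage

def sweep(store: list[MemoryObject], threshold: float = 0.2) -> list[MemoryObject]:
    """Drop (or archive) memories whose retention score falls below the threshold."""
    return [m for m in store if retention_score(m) >= threshold]
```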
Step 3: Move Towards an "Agent State Runtime"
With a multi-level cache and standardized memory objects, your system is no longer just a memory library; it's becoming a true runtime, the agent's brain and central nervous system. It is responsible for (a minimal sketch follows this list):
Unified State Management: Conversation history, user preferences, documents, and tool states are all managed as standard memory objects.
Orchestrating Memory Flow: The runtime becomes an active manager, automatically moving memory between cache tiers and persistent storage based on metadata.
Plugging into Memory Backends: In the future, this runtime could connect to more diverse backends—a Memory³-native model (for KV injection), a fine-tuning service (for parametric hardening)—creating a pluggable, extensible memory architecture.
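Tying the pieces together, here is a hedged sketch of such a runtime, reusing `MemoryObject`, `retention_score`, and `sweep` from the earlier snippets. The promotion and demotion thresholds are illustrative assumptions:

```python
class AgentStateRuntime:
    """Orchestrates memory flow between tiers based on MemoryObject metadata."""

    def __init__(self) -> None:
        self.hot: dict[str, MemoryObject] = {}  # L1: in-memory cache
        self.warm: list[MemoryObject] = []      # L2: persistent store (stubbed)

    def touch(self, key: str, mem: MemoryObject) -> None:
        """Record a use; promote important, frequently used memories to hot."""
        mem.access_count += 1
        if mem.access_count >= 5 and mem.importance >= 0.5:
            self.hot[key] = mem

    def maintain(self) -> None:
        """Periodic housekeeping: demote stale hot entries, sweep warm storage."""
        for key, mem in list(self.hot.items()):
            if retention_score(mem) < 0.3:
                del self.hot[key]  # demoted entries remain in warm storage
        self.warm = sweep(self.warm, threshold=0.2)
```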
5. Conclusion: Memory as a First-Class Citizen
From the architectural cleverness of Memory³, to the systemic vision of MemOS, to the engineering pragmatism of mem0, we see a clear trend: AI memory is evolving from temporary, external context into a systemic, internal, and actively managed first-class citizen.
For developers today, the point isn't to argue about which path is the one true way. It's to recognize that this evolution is happening. Start by improving your RAG pipeline. Introduce a multi-level cache. Encapsulate your memory in objects. Every small engineering improvement is a step toward building the more powerful, robust, and intelligent AI systems of the future. The road is long, but it's the one we're on.
Written by Qin Liu
I am Qin Liu, a software engineer with a deep passion for computer systems and artificial intelligence. Currently, I work at MyScale, a fully-managed vector database company, focusing on vector databases and retrieval-augmented generation (RAG).