A conversation that resets every session feels mechanical. An agent that remembers a user’s preferences, past decisions, and patterns feels like an assistant. The difference is memory — and 2026 has decent tooling for it.
What memory is (and isn’t)
The context window is what the model sees in a single API call. Memory is what persists between sessions and gets fetched into that context when it is relevant.
Three memory types from cognitive science map well:
- Episodic — what happened (“the user asked about X yesterday”).
- Semantic — facts (“the user works in Mumbai, prefers brevity”).
- Procedural — how to do things (“steps to triage a billing dispute”).
Production agents need episodic + semantic. Procedural usually lives as Skills or tools.
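One way to make the three types concrete is to tag each stored item with its type and route it accordingly. This is a minimal sketch: `MemoryType`, `MemoryItem`, and `stored_in_memory` are hypothetical names for illustration, not part of Mem0, Zep, or LangGraph.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"      # what happened
    SEMANTIC = "semantic"      # facts about the user
    PROCEDURAL = "procedural"  # how to do things


@dataclass
class MemoryItem:
    user_id: str
    text: str
    type: MemoryType


def stored_in_memory(item: MemoryItem) -> bool:
    """Episodic and semantic items go to the memory store; procedural
    knowledge is assumed to live in skills or tools instead."""
    return item.type in (MemoryType.EPISODIC, MemoryType.SEMANTIC)


items = [
    MemoryItem("user-42", "Asked about X yesterday", MemoryType.EPISODIC),
    MemoryItem("user-42", "Works in Mumbai, prefers brevity", MemoryType.SEMANTIC),
    MemoryItem("user-42", "Steps to triage a billing dispute", MemoryType.PROCEDURAL),
]
to_store = [item for item in items if stored_in_memory(item)]
```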
The 2026 tooling
| Tool | Strengths | Best for |
|---|---|---|
| Mem0 | Easy SDK, multi-provider, semantic + episodic | Fast adoption |
| Zep | Graph-based, temporal, strong observability | Complex agents, long histories |
| LangGraph + Postgres checkpointer | Inside LangGraph, tight integration | LangGraph users |
| Custom (pgvector + summary) | Full control | Special needs |
For most teams: Mem0 first, migrate to custom when you outgrow it.
Mem0 in 5 lines
```python
from mem0 import Memory

mem = Memory()

# Add memories from a conversation
mem.add(messages=conversation_history, user_id="user-42")

# Retrieve relevant memories at the start of a new conversation
relevant = mem.search(query=current_user_question, user_id="user-42", limit=5)
```
mem.add extracts memorable facts from the conversation and stores them. mem.search retrieves relevant ones. Inject the retrieved memories into the system prompt; the agent now has context across sessions.
What gets stored
The naive approach — store every message — explodes context fast. Better:
1. Salient fact extraction
After each conversation, an LLM extracts memorable facts:
User prefers brief responses. Working on Project Alpha. Mentioned a deadline of May 15. Lives in Mumbai.
These get stored as memories with metadata (when, source conversation).
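The extraction step can be sketched roughly as follows. Everything here is an assumption for illustration: `ExtractedFact`, `EXTRACTION_PROMPT`, and `extract_facts` are hypothetical names, and `llm` stands for any callable that takes a prompt and returns text. Managed tools run their own version of this internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ExtractedFact:
    text: str
    source_conversation_id: str  # metadata: which conversation it came from
    extracted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # metadata: when
    )


EXTRACTION_PROMPT = (
    "List the durable facts about the user from this conversation, "
    "one per line. Skip small talk.\n\n{transcript}"
)


def extract_facts(
    transcript: str,
    conversation_id: str,
    llm: Callable[[str], str],  # hypothetical: prompt in, text out
) -> list[ExtractedFact]:
    raw = llm(EXTRACTION_PROMPT.format(transcript=transcript))
    # Drop bullet markers and blank lines from the model output.
    lines = [line.strip(" -\t") for line in raw.splitlines()]
    return [
        ExtractedFact(text=line, source_conversation_id=conversation_id)
        for line in lines
        if line
    ]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "- User prefers brief responses.\n- Working on Project Alpha.\n\n- Lives in Mumbai."


facts = extract_facts("user: please keep it short", "conv-001", fake_llm)
```

Passing the model in as a callable keeps the extractor testable without network access.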
2. Conversation summaries
A compact summary of the conversation is stored. Useful for “remind me what we discussed last time.”
3. Decisions and outcomes
When the agent took action (“scheduled meeting for Tuesday”), record it.
Store these in a vector index for later retrieval. Mem0 / Zep handle the extraction and storage.
What NOT to store
- Verbatim chat logs (use summaries).
- PII without consent (compliance trap).
- Anything you can’t justify retaining.
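As a rough illustration of the PII rule, a write path can redact obvious identifiers unless the user has consented. This is a sketch under stated assumptions: the two regular expressions only catch simple email and phone formats, `redact` and `storable` are hypothetical helpers, and none of this replaces a proper PII detector or a compliance review.

```python
import re

# Deliberately simple patterns; real PII detection needs far more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[email]", text)
    return PHONE_RE.sub("[phone]", text)


def storable(text: str, has_consent: bool) -> str:
    """Return the text to store: untouched with consent, redacted without."""
    return text if has_consent else redact(text)


sample = "Reach me at priya@example.com or +91 98765 43210."
redacted = storable(sample, has_consent=False)
kept = storable(sample, has_consent=True)
```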
For privacy/compliance considerations see LLM Security in 2026.
Retrieval
At conversation start, fetch relevant memories:
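```python
async def start_conversation(user_id: str, opening_message: str) -> list[dict]:
    # Assumes `mem` is an async client (for example mem0's AsyncMemory);
    # with the synchronous Memory class, drop the `await`.
    found = await mem.search(query=opening_message, user_id=user_id, limit=5)
    # Recent Mem0 versions return {"results": [...]}; older ones return a list.
    memories = found["results"] if isinstance(found, dict) else found
    memory_lines = "\n".join(f"- {m['memory']}" for m in memories)
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "system", "content": "Relevant memories about user:\n" + memory_lines},
        {"role": "user", "content": opening_message},
    ]
```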
The agent sees the relevant memories as context. Same retrieval pattern as RAG, just over user-specific memories instead of corpus chunks. See Build a RAG App with pgvector.
Custom memory with pgvector
If a managed service feels limiting, one table and two functions cover the basics:
```sql
-- The vector type and HNSW index come from the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    text TEXT NOT NULL,
    embedding vector(1536) NOT NULL,
    source_conversation_id TEXT,
    confidence REAL DEFAULT 1.0,
    created_at TIMESTAMPTZ DEFAULT now(),
    last_used_at TIMESTAMPTZ
);

CREATE INDEX memories_embedding ON memories
    USING hnsw (embedding vector_cosine_ops);
CREATE INDEX memories_user ON memories(user_id);
```
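```python
# Assumes `db` is an asyncpg connection with pgvector's codec registered
# (pgvector.asyncpg.register_vector) and `embed` is your own async embedding
# function returning a list of 1536 floats.
async def add_memory(user_id: int, text: str) -> None:
    emb = await embed(text)
    await db.execute(
        "INSERT INTO memories (user_id, text, embedding) VALUES ($1, $2, $3)",
        user_id, text, emb,
    )


async def search_memories(user_id: int, query: str, k: int = 5):
    q_emb = await embed(query)
    # <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
    return await db.fetch(
        """
        SELECT id, text, 1 - (embedding <=> $1) AS score
        FROM memories WHERE user_id = $2
        ORDER BY embedding <=> $1 LIMIT $3
        """,
        q_emb, user_id, k,
    )
```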
Per-user scoping via the user_id filter is mandatory for multi-tenant deployments. Pair with pgvector Deep Dive for index tuning.
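To see what the `score` column in the query means, here is the same arithmetic in plain Python. In pgvector, `<=>` is cosine distance, so `1 - (a <=> b)` is cosine similarity. The function names and the toy two-dimensional vectors are illustrative only; real embeddings have 1536 dimensions in this schema.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <=> operator computes: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same meaning, different magnitude
orthogonal = [0.0, 3.0]       # unrelated

score_same = 1 - cosine_distance(query, same_direction)
score_unrelated = 1 - cosine_distance(query, orthogonal)
```

Because the measure ignores vector length, `same_direction` scores as a perfect match even though its magnitude differs from the query.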
Memory hygiene
Memory accumulates. Without pruning, an agent ends up storing conflicting facts (“the user is in Bangalore” + “the user is in Mumbai”) with no way to reconcile them.
Strategies:
- Recency weight: prefer recent memories.
- Update on conflict: if a new fact contradicts an old one, mark the old superseded.
- Confidence scores: store extractor confidence; deprecate low-confidence memories over time.
- Decay / forgetting: a memory not retrieved in 6 months is archived.
Mem0 and Zep do most of this. If you roll your own, it’s real work.
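For a sense of what rolling your own involves, the recency, confidence, and decay strategies can be combined at ranking time roughly like this. It is a sketch, not how Mem0 or Zep implement it: `StoredMemory`, the 90-day half-life, and the 183-day archive cutoff are all assumptions to tune.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 90                   # assumption: recency weight halves every 90 days
ARCHIVE_AFTER = timedelta(days=183)   # roughly 6 months without retrieval


@dataclass
class StoredMemory:
    text: str
    similarity: float        # cosine similarity returned by the vector search
    confidence: float        # extractor confidence, 0..1
    created_at: datetime
    last_used_at: datetime
    superseded: bool = False  # set when a newer fact contradicts this one


def rank_score(m: StoredMemory, now: datetime) -> float:
    """Similarity scaled by confidence and an exponential recency weight."""
    age_days = (now - m.created_at).total_seconds() / 86400
    return m.similarity * m.confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)


def should_archive(m: StoredMemory, now: datetime) -> bool:
    return now - m.last_used_at > ARCHIVE_AFTER


def select_for_prompt(memories: list[StoredMemory], now: datetime, k: int = 5) -> list[StoredMemory]:
    live = [m for m in memories if not m.superseded and not should_archive(m, now)]
    return sorted(live, key=lambda m: rank_score(m, now), reverse=True)[:k]


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
old = StoredMemory("The user is in Bangalore", 0.8, 1.0,
                   now - timedelta(days=90), now - timedelta(days=10))
new = StoredMemory("The user is in Mumbai", 0.6, 1.0, now, now)
stale = StoredMemory("Working on Project Alpha", 0.9, 1.0,
                     now - timedelta(days=400), now - timedelta(days=200))
ranked = select_for_prompt([old, new, stale], now)
```

Here the newer Mumbai memory outranks the older Bangalore one despite a lower raw similarity, and the memory that has not been retrieved for 200 days is dropped from the prompt.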
Multi-agent memory
Multiple agents serving the same user need shared memory. Architecture:
```text
User → Agent A ──┐
          \      │
           ▼     ▼
         Memory store
           ▲     ▲
          /      │
User → Agent B ──┘
```
All agents read from and write to the same per-user memory. Each tags its writes with agent_id for provenance.
Risk: one agent’s bad memory pollutes another. Mitigation: confidence thresholds, manual review for sensitive memories.
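A minimal in-memory sketch of that arrangement, with provenance tags and a confidence gate on reads. `SharedMemoryStore`, the 0.7 threshold, and the `needs_review` flag are hypothetical choices for illustration; a real deployment would back this with the shared database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedMemory:
    user_id: str
    agent_id: str      # provenance: which agent wrote this
    text: str
    confidence: float
    needs_review: bool = False  # sensitive writes wait for manual review


class SharedMemoryStore:
    def __init__(self, min_confidence: float = 0.7) -> None:
        self.min_confidence = min_confidence
        self._by_user: dict[str, list[SharedMemory]] = defaultdict(list)

    def write(self, user_id: str, agent_id: str, text: str,
              confidence: float, sensitive: bool = False) -> SharedMemory:
        memory = SharedMemory(user_id, agent_id, text, confidence, needs_review=sensitive)
        self._by_user[user_id].append(memory)
        return memory

    def read(self, user_id: str) -> list[SharedMemory]:
        """Return only memories that pass the confidence gate and need no review."""
        return [
            m for m in self._by_user.get(user_id, [])
            if m.confidence >= self.min_confidence and not m.needs_review
        ]


store = SharedMemoryStore()
store.write("user-42", "agent-a", "Prefers brief responses", confidence=0.9)
store.write("user-42", "agent-b", "Might be moving cities", confidence=0.4)
store.write("user-42", "agent-b", "Has a billing dispute open", confidence=0.95, sensitive=True)
visible = store.read("user-42")
```

Low-confidence and unreviewed writes stay in the store for auditing but never reach another agent's prompt.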
Cost
Memory is mostly cheap:
- A few KB per memory.
- A few memories per conversation.
- Embedding cost: tiny (around $0.02 per 1M tokens for a small hosted embedding model).
- Storage: pennies per user per year.
The only recurring overhead is the retrieval step before each conversation, and that is one extra embedding plus one fast vector query. Negligible.
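A back-of-envelope calculation makes the scale visible. Every input below is an assumption picked for illustration (one conversation a day, five short memories each, 1536-dimension float32 embeddings); substitute your own numbers.

```python
# All inputs are assumptions for illustration; substitute your own numbers.
MEMORIES_PER_CONVERSATION = 5
TOKENS_PER_MEMORY = 50
TOKENS_PER_QUERY = 50
CONVERSATIONS_PER_YEAR = 365           # one conversation per day
USD_PER_MILLION_TOKENS = 0.02          # embedding price quoted in the text
EMBEDDING_DIMS = 1536
BYTES_PER_FLOAT = 4
TEXT_BYTES_PER_MEMORY = 200

# Tokens embedded per user per year: writes plus one retrieval query per conversation.
write_tokens = MEMORIES_PER_CONVERSATION * TOKENS_PER_MEMORY * CONVERSATIONS_PER_YEAR
query_tokens = TOKENS_PER_QUERY * CONVERSATIONS_PER_YEAR
tokens_per_year = write_tokens + query_tokens
embedding_usd_per_year = tokens_per_year * USD_PER_MILLION_TOKENS / 1_000_000

# Raw storage per user per year, ignoring index overhead.
bytes_per_memory = EMBEDDING_DIMS * BYTES_PER_FLOAT + TEXT_BYTES_PER_MEMORY
storage_bytes_per_year = (
    bytes_per_memory * MEMORIES_PER_CONVERSATION * CONVERSATIONS_PER_YEAR
)

print(f"tokens/year:    {tokens_per_year:,}")
print(f"embedding cost: ${embedding_usd_per_year:.5f} per user per year")
print(f"storage:        {storage_bytes_per_year / 1e6:.1f} MB per user per year")
```

Under these assumptions a user costs well under a cent per year in embeddings and needs on the order of ten megabytes of raw storage.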
Common mistakes
1. Storing too much
Verbatim chat logs grow unbounded. Always summarize before storing.
2. No per-user scoping
A retrieval that returns another user’s memories is a privacy breach. Always filter on user_id in every retrieval query.
3. No conflict resolution
The agent reads “user wants A” + “user wants B” and gets confused. Resolve in the extractor or at retrieval.
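Resolving at retrieval can be as simple as keeping the newest fact per attribute. The sketch assumes the extractor assigns each fact a slot name; `Fact`, its `key` field, and `resolve_conflicts` are hypothetical, and free-text memories without such a key need an LLM or embedding comparison instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Fact:
    key: str          # hypothetical slot assigned by the extractor, e.g. "location"
    value: str
    created_at: datetime


def resolve_conflicts(facts: list[Fact]) -> list[Fact]:
    """Keep only the most recent fact for each key."""
    latest: dict[str, Fact] = {}
    for fact in facts:
        current = latest.get(fact.key)
        if current is None or fact.created_at > current.created_at:
            latest[fact.key] = fact
    return sorted(latest.values(), key=lambda f: f.key)


facts = [
    Fact("location", "Bangalore", datetime(2025, 3, 1, tzinfo=timezone.utc)),
    Fact("style", "prefers brevity", datetime(2025, 6, 1, tzinfo=timezone.utc)),
    Fact("location", "Mumbai", datetime(2026, 1, 10, tzinfo=timezone.utc)),
]
resolved = resolve_conflicts(facts)
```

The agent then sees a single location instead of two contradictory ones.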
4. Treating memory as permanent
Users change. Old memories go stale. Build in expiry / decay.
5. Not telling the user
Surfacing “I remember we discussed X” is good. Hidden memory of personal data feels creepy. Be transparent.
What I’d build today
For a new agent product:
- Mem0 integrated at the conversation boundary.
- Salient fact extraction after each session.
- Per-user retrieval at conversation start.
- A user-facing “what does the agent remember about me?” view.
- Manual delete capability (compliance + UX).
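The last two items need very little machinery. Below is an in-memory sketch of a per-user listing and an ownership-checked delete; `UserMemoryView` and its methods are hypothetical names, and in practice they would sit in front of whichever memory store you use.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Remembered:
    id: int
    user_id: str
    text: str


class UserMemoryView:
    def __init__(self) -> None:
        self._rows: dict[int, Remembered] = {}
        self._ids = count(1)

    def remember(self, user_id: str, text: str) -> int:
        memory_id = next(self._ids)
        self._rows[memory_id] = Remembered(memory_id, user_id, text)
        return memory_id

    def list_for_user(self, user_id: str) -> list[Remembered]:
        """Backs the 'what does the agent remember about me?' screen."""
        return [row for row in self._rows.values() if row.user_id == user_id]

    def delete(self, user_id: str, memory_id: int) -> bool:
        """Delete only if the memory belongs to the requesting user."""
        row = self._rows.get(memory_id)
        if row is None or row.user_id != user_id:
            return False
        del self._rows[memory_id]
        return True


view = UserMemoryView()
first = view.remember("user-42", "Prefers brief responses")
second = view.remember("user-42", "Lives in Mumbai")
other = view.remember("user-7", "Working on Project Alpha")

denied = view.delete("user-42", other)    # someone else's memory
deleted = view.delete("user-42", second)  # the user's own memory
remaining = [row.text for row in view.list_for_user("user-42")]
```

The ownership check in `delete` is the same per-user scoping rule applied to writes.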
Two days of work; a dramatically more useful agent.
Read this next
- Build a Production RAG App with pgvector
- AI Agents with LangGraph
- Embedding Models in 2026
- LLM Security in 2026
If you want a Mem0 + LangGraph + Postgres agent template, it’s at rajpoot.dev.