What's the difference between context window and memory?

The context window is what the model sees right now. Memory is what the agent retrieves and injects into context across sessions. Memory persists; context resets per call.

Should I use Mem0 or build my own?

Mem0 / Zep give you a working memory layer in an afternoon. Roll your own when you need specific retrieval logic, want to own the data, or have integration constraints. Most teams should start managed and migrate later if needed.

What kinds of memory do agents need?

Episodic (what happened in past conversations), semantic (facts about the user/world), and procedural (how to do things). Most production agents primarily need episodic + semantic; procedural usually lives in tools/skills.

Giving AI Agents Memory in 2026 — Mem0, Zep, and the Patterns That Work

A conversation that resets every session feels mechanical. An agent that remembers a user’s preferences, past decisions, and patterns feels like an assistant. The difference is memory — and 2026 has decent tooling for it.

What memory is (and isn’t)

The context window is what the model sees in a single API call. Memory is what the agent fetches into context across sessions.

Three memory types from cognitive science map well:

Episodic — what happened (“the user asked about X yesterday”).
Semantic — facts (“the user works in Mumbai, prefers brevity”).
Procedural — how to do things (“steps to triage a billing dispute”).

Most production agents need episodic and semantic memory. Procedural knowledge usually lives in Skills or tools.

The 2026 tooling

Tool	Strengths	Best for
Mem0	Easy SDK, multi-provider, semantic + episodic	Fast adoption
Zep	Graph-based, temporal, strong observability	Complex agents, long histories
LangGraph + Postgres checkpointer	Inside LangGraph, tight integration	LangGraph users
Custom (pgvector + summary)	Full control	Special needs

For most teams: Mem0 first, migrate to custom when you outgrow it.

Mem0 in 5 lines

from mem0 import Memory

mem = Memory()

# Add memories from a conversation
mem.add(messages=conversation_history, user_id="user-42")

# Retrieve relevant memories at the start of a new conversation
relevant = mem.search(query=current_user_question, user_id="user-42", limit=5)

mem.add extracts memorable facts from the conversation and stores them. mem.search retrieves relevant ones. Inject the retrieved memories into the system prompt; the agent now has context across sessions.

What gets stored

The naive approach — store every message — explodes context fast. Better:

1. Salient fact extraction

After each conversation, an LLM extracts memorable facts:

User prefers brief responses. Working on Project Alpha. Mentioned a deadline of May 15. Lives in Mumbai.

These get stored as memories with metadata (when, source conversation).
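
A minimal sketch of that extraction step, assuming the LLM is wrapped in a plain callable that takes a prompt and returns one fact per line. The names extract_facts, MemoryRecord, and the prompt wording are hypothetical; this is not how Mem0 or Zep implement it internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Hypothetical prompt; tune the wording for your own model.
EXTRACTION_PROMPT = (
    "List the durable facts about the user from this conversation, "
    "one per line. Skip small talk.\n\n{transcript}"
)


@dataclass
class MemoryRecord:
    text: str
    source_conversation_id: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def extract_facts(
    messages: list[dict],
    conversation_id: str,
    llm: Callable[[str], str],
) -> list[MemoryRecord]:
    """Ask the LLM for salient facts and wrap each one with metadata."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    raw = llm(EXTRACTION_PROMPT.format(transcript=transcript))
    # Drop list bullets and blank lines from the model's answer.
    facts = [line.strip("-• ").strip() for line in raw.splitlines()]
    return [MemoryRecord(text=f, source_conversation_id=conversation_id) for f in facts if f]
```

Passing the LLM in as a callable keeps the extractor testable with a stub and independent of any one provider's SDK.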

2. Conversation summaries

A compact summary of the conversation is stored. Useful for “remind me what we discussed last time.”

3. Decisions and outcomes

When the agent took action (“scheduled meeting for Tuesday”), record it.

Store these in a vector index for later retrieval. Mem0 / Zep handle the extraction and storage.

What NOT to store

Verbatim chat logs (use summaries).
PII without consent (compliance trap).
Anything you can’t justify retaining.

For privacy/compliance considerations, see LLM Security in 2026.
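
One way to enforce the PII rule is to scrub text before it reaches the store. The sketch below is a rough illustration under stated assumptions: the two regexes and the 10-digit phone threshold are illustrative choices and nowhere near complete PII detection, and the function names are hypothetical.

```python
import re

# Illustrative patterns only; real PII detection needs far more coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def _mask_phone(match: re.Match) -> str:
    # Only mask runs with at least 10 digits so dates like 2026-05-15 survive.
    digits = sum(ch.isdigit() for ch in match.group())
    return "[phone]" if digits >= 10 else match.group()


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[email]", text)
    return PHONE_RE.sub(_mask_phone, text)


def storable(text: str, has_consent: bool) -> str:
    """Keep the text as-is only when the user consented to PII retention."""
    return text if has_consent else redact_pii(text)
```

Run storable() on every extracted fact just before the write, so the consent check cannot be skipped by a new code path.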

Retrieval

At conversation start, fetch relevant memories:

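async def start_conversation(user_id: str, opening_message: str) -> list[dict]:
    # Assumes `mem` is an async client (e.g. mem0's AsyncMemory); drop the await for the sync Memory.
    found = await mem.search(query=opening_message, user_id=user_id, limit=5)
    # Recent Mem0 versions wrap hits in {"results": [...]}, each with a "memory" field; check your version.
    hits = found["results"] if isinstance(found, dict) else found
    memory_lines = "\n".join(f"- {m['memory']}" for m in hits)
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "system", "content": "Relevant memories about user:\n" + memory_lines},
        {"role": "user", "content": opening_message},
    ]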

The agent sees the relevant memories as context. Same retrieval pattern as RAG, just over user-specific memories instead of corpus chunks. See Build a RAG App with pgvector.

Custom memory with pgvector

If managed feels limiting:

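-- The vector type and HNSW index come from the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (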
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    text TEXT NOT NULL,
    embedding vector(1536) NOT NULL,
    source_conversation_id TEXT,
    confidence REAL DEFAULT 1.0,
    created_at TIMESTAMPTZ DEFAULT now(),
    last_used_at TIMESTAMPTZ
);

CREATE INDEX memories_embedding ON memories
  USING hnsw (embedding vector_cosine_ops);
CREATE INDEX memories_user ON memories(user_id);

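# Assumes an asyncpg-style `db` with pgvector's codec registered
# (pgvector.asyncpg.register_vector) so Python lists encode as vector values.
# `embed` is your own async embedding helper.
async def add_memory(user_id: int, text: str) -> None:
    emb = await embed(text)  # must return 1536 floats to match the column
    await db.execute(
        "INSERT INTO memories (user_id, text, embedding) VALUES ($1, $2, $3)",
        user_id, text, emb,
    )

async def search_memories(user_id: int, query: str, k: int = 5):
    q_emb = await embed(query)
    # <=> is cosine distance, so 1 - distance is cosine similarity.
    return await db.fetch(
        """
        SELECT id, text, 1 - (embedding <=> $1) AS score
        FROM memories WHERE user_id = $2
        ORDER BY embedding <=> $1 LIMIT $3
        """,
        q_emb, user_id, k,
    )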

Per-user scoping via the user_id filter is mandatory in any multi-tenant deployment. Pair this with pgvector Deep Dive for index tuning.

Memory hygiene

Memory accumulates. Without pruning, an agent stores conflicting facts (“the user is in Bangalore” + “the user is in Mumbai”) with no way to reconcile.

Strategies:

Recency weight: prefer recent memories.
Update on conflict: if a new fact contradicts an old one, mark the old superseded.
Confidence scores: store extractor confidence; deprecate low-confidence over time.
Decay / forgetting: a memory not retrieved in 6 months is archived.

Mem0 and Zep do most of this. If you roll your own, it’s real work.
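
If you do roll your own, the recency, confidence, and decay strategies can be combined into one re-ranking pass over the vector search hits. This is a sketch under assumptions: the 30-day half-life, the 180-day archive cutoff, and the Hit fields are illustrative choices, not values taken from Mem0 or Zep.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 30.0                # assumed: recency weight halves every 30 days
ARCHIVE_AFTER = timedelta(days=180)  # roughly the 6-month rule above


@dataclass
class Hit:
    text: str
    similarity: float       # cosine similarity from the vector search
    confidence: float       # extractor confidence, 0..1
    created_at: datetime
    last_used_at: datetime


def hygiene_score(hit: Hit, now: datetime) -> float:
    """Similarity scaled by confidence and an exponential recency weight."""
    age_days = (now - hit.created_at).total_seconds() / 86400
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return hit.similarity * hit.confidence * recency


def rerank(hits: list[Hit], now: datetime) -> tuple[list[Hit], list[Hit]]:
    """Return (active hits sorted best-first, hits that should be archived)."""
    stale = [h for h in hits if now - h.last_used_at > ARCHIVE_AFTER]
    active = [h for h in hits if now - h.last_used_at <= ARCHIVE_AFTER]
    active.sort(key=lambda h: hygiene_score(h, now), reverse=True)
    return active, stale
```

With these numbers a 90-day-old memory keeps one eighth of its similarity score, so a fresh "user is in Mumbai" outranks an older "user is in Bangalore" even when the older one matches the query slightly better.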

Multi-agent memory

Multiple agents serving the same user need shared memory. Architecture:

User → Agent A ──┐
              \  │
               ▼ ▼
            Memory store
               ▲ ▲
              /  │
User → Agent B ──┘

All agents read from and write to the same per-user memory. Each tags its writes with agent_id for provenance.

Risk: one agent’s bad memory pollutes another. Mitigation: confidence thresholds, manual review for sensitive memories.
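
The shared-store idea and the confidence-threshold mitigation can be sketched with an in-memory stand-in. SharedStore, its method names, and the 0.7 default threshold are all hypothetical; a real version would sit on top of your database or managed memory layer.

```python
from dataclasses import dataclass


@dataclass
class SharedMemory:
    user_id: str
    agent_id: str      # provenance: which agent wrote this
    text: str
    confidence: float


class SharedStore:
    """In-memory stand-in for the shared per-user store."""

    def __init__(self) -> None:
        self._rows: list[SharedMemory] = []

    def write(self, user_id: str, agent_id: str, text: str, confidence: float) -> None:
        self._rows.append(SharedMemory(user_id, agent_id, text, confidence))

    def read(self, user_id: str, min_confidence: float = 0.7) -> list[SharedMemory]:
        """Any agent can read, but only this user's memories above the threshold."""
        return [
            r for r in self._rows
            if r.user_id == user_id and r.confidence >= min_confidence
        ]
```

Because every row carries agent_id, a bad memory can be traced back to the agent that wrote it and its other writes can be reviewed.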

Cost

Memory is mostly cheap:

A few KB per memory.
A few memories per conversation.
Embedding cost: tiny ($0.02/1M tokens).
Storage: pennies per user per year.

The only recurring per-conversation cost is the retrieval call, and that’s one extra embedding plus one quick vector query. Negligible.
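
A back-of-envelope check of the embedding figure. Only the $0.02 per 1M tokens price comes from the list above; the usage numbers (20 conversations a month, 5 memories of about 50 tokens each, a 30-token retrieval query) are assumptions to swap for your own.

```python
PRICE_PER_TOKEN = 0.02 / 1_000_000  # embedding price quoted above


def yearly_embedding_cost(
    conversations_per_month: int = 20,   # assumed usage
    memories_per_conversation: int = 5,  # assumed
    tokens_per_memory: int = 50,         # assumed
    tokens_per_query: int = 30,          # assumed: one retrieval per conversation
) -> float:
    """Embedding spend per user per year, in dollars."""
    tokens_per_conversation = (
        memories_per_conversation * tokens_per_memory + tokens_per_query
    )
    return conversations_per_month * 12 * tokens_per_conversation * PRICE_PER_TOKEN


print(f"${yearly_embedding_cost():.6f} per user per year")
```

Under those assumptions the embedding spend is a fraction of a cent per user per year. The figure covers embeddings only, not the LLM call that extracts the facts.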

Common mistakes

1. Storing too much

Verbatim chat logs grow unbounded. Always summarize before storing.

2. No per-user scoping

A retrieval that returns another user’s memories is a privacy breach. Always filter with WHERE user_id = $1 (or whichever placeholder carries the user ID).

3. No conflict resolution

The agent reads “user wants A” + “user wants B” and gets confused. Resolve in the extractor or at retrieval.
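
Resolving at write time can be sketched as "supersede on conflict". The sketch assumes the extractor labels each fact with an attribute such as "location"; that labelling, and the Fact and add_fact names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    attribute: str                       # e.g. "location"; assumed to come from the extractor
    value: str
    superseded_by: Optional[int] = None  # index of the fact that replaced this one


def add_fact(facts: list[Fact], attribute: str, value: str) -> None:
    """Append a fact and mark the live fact for the same attribute as superseded."""
    new_index = len(facts)
    for fact in facts:
        if fact.attribute != attribute or fact.superseded_by is not None:
            continue
        if fact.value == value:
            return  # already known, nothing to reconcile
        fact.superseded_by = new_index
    facts.append(Fact(attribute, value))


def live_facts(facts: list[Fact]) -> list[Fact]:
    """Only facts that have not been superseded are eligible for retrieval."""
    return [f for f in facts if f.superseded_by is None]
```

Superseded facts stay in the list, so the history remains auditable while retrieval only ever sees one value per attribute.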

4. Treating memory as permanent

Users change. Old memories go stale. Build in expiry / decay.

5. Not telling the user

Surfacing “I remember we discussed X” is good. Hidden memory of personal data feels creepy. Be transparent.

What I’d build today

For a new agent product:

Mem0 integrated at the conversation boundary.
Salient fact extraction after each session.
Per-user retrieval at conversation start.
A user-facing “what does the agent remember about me?” view.
Manual delete capability (compliance + UX).

Two days of work; a dramatically more useful agent.
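
The "what does the agent remember about me?" view and the delete capability need little more than a list and a scoped delete. Below is a sketch against an in-memory stand-in; UserMemoryView and its methods are hypothetical names, and a real version would call your memory layer instead of a dict.

```python
import uuid


class UserMemoryView:
    """In-memory stand-in for the store behind a 'what do you remember?' page."""

    def __init__(self) -> None:
        self._by_user: dict[str, dict[str, str]] = {}

    def remember(self, user_id: str, text: str) -> str:
        memory_id = uuid.uuid4().hex
        self._by_user.setdefault(user_id, {})[memory_id] = text
        return memory_id

    def list_for_user(self, user_id: str) -> list[dict]:
        """Everything the agent holds about this user, ready to render."""
        rows = self._by_user.get(user_id, {})
        return [{"id": mid, "text": text} for mid, text in rows.items()]

    def delete(self, user_id: str, memory_id: str) -> bool:
        """Delete one memory; scoped by user so nobody can delete another user's data."""
        return self._by_user.get(user_id, {}).pop(memory_id, None) is not None
```

Returning a boolean from delete lets the UI distinguish "deleted" from "nothing to delete" without leaking whether the ID belongs to someone else.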

Read this next

If you want a Mem0 + LangGraph + Postgres agent template, it’s at rajpoot.dev.

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.
