Agents that “remember” are appealing in demos and tricky in production. Done right, your agent picks up where it left off, learns user preferences, and refines over time. Done wrong, you have stale facts contaminating answers, ballooning storage, and zero observability into what the agent “knows.” This post is the working playbook.
Three layers
Working memory: what’s in the current prompt. Cleared between turns; cheap; immediate.
Episodic memory: a log of past events, decisions, observations. Append-only; replayable.
Semantic memory: distilled facts. “User prefers JavaScript.” “Project deadline is Friday.” Curated; updated.
Each at a different lifecycle and access pattern.
Working memory
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
*recent_conversation, # last N turns
{"role": "user", "content": current_input},
]
For most agents: this is enough.
When the conversation grows past the context window:
if total_tokens(messages) > 100_000:
summary = await summarize(old_messages)
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "system", "content": f"Earlier in this conversation: {summary}"},
*recent_messages,
]
Summarize and drop. Sliding window.
Episodic memory
Append-only store of “what happened”:
CREATE TABLE agent_episodes (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
agent_id text NOT NULL,
user_id text,
occurred_at timestamptz DEFAULT now(),
kind text NOT NULL, -- 'observation', 'decision', 'action', 'feedback'
content jsonb NOT NULL
);
CREATE INDEX ON agent_episodes (agent_id, user_id, occurred_at DESC);
async def remember_episode(agent_id, user_id, kind, content):
await db.execute(
"INSERT INTO agent_episodes (agent_id, user_id, kind, content) "
"VALUES ($1, $2, $3, $4)",
agent_id, user_id, kind, content
)
Useful for:
- “What did we decide last time?”
- Debugging agent decisions.
- Rebuilding state.
Semantic memory
Distilled facts. Stored in a vector DB for similarity retrieval, plus structured for direct lookup.
CREATE TABLE agent_facts (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
user_id text NOT NULL,
fact text NOT NULL,
embedding vector(1024),
confidence float NOT NULL DEFAULT 1.0,
sources text[], -- episode IDs that produced this
created_at timestamptz DEFAULT now(),
last_seen timestamptz DEFAULT now()
);
When the agent observes a stable fact (“user prefers Python”), write/update.
async def remember_fact(user_id, fact, source_episode_id):
embedding = await embed(fact)
# Find similar existing fact
similar = await db.fetchrow(
"SELECT id, fact FROM agent_facts WHERE user_id = $1 "
"ORDER BY embedding <=> $2 LIMIT 1", user_id, embedding
)
if similar and cosine(similar.embedding, embedding) > 0.9:
# Update, don't duplicate
await db.execute(
"UPDATE agent_facts SET last_seen = now(), confidence = confidence + 0.1 "
"WHERE id = $1", similar.id
)
else:
await db.execute(
"INSERT INTO agent_facts (user_id, fact, embedding, sources) "
"VALUES ($1, $2, $3, $4)",
user_id, fact, embedding, [source_episode_id]
)
Dedupe via similarity; reinforce via repetition. See Embeddings .
Retrieval at runtime
async def relevant_memories(user_id, query):
embedding = await embed(query)
facts = await db.fetch(
"SELECT fact FROM agent_facts WHERE user_id = $1 "
"ORDER BY embedding <=> $2 LIMIT 5", user_id, embedding
)
return [f["fact"] for f in facts]
async def respond(user_id, query):
memories = await relevant_memories(user_id, query)
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "system", "content": f"You remember: {'; '.join(memories)}"},
{"role": "user", "content": query},
]
return await llm.complete(messages)
Top-k relevant facts injected into the prompt. Targeted, not exhaustive.
Memory writes — when?
Three patterns:
A. After every turn (eager)
async def post_turn(user_id, conversation):
extracted = await extract_facts(conversation)
for f in extracted:
await remember_fact(user_id, f, episode_id)
LLM extracts facts after every interaction. Expensive; comprehensive.
B. Periodic consolidation (lazy)
Run a job nightly that consolidates the last day’s episodes into semantic facts.
Cheaper; possibly miss in-session learning.
C. Explicit save
User says “remember that I prefer X.” Agent calls a save_memory tool.
Cheapest; relies on user prompting.
For most apps: explicit save + periodic consolidation.
Memory expiration / decay
async def decay_memories():
# Memories not reinforced in 90 days lose confidence
await db.execute(
"UPDATE agent_facts SET confidence = confidence * 0.5 "
"WHERE last_seen < now() - interval '90 days'"
)
# Drop low-confidence facts
await db.execute(
"DELETE FROM agent_facts WHERE confidence < 0.1"
)
Prevents stale facts from polluting forever. “User lives in NYC” — that was 2 years ago; maybe not anymore.
Conflict resolution
User says “I prefer Python.” Months later, “I prefer Rust now.” Both stored.
async def resolve_conflicts(user_id, new_fact):
existing = await find_contradicting(user_id, new_fact)
if existing:
# Most-recent-wins; or ask the user
await db.execute(
"UPDATE agent_facts SET confidence = 0.1 WHERE id = $1", existing.id
)
await remember_fact(user_id, new_fact, episode_id)
Recency weighting; explicit invalidation.
Visibility
Build a UI showing what the agent “knows”:
- “Memories about you: …”
- “Edit / delete memories.”
- “Forget everything.”
Users trust agents that show their memory. They don’t trust black boxes.
Privacy
Memories often contain PII. Apply:
- Redaction before embedding.
- Per-user isolation (never cross user boundaries in retrieval).
- Right-to-delete (DELETE all memories for user).
- Encryption at rest if memories are sensitive.
Tools
| Strengths | |
|---|---|
| Mem0 | Open-source memory framework |
| Letta (formerly MemGPT) | Hierarchical memory, OS-style |
| Zep | Memory + reasoning over chat |
| Custom (Postgres + pgvector) | Full control |
For most apps: a custom memory store with pgvector is sufficient and avoids vendor lock-in. See Embedding Databases .
Common mistakes
1. Writing every observation as a fact
Storage explodes; signal drowns. Be selective.
2. No deduplication
Same fact stored 50 times. Use similarity dedupe.
3. No expiration
Old facts contaminate. Decay over time.
4. Cross-user contamination
Memory retrieval not filtered by user → privacy disaster.
5. Over-relying on memory
Agent uses stale memory instead of fresh context. Memory should complement, not replace, current context.
Read this next
- Designing Tools for AI Agents 2026
- LLM Agent Error Recovery 2026
- Embedding Databases 2026
- Embeddings & Semantic Search 2026
If you want my agent memory starter (Postgres + pgvector + retrieval), it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .