Agents that “remember” are appealing in demos and tricky in production. Done right, your agent picks up where it left off, learns user preferences, and refines over time. Done wrong, you have stale facts contaminating answers, ballooning storage, and zero observability into what the agent “knows.” This post is the working playbook.

Three layers

Working memory: what’s in the current prompt. Cleared between turns; cheap; immediate.

Episodic memory: a log of past events, decisions, observations. Append-only; replayable.

Semantic memory: distilled facts. “User prefers JavaScript.” “Project deadline is Friday.” Curated; updated.

Each at a different lifecycle and access pattern.

Working memory

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    *recent_conversation,            # last N turns
    {"role": "user", "content": current_input},
]

For most agents: this is enough.

When the conversation grows past the context window:

if total_tokens(messages) > 100_000:
    summary = await summarize(old_messages)
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"Earlier in this conversation: {summary}"},
        *recent_messages,
    ]

Summarize and drop. Sliding window.

Episodic memory

Append-only store of “what happened”:

CREATE TABLE agent_episodes (
    id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id    text NOT NULL,
    user_id     text,
    occurred_at timestamptz DEFAULT now(),
    kind        text NOT NULL,   -- 'observation', 'decision', 'action', 'feedback'
    content     jsonb NOT NULL
);
CREATE INDEX ON agent_episodes (agent_id, user_id, occurred_at DESC);
async def remember_episode(agent_id, user_id, kind, content):
    await db.execute(
        "INSERT INTO agent_episodes (agent_id, user_id, kind, content) "
        "VALUES ($1, $2, $3, $4)",
        agent_id, user_id, kind, content
    )

Useful for:

  • “What did we decide last time?”
  • Debugging agent decisions.
  • Rebuilding state.

Semantic memory

Distilled facts. Stored in a vector DB for similarity retrieval, plus structured for direct lookup.

CREATE TABLE agent_facts (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id    text NOT NULL,
    fact       text NOT NULL,
    embedding  vector(1024),
    confidence float NOT NULL DEFAULT 1.0,
    sources    text[],          -- episode IDs that produced this
    created_at timestamptz DEFAULT now(),
    last_seen  timestamptz DEFAULT now()
);

When the agent observes a stable fact (“user prefers Python”), write/update.

async def remember_fact(user_id, fact, source_episode_id):
    embedding = await embed(fact)
    
    # Find similar existing fact
    similar = await db.fetchrow(
        "SELECT id, fact FROM agent_facts WHERE user_id = $1 "
        "ORDER BY embedding <=> $2 LIMIT 1", user_id, embedding
    )
    if similar and cosine(similar.embedding, embedding) > 0.9:
        # Update, don't duplicate
        await db.execute(
            "UPDATE agent_facts SET last_seen = now(), confidence = confidence + 0.1 "
            "WHERE id = $1", similar.id
        )
    else:
        await db.execute(
            "INSERT INTO agent_facts (user_id, fact, embedding, sources) "
            "VALUES ($1, $2, $3, $4)",
            user_id, fact, embedding, [source_episode_id]
        )

Dedupe via similarity; reinforce via repetition. See Embeddings .

Retrieval at runtime

async def relevant_memories(user_id, query):
    embedding = await embed(query)
    facts = await db.fetch(
        "SELECT fact FROM agent_facts WHERE user_id = $1 "
        "ORDER BY embedding <=> $2 LIMIT 5", user_id, embedding
    )
    return [f["fact"] for f in facts]

async def respond(user_id, query):
    memories = await relevant_memories(user_id, query)
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"You remember: {'; '.join(memories)}"},
        {"role": "user", "content": query},
    ]
    return await llm.complete(messages)

Top-k relevant facts injected into the prompt. Targeted, not exhaustive.

Memory writes — when?

Three patterns:

A. After every turn (eager)

async def post_turn(user_id, conversation):
    extracted = await extract_facts(conversation)
    for f in extracted:
        await remember_fact(user_id, f, episode_id)

LLM extracts facts after every interaction. Expensive; comprehensive.

B. Periodic consolidation (lazy)

Run a job nightly that consolidates the last day’s episodes into semantic facts.

Cheaper; possibly miss in-session learning.

C. Explicit save

User says “remember that I prefer X.” Agent calls a save_memory tool.

Cheapest; relies on user prompting.

For most apps: explicit save + periodic consolidation.

Memory expiration / decay

async def decay_memories():
    # Memories not reinforced in 90 days lose confidence
    await db.execute(
        "UPDATE agent_facts SET confidence = confidence * 0.5 "
        "WHERE last_seen < now() - interval '90 days'"
    )
    # Drop low-confidence facts
    await db.execute(
        "DELETE FROM agent_facts WHERE confidence < 0.1"
    )

Prevents stale facts from polluting forever. “User lives in NYC” — that was 2 years ago; maybe not anymore.

Conflict resolution

User says “I prefer Python.” Months later, “I prefer Rust now.” Both stored.

async def resolve_conflicts(user_id, new_fact):
    existing = await find_contradicting(user_id, new_fact)
    if existing:
        # Most-recent-wins; or ask the user
        await db.execute(
            "UPDATE agent_facts SET confidence = 0.1 WHERE id = $1", existing.id
        )
    await remember_fact(user_id, new_fact, episode_id)

Recency weighting; explicit invalidation.

Visibility

Build a UI showing what the agent “knows”:

  • “Memories about you: …”
  • “Edit / delete memories.”
  • “Forget everything.”

Users trust agents that show their memory. They don’t trust black boxes.

Privacy

Memories often contain PII. Apply:

  • Redaction before embedding.
  • Per-user isolation (never cross user boundaries in retrieval).
  • Right-to-delete (DELETE all memories for user).
  • Encryption at rest if memories are sensitive.

Tools

Strengths
Mem0Open-source memory framework
Letta (formerly MemGPT)Hierarchical memory, OS-style
ZepMemory + reasoning over chat
Custom (Postgres + pgvector)Full control

For most apps: a custom memory store with pgvector is sufficient and avoids vendor lock-in. See Embedding Databases .

Common mistakes

1. Writing every observation as a fact

Storage explodes; signal drowns. Be selective.

2. No deduplication

Same fact stored 50 times. Use similarity dedupe.

3. No expiration

Old facts contaminate. Decay over time.

4. Cross-user contamination

Memory retrieval not filtered by user → privacy disaster.

5. Over-relying on memory

Agent uses stale memory instead of fresh context. Memory should complement, not replace, current context.

Read this next

If you want my agent memory starter (Postgres + pgvector + retrieval), it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .