How to Give Your AI Coding Agent Persistent Memory Across Sessions

For developers using Claude Code, Cursor, Codex, or Gemini who are tired of re-explaining context every session

The problem: agents forget everything when you close the tab

You spend twenty minutes explaining your codebase to your AI coding agent. You tell it about the monorepo structure, the deployment conventions, the one branch it must never force-push to. The session ends. You come back tomorrow and it has no memory of any of it.

You are not doing anything wrong. This is how context windows work. Every session starts with a blank slate. The agent has no continuity of experience — no record of past mistakes, no accumulated knowledge of your project, no recollection of the rules you established last week.

The frustration is real and widespread. Developers using Claude Code, Cursor, Codex, and Gemini all hit the same wall. The agents are capable — they just cannot remember.

The distinction that matters: A context window holds information for one session. Memory holds information across sessions. Most agents have the former. Almost none have the latter by default.

Why context windows are not memory

Context windows are large and getting larger. That solves a different problem. A big context window means the agent can reason over more information at once within a single session. It does not mean that information survives when the session ends.

Think of the difference this way: a context window is RAM — fast, capacious, gone when the power cuts. Memory is disk — slower to query, but persistent. You need both. Right now, AI coding agents only ship with RAM.

The consequences compound over time. An agent with no persistent memory will:

- ask you to re-explain the codebase, its conventions, and its constraints every session
- repeat mistakes it has already made, because no record of them survives
- forget rules you established last week, such as which branch it must never force-push to

Stuffing facts into a CLAUDE.md file helps, but it is a manual workaround. You are the memory. You remember what to put in the file. You update it when things change. That is not a solution — it is delegation of a machine problem back to a human.

The hidden cost: Re-explaining context is not just annoying. Every token you spend re-establishing what the agent already knew is a token not spent on the actual task. And re-explained rules are still just prompt rules — the agent can reason around them.

Three types of agent memory

Cognitive science distinguishes several memory types. The same taxonomy maps cleanly onto what AI coding agents need. Here is how each type works and what it looks like in practice:

| Memory Type | What It Stores | Concrete Example |
| --- | --- | --- |
| Episodic | Records of specific past events — what happened, when, and what the outcome was | The agent tried to force-push to main. You gave thumbs-down. That event is stored with context, timestamp, and failure description. |
| Semantic | Generalised knowledge extracted from episodes — rules, patterns, facts about the world | From multiple thumbs-down events, the system derives: "force-pushing to main causes broken deploys in this repo." That becomes a prevention rule. |
| Procedural | Encoded behaviours — gates that fire before actions without requiring the agent to reason about them | A PreToolUse hook that checks every git push command against the prevention rule and blocks the dangerous pattern automatically. |

Most "persistent memory" proposals for AI agents stop at episodic: they store a log of past conversations. That is useful, but insufficient. The signal gets diluted in a sea of raw events. What agents need is the full pipeline: episodes promote to semantic rules, semantic rules compile into procedural gates.
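The promotion step can be sketched in a few lines. This is an illustrative sketch, not ThumbGate's actual implementation: the threshold, field names, and rule format are all assumptions made for the example.

```python
# Hypothetical sketch: promoting repeated episodic failures into a semantic rule.
# The threshold and field names are illustrative assumptions.
from collections import Counter

PROMOTION_THRESHOLD = 3  # assumed: promote after three matching thumbs-down events

def promote_lessons(episodes):
    """Count thumbs-down episodes per tag; tags seen often enough become rules."""
    counts = Counter(
        tag
        for ep in episodes
        if ep["feedback"] == "down"
        for tag in ep["tags"]
    )
    return [
        f"prevention rule: block actions tagged '{tag}'"
        for tag, n in counts.items()
        if n >= PROMOTION_THRESHOLD
    ]

episodes = [
    {"feedback": "down", "tags": ["force-push", "git"]},
    {"feedback": "down", "tags": ["force-push"]},
    {"feedback": "up",   "tags": ["git"]},
    {"feedback": "down", "tags": ["force-push"]},
]
print(promote_lessons(episodes))  # only 'force-push' crosses the threshold
```

The point of the shape: a single event never becomes a rule on its own. Repetition is the signal that separates a one-off from a pattern worth enforcing.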

How ThumbGate implements persistent memory

ThumbGate is built around this three-tier memory architecture. Here is each layer in concrete terms.

Episodic layer: the feedback log

Every thumbs-up or thumbs-down you give an agent action is written to a structured feedback log. Each entry captures the tool call that was made, the context at the time, what worked or went wrong, and any tags you add. The log is append-only and survives across sessions.

# Thumbs-down: record a specific failure
node .claude/scripts/feedback/capture-feedback.js \
  --feedback=down \
  --context="deploying to production" \
  --what-went-wrong="agent ran db migration without backup" \
  --what-to-change="always checkpoint before schema changes" \
  --tags="database,migrations,safety"

You do not need to write this manually for every interaction. The MCP server captures tool calls automatically. Manual feedback is for adding nuance that the agent could not observe on its own.
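A log entry roughly mirrors the CLI flags above. The JSON key names and file layout here are assumptions for illustration; the real log format written by capture-feedback.js is not documented in this article.

```python
# Hypothetical sketch of an append-only feedback log entry; key names
# mirror the CLI flags above but are assumptions, not ThumbGate's schema.
import json
import os
import tempfile
import time

def append_feedback(path, feedback, context, what_went_wrong, what_to_change, tags):
    entry = {
        "timestamp": time.time(),
        "feedback": feedback,            # "up" or "down"
        "context": context,
        "what_went_wrong": what_went_wrong,
        "what_to_change": what_to_change,
        "tags": tags,
    }
    with open(path, "a") as f:           # append-only: history is never rewritten
        f.write(json.dumps(entry) + "\n")
    return entry

# Demo on a throwaway path; the real log lives in your project directory.
log_path = os.path.join(tempfile.mkdtemp(), "feedback.jsonl")
entry = append_feedback(
    log_path,
    feedback="down",
    context="deploying to production",
    what_went_wrong="agent ran db migration without backup",
    what_to_change="always checkpoint before schema changes",
    tags=["database", "migrations", "safety"],
)
```

One JSON object per line keeps appends cheap and makes the log trivially parseable later.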

Semantic layer: the lesson database

Raw feedback events are processed into a SQLite database with full-text search (FTS5). This is not a flat file — it is a queryable knowledge store. When a new session starts, the system retrieves lessons relevant to the current task by similarity, not by recency.

The FTS5 index means retrieval is fast even as the database grows. You are not loading the entire history into context. You are loading the lessons most likely to matter right now. That is the difference between a knowledge base and a memory dump.
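Relevance-based retrieval with FTS5 looks roughly like this. The table schema and query are a minimal sketch under assumed names; ThumbGate's actual layout may differ.

```python
# Minimal sketch of FTS5-backed lesson retrieval; schema is an assumption.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE lessons USING fts5(rule, tags)")
db.executemany(
    "INSERT INTO lessons VALUES (?, ?)",
    [
        ("always checkpoint before schema changes", "database migrations safety"),
        ("never force-push to main", "git branches safety"),
    ],
)

# Query terms derived from the current task ("run a database migration"),
# ranked by FTS5's built-in bm25 relevance score.
rows = db.execute(
    "SELECT rule FROM lessons WHERE lessons MATCH ? ORDER BY rank",
    ("database OR migration*",),
).fetchall()
print(rows)  # only the migration lesson matches
```

Only matching lessons come back, ranked by relevance, so context stays small no matter how large the database grows.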

Procedural layer: prevention rules and gates

Promoted lessons generate prevention rules in prevention-rules.md. Rules are not prompt instructions — they are checked by a PreToolUse hook that fires before every tool call. The agent cannot reason around a gate. The gate runs outside the agent's context.

The promotion pipeline: Thumbs-down event → feedback log entry → lesson promoted to SQLite → prevention rule generated → PreToolUse gate active for every future session, with no additional setup.
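The gate itself can be pictured as a plain pattern check that runs before the tool call executes. This is a hypothetical sketch: the rule patterns and return shape stand in for whatever prevention-rules.md actually generates.

```python
# Hypothetical sketch of a PreToolUse-style gate: a check that runs outside
# the agent's context, before the tool call executes. The patterns stand in
# for rules generated in prevention-rules.md.
import re

PREVENTION_RULES = [
    (re.compile(r"git\s+push\s+.*--force.*\b(main|master)\b"),
     "force-push to a protected branch"),
    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
     "destructive schema change"),
]

def pre_tool_use_gate(command):
    """Return (allowed, reason). The agent cannot reason its way past this."""
    for pattern, description in PREVENTION_RULES:
        if pattern.search(command):
            return False, f"blocked: {description}"
    return True, "ok"

print(pre_tool_use_gate("git push --force origin main"))   # blocked
print(pre_tool_use_gate("git push origin feature/login"))  # allowed
```

Because the check lives in the hook rather than the prompt, a persuasive chain of reasoning inside the agent changes nothing: the command either matches a rule or it does not.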

Thompson Sampling for memory-informed decisions

Not every prevention rule has the same confidence level. A rule derived from one thumbs-down event is weaker than a rule reinforced by a dozen. ThumbGate uses Thompson Sampling — a multi-armed bandit algorithm — to handle this uncertainty.

For each gate, the system maintains a Beta distribution over outcomes. As thumbs-up and thumbs-down feedback accumulates, the distribution tightens. A gate with high confidence becomes a hard block. A gate still gathering signal issues a warning and lets the agent reconsider.
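The mechanics fit in a few lines using Python's stdlib Beta sampler. The thresholds and count-to-parameter mapping here are illustrative assumptions, not ThumbGate's actual calibration.

```python
# Sketch of Thompson Sampling for a single gate; thresholds are assumptions.
import random

def gate_decision(thumbs_up, thumbs_down, block_threshold=0.8, warn_threshold=0.5):
    """Sample the probability this pattern is dangerous, then act on the draw."""
    # Beta(1 + down, 1 + up): thumbs-down evidence pushes draws toward 1,
    # thumbs-up evidence pushes them back toward 0.
    p_dangerous = random.betavariate(1 + thumbs_down, 1 + thumbs_up)
    if p_dangerous >= block_threshold:
        return "block"
    if p_dangerous >= warn_threshold:
        return "warn"
    return "allow"

print(gate_decision(thumbs_up=0, thumbs_down=100))  # overwhelming evidence: "block"
print(gate_decision(thumbs_up=100, thumbs_down=0))  # reinforced as safe: "allow"
```

With few observations the draws spread widely, so the gate mostly warns; as feedback accumulates in either direction the distribution tightens and the gate converges on a hard block or a clean allow.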

This matters for memory because it means the system learns your preferences rather than requiring you to manually tune thresholds. You give feedback. The gate calibrates. The agent adapts.

Thompson Sampling also prevents over-blocking. If a pattern that was once dangerous stops being a problem — because the codebase changed, or you updated your workflow — thumbs-up feedback on future calls will widen the distribution back toward allowing. Memory is not one-way.

Setup: persistent memory in two minutes

ThumbGate ships as an MCP server. Any agent that speaks MCP — Claude Code, Cursor, Codex, Gemini, Amp, OpenCode — can connect to it. You initialize once and the memory layer is active for every subsequent session.

npx mcp-memory-gateway init

That command sets up:

- the MCP server connection for your agent
- the append-only feedback log (episodic layer)
- the SQLite lesson database with FTS5 search (semantic layer)
- the PreToolUse hook backed by prevention-rules.md (procedural layer)

After init, your agent starts each session with context assembled from relevant past lessons. It does not start blank. It starts informed.

What memory looks like on day one vs. day thirty

On day one, the database is empty. The agent behaves the same as it always has. You give feedback on its actions.

By day thirty, the database has accumulated dozens of lessons. The agent's context at session start includes the most relevant ones. Prevention rules have tightened around patterns that caused problems. Patterns that worked have been reinforced. The agent makes fewer mistakes — not because it was retrained, but because the gates learned from your feedback.

That is persistent memory in practice: not a bigger context window, not a longer system prompt, but a feedback loop that accumulates signal and converts it into durable enforcement.

Give your agent memory that survives restarts

One command. Works with Claude Code, Cursor, Codex, Gemini, Amp, and any MCP-compatible agent.

$ npx mcp-memory-gateway init