Memory Should Decay

Stack Research
oss engineering

Agent memory that grows forever is a liability. Memory Half-Life makes forgetting a feature.

Agent memory systems almost always work the same way: the agent learns something, stores it, and keeps it forever. The context window fills up, retrieval gets noisier, and eventually the agent is hauling around a giant pile of facts, most of which stopped being relevant a long time ago.

The usual fix is a bigger context window, better retrieval, or periodic summarization. All of these treat memory as something to manage. None of them treat memory as something that should expire.

Memory Half-Life takes the opposite approach. Every memory has a confidence score that decays over time. Use a memory and its clock resets. Ignore it and it fades. Drop below the threshold and it’s gone.

Repository: stack-research/memory-half-life

How It Works

Memory Half-Life is a Python library built on EntropyOS. The decay model is exponential:

confidence = 2^(-elapsed / half_life)

A memory with a half-life of 10 ticks starts at 100% confidence. After 10 ticks, it’s at 50%. After 20, 25%. When it drops below the threshold (default 10%), which for this example happens a little after 33 ticks without a recall, EntropyOS expires it automatically.
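The arithmetic is easy to check by hand or in a few lines of Python. This is a minimal sketch of the decay formula only; the function name `confidence` is illustrative and is not taken from the library.

```python
def confidence(elapsed: float, half_life: float) -> float:
    """Confidence after `elapsed` ticks without a recall: 2^(-elapsed / half_life)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 2.0 ** (-elapsed / half_life)


if __name__ == "__main__":
    # Walk a 10-tick half-life through its first few halvings.
    for t in (0, 10, 20, 30, 40):
        print(f"tick {t:>2}: {confidence(t, 10):.3f}")
```

Each additional half-life halves the confidence again, so with a 10-tick half-life the value is still 12.5% at tick 30 and has fallen to 6.25% by tick 40.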

Ticks aren’t wall-clock time. One tick is one conversation turn, one agent step, one loop iteration — whatever discrete unit makes sense for your system. This keeps the math deterministic and testable.

The key behavior: recalling a memory resets its decay clock. Memories that the agent actively uses stay alive. Memories it doesn’t touch fade and die. There’s no garbage collection step, no manual cleanup, no “memory management” as a separate concern. Relevance is measured by use.

The API

The interface is small:

  • store(key, content, half_life) — create a memory with a decay rate
  • recall(key) — read a memory and refresh it
  • peek(key) — read without refreshing (observation only)
  • forget(key) — explicitly delete
  • tick(n) — advance time
  • fading() — list memories below a warning threshold

recall vs peek is the important distinction. recall says “I’m using this, keep it alive.” peek says “I’m looking at this but that doesn’t mean it matters.” This gives the agent (or the system around it) control over what counts as reinforcement.
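To make the recall/peek difference concrete, here is a toy in-memory stand-in with the same six method names. It is a sketch under assumptions, not the library's implementation: the class name `ToyMemory`, the `warn` level of 25%, the return values, and the choice to track a "last used" tick are all invented for illustration, and the real library delegates expiry to EntropyOS rather than scanning a dict.

```python
from dataclasses import dataclass


@dataclass
class _Entry:
    content: str
    half_life: float
    last_used: int  # tick of the last store() or recall()


class ToyMemory:
    """Illustrative stand-in; NOT the Memory Half-Life implementation."""

    def __init__(self, threshold=0.10, warn=0.25):
        self.threshold = threshold  # expire when confidence falls below this
        self.warn = warn            # hypothetical warning level used by fading()
        self.now = 0                # current tick
        self._entries = {}          # key -> _Entry

    def _confidence(self, entry):
        return 2.0 ** (-(self.now - entry.last_used) / entry.half_life)

    def store(self, key, content, half_life):
        self._entries[key] = _Entry(content, half_life, self.now)

    def recall(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        entry.last_used = self.now  # reinforcement: the decay clock restarts
        return entry.content

    def peek(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry.content  # no refresh

    def forget(self, key):
        self._entries.pop(key, None)

    def tick(self, n=1):
        self.now += n
        expired = [k for k, e in self._entries.items()
                   if self._confidence(e) < self.threshold]
        for k in expired:
            del self._entries[k]

    def fading(self):
        return [k for k, e in self._entries.items()
                if self._confidence(e) < self.warn]


if __name__ == "__main__":
    mem = ToyMemory()
    mem.store("used", "prefers metric units", half_life=10)
    mem.store("peeked", "asked about shipping once", half_life=10)
    for _ in range(4):
        mem.tick(10)
        mem.recall("used")    # refreshed every 10 ticks, never drops below 50%
        mem.peek("peeked")    # observed only, keeps decaying
    print(mem.peek("used"), "|", mem.peek("peeked"))
```

In this run the recalled memory survives all 40 ticks, while the one that was only peeked at falls to 6.25% at tick 40 and is pruned.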

Why Decay Matters

Unbounded memory creates three problems:

  1. Retrieval degrades. More memories means more candidates to search through, more false positives, and more stale context bleeding into decisions. An agent that remembers everything retrieves worse than one that remembers only what’s relevant.

  2. Cost grows. Every stored memory costs storage, retrieval compute, and — if it ends up in a prompt — tokens. Memory that isn’t useful anymore is pure overhead.

  3. Stale facts cause errors. An agent that remembers a user’s old address, a deprecated API endpoint, or yesterday’s price as though it’s current will eventually act on outdated information. There’s no flag that says “this fact used to be true.” It just sits there, indistinguishable from current knowledge, until it causes a problem.

Decay addresses all three by making retention conditional. If a memory is still being used, it stays. If it isn’t, it leaves. The system’s memory footprint is bounded by relevance, not storage limits.

What EntropyOS Does Here

Memory Half-Life doesn’t implement its own expiry logic. It delegates to EntropyOS’s TTLStore — the same time-to-live key-value store described in Software That Expires.

The memory engine calculates a TTL from the half-life and threshold, hands it to EntropyOS, and lets the runtime handle expiry. When the agent calls recall, the engine touches the key in the TTL store, resetting the clock. When the engine calls tick, EntropyOS evaluates all stored state and prunes what’s expired.
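The TTL follows from solving 2^(-t / half_life) < threshold for t, which gives t > half_life × log2(1 / threshold). The sketch below shows that calculation only. The function name `expiry_tick` and the rounding convention (first whole tick strictly below the threshold) are assumptions; the library may round differently, and no EntropyOS calls are shown here.

```python
import math


def expiry_tick(half_life: float, threshold: float = 0.10) -> int:
    """First whole tick at which an untouched memory is below `threshold`."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if not 0 < threshold < 1:
        raise ValueError("threshold must be between 0 and 1")
    # Confidence equals the threshold at exactly this many ticks.
    exact = half_life * math.log2(1.0 / threshold)
    # Expiry needs confidence strictly below the threshold, so take the
    # next whole tick after `exact` (assumed convention).
    return math.floor(exact) + 1


if __name__ == "__main__":
    for hl in (5, 10, 50):
        print(f"half_life={hl}: expires at tick {expiry_tick(hl)}")
```

At the default 10% threshold the multiplier is log2(10), roughly 3.32, so a 10-tick half-life gives an untouched memory 33 full ticks before it is pruned at tick 34 under this convention.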

This is the relationship we designed EntropyOS for. It handles the mechanics of time-aware state — tick evaluation, deterministic expiry, serialization — so that libraries like Memory Half-Life can focus on their domain logic without reimplementing the plumbing.

The Hard Parts

Choosing half-lives is not obvious. Set them too short and the agent forgets useful context mid-task. Set them too long and you’re back to accumulation. The right values depend on the agent’s task cadence, and there’s no universal answer. Start with half-lives that match your expected task duration and adjust from there. Keep in mind that a half-life is not a lifetime: at the default 10% threshold, an untouched memory survives about 3.3 half-lives before it expires.

There’s also the question of what to do when a memory is fading but might still matter. The fading() method surfaces memories below a warning threshold, which gives the agent (or an operator) a chance to reinforce them before they expire. Whether to act on that signal is a policy decision, not a library decision.

Try It

Memory Half-Life is open source, available alongside EntropyOS. Install it, point it at an EntropyOS runtime, and start storing memories that decay.

The interesting part isn’t the math. It’s watching a system that forgets on purpose and realizing how much of what it drops was never going to be useful again.