Memory Should Decay | Stack Research

An agent memory run started with 50 stored facts. Each fact had a half-life of 10 ticks. After 30 ticks of a task loop, 8 memories remained.

Those 8 were the ones the agent kept using. The other 42 expired automatically. No cleanup script. No manual pruning. No summarization pass pretending stale facts were still useful.

The experiment is small, but the shape is important. Agent memory does not need to be an attic where every fact waits forever. It can behave more like working state: reinforced by use, weakened by neglect, and removed when confidence falls below a threshold.

Memory Half-Life is a Python library for that pattern. Each memory has a confidence score that decays exponentially. Recall refreshes the score. Ignored memories fade. Once confidence drops below the expiry threshold, the runtime removes the memory.

The Problem With Keeping Everything

Agent memory systems often begin with a generous instinct: store everything, keep it available, let retrieval sort it out later. When that starts to strain, the usual repair is a larger context window, a better vector index, or a periodic summarizer.

That relieves some short-term pressure while preserving the deeper problem. Stale memory still exists. It can still be retrieved. It can still be mistaken for current truth.

Three failures show up quickly.

First, retrieval degrades. More memories mean more candidates, more false positives, and more stale context near the decision path. An agent that remembers everything may retrieve worse than one that only preserves what remains useful.

Second, cost grows. Every stored memory costs storage, retrieval compute, and possibly prompt tokens. A fact that no longer helps the task is not neutral. It is overhead.

Third, stale facts cause quiet mistakes. An old address, deprecated endpoint, retired policy, or previous price can look just as available as current state. There is no visible difference between “true now” and “used to be true” unless the memory system represents age and confidence.

Decay makes age part of the object.

The Decay Model

The model is intentionally small:

confidence = 2^(-elapsed / half_life)

Here elapsed is the number of ticks since the memory was stored or last recalled. A memory with a half-life of 10 ticks starts at 100% confidence. After 10 ticks, it has 50% confidence. After 20, it has 25%. When confidence falls below the configured threshold, the memory expires.
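
As a quick check of the arithmetic, the formula can be written as a small Python function. This is a sketch of the math only, not the library's code; the function name `confidence` is an assumption made for illustration.

```python
def confidence(elapsed: float, half_life: float) -> float:
    """Exponential decay: confidence halves every `half_life` ticks."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if elapsed < 0:
        raise ValueError("elapsed must not be negative")
    return 2.0 ** (-elapsed / half_life)


# Reproduce the values quoted in the text for a half-life of 10 ticks.
for elapsed in (0, 10, 20):
    print(f"after {elapsed:>2} ticks: {confidence(elapsed, 10):.2f}")
```

The loop prints 1.00, 0.50, and 0.25, matching the 100%, 50%, and 25% figures above.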

Ticks are not wall-clock time. A tick can be one conversation turn, one agent step, one loop iteration, or any discrete unit that makes sense for the system. Because time advances only when the system advances it, the behavior stays deterministic enough to test.

The key behavior is recall. Recalling a memory resets its decay clock. Memories the agent actively uses stay alive. Memories it does not touch fade out. Relevance is measured by use rather than by a separate heuristic.

The API

The library has a deliberately small surface:

from memory_half_life import MemoryEngine

engine = MemoryEngine()

# Store a memory that decays.
engine.store("user_preference", "dark mode", half_life=10)

# Recall it. This refreshes confidence.
engine.recall("user_preference")

# Inspect it without refreshing confidence.
engine.peek("user_preference")

# Advance logical time.
engine.tick(5)

# List memories approaching expiry.
engine.fading()

# Remove a memory explicitly.
engine.forget("user_preference")

The distinction between recall and peek matters. recall means the memory was used in the agent’s decision path, so it should stay alive. peek means a tool, dashboard, or evaluator inspected the memory without reinforcing it.

Without that distinction, observability can distort the system. A monitoring dashboard that reads every memory with recall would keep everything alive by watching it.
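
The effect is easy to reproduce with a toy model. The sketch below is a hypothetical stand-in written for this example, not the library's implementation: `ToyMemory` and its internals are assumptions, and it omits expiry entirely so that only the recall/peek difference is visible.

```python
class ToyMemory:
    """Hypothetical stand-in for the engine; tracks decay but never expires."""

    def __init__(self):
        self.now = 0
        self.items = {}  # key -> (value, half_life, tick of last reinforcement)

    def store(self, key, value, half_life):
        self.items[key] = (value, half_life, self.now)

    def tick(self, n=1):
        self.now += n

    def confidence(self, key):
        _, half_life, last = self.items[key]
        return 2.0 ** (-(self.now - last) / half_life)

    def peek(self, key):
        # Read-only access: the decay clock is left alone.
        return self.items[key][0]

    def recall(self, key):
        # Use in the decision path: reset the decay clock.
        value, half_life, _ = self.items[key]
        self.items[key] = (value, half_life, self.now)
        return value


watched = ToyMemory()   # a dashboard that (wrongly) reads with recall
audited = ToyMemory()   # a dashboard that reads with peek
for memory in (watched, audited):
    memory.store("user_preference", "dark mode", half_life=10)

for _ in range(20):
    watched.tick()
    watched.recall("user_preference")
    audited.tick()
    audited.peek("user_preference")

print(watched.confidence("user_preference"))  # 1.0: kept alive by observation
print(audited.confidence("user_preference"))  # 0.25: two half-lives of neglect
```

Nothing in the agent used the memory in either case. Only the read path differs, and that alone decides whether the memory decays.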

The 30-Tick Run

The demo run stored 50 memories. During the loop, 8 were repeatedly recalled. The remaining 42 were not.

Tick  0: 50 memories, avg confidence 1.00
Tick  5: 50 memories, avg confidence 0.71
Tick 10: 50 memories, avg confidence 0.50, 8 refreshed to 1.00
Tick 15: 42 fading, avg confidence 0.25 | 8 active, avg confidence 0.71
Tick 20: 34 remaining (16 expired), 8 active at 0.50+
Tick 25: 14 remaining (36 expired), 8 active at 1.00
Tick 30: 8 remaining (42 expired), all active, avg confidence 0.71

By tick 30, the unused memories were gone. The 8 active memories survived because the agent kept recalling them. The mechanism did not need to know what those memories “meant.” It only needed to know whether the agent continued to use them.
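
The end state of that run can be approximated in a few lines. This is a simplified sketch under stated assumptions, not the demo's code: the expiry threshold of 0.2 and the recall cadence of every 5 ticks are guesses, since the write-up does not give the real settings. With a single shared half-life and threshold, all unused memories in this toy version expire at the same tick, so it illustrates only the final count, not the intermediate counts in the log above.

```python
HALF_LIFE = 10     # ticks, as in the run described above
THRESHOLD = 0.2    # assumed expiry threshold (not stated in the article)
RECALL_EVERY = 5   # assumed recall cadence for the active memories


def simulate(total=50, active=8, ticks=30):
    """Return {memory_id: tick of last use} for memories still alive."""
    last_used = {i: 0 for i in range(total)}
    active_ids = set(range(active))
    for now in range(1, ticks + 1):
        # The agent recalls its working set, which resets those decay clocks.
        if now % RECALL_EVERY == 0:
            for i in active_ids:
                if i in last_used:
                    last_used[i] = now
        # Expire anything whose confidence has fallen below the threshold.
        expired = [
            i for i, last in last_used.items()
            if 2.0 ** (-(now - last) / HALF_LIFE) < THRESHOLD
        ]
        for i in expired:
            del last_used[i]
    return last_used


survivors = simulate()
print(len(survivors))  # 8
```

The simulation never inspects what a memory contains. Survival depends only on when each memory was last used.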

That is the useful constraint. The model does not claim to understand relevance in the abstract. It treats repeated use as evidence that a memory still belongs in working state.

How the Runtime Fits

Memory Half-Life delegates expiry mechanics to EntropyOS’s TTLStore, a time-to-live key-value store. The memory engine calculates confidence and expiry from half-life settings. The runtime handles deterministic tick evaluation, state expiry, and storage mechanics.

This separation keeps the design legible. EntropyOS handles time-aware state. Memory Half-Life handles memory semantics. Neither layer needs to pretend it can infer all future usefulness from content alone.

When the agent calls recall, the memory engine refreshes the key. When the system advances time with tick, the runtime evaluates expiry. When confidence falls below threshold, stale memory leaves the store.
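
One way to read "calculates confidence and expiry from half-life settings" is that a half-life and a threshold together determine a time-to-live. The sketch below shows that conversion against a toy tick-based store. `ToyTTLStore`, `put`, `get`, and `ttl_from_half_life` are hypothetical names for illustration; the article does not show the TTLStore API, and this is not it.

```python
import math


def ttl_from_half_life(half_life: float, threshold: float) -> float:
    """Ticks until 2^(-t / half_life) reaches `threshold`."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return half_life * math.log2(1.0 / threshold)


class ToyTTLStore:
    """Hypothetical tick-based TTL store, for illustration only."""

    def __init__(self):
        self.now = 0
        self._data = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl):
        self._data[key] = (value, self.now + ttl)

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def tick(self, n=1):
        self.now += n
        # Keep entries whose deadline has not yet passed.
        self._data = {k: v for k, v in self._data.items() if v[1] >= self.now}


store = ToyTTLStore()
ttl = ttl_from_half_life(10, 0.25)            # 20 ticks: two half-lives
store.put("user_preference", "dark mode", ttl)

store.tick(15)
# A recall would refresh the key, which here means writing it with a new TTL.
store.put("user_preference", store.get("user_preference"), ttl)

store.tick(15)
after_refresh = store.get("user_preference")  # still present

store.tick(10)
after_neglect = store.get("user_preference")  # None: 25 ticks without a refresh
```

Under this reading, the store only needs to understand deadlines. The half-life and threshold stay in the memory layer.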

The Hard Parts

Choosing half-lives is not obvious. Set them too short and useful context disappears mid-task. Set them too long and the system returns to unbounded accumulation. A practical starting point is to measure the expected length of a task and choose half-lives that preserve active memories across that span.
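
That starting point can be turned into arithmetic. If an active memory may go at most some number of ticks between recalls, the decay formula gives the smallest half-life that keeps it at or above the threshold over that gap. The helper below is a sketch; the function name and the example numbers are assumptions, not library settings.

```python
import math


def min_half_life(max_gap_ticks: float, threshold: float) -> float:
    """Smallest half-life for which a memory left untouched for
    `max_gap_ticks` still sits at or above `threshold`."""
    if max_gap_ticks <= 0:
        raise ValueError("max_gap_ticks must be positive")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return max_gap_ticks / math.log2(1.0 / threshold)


# Example: memories may sit unused for up to 12 ticks, threshold is 0.25.
print(min_half_life(12, 0.25))  # 6.0
```

A value chosen this way is a lower bound. Some margin above it is sensible, since real recall gaps vary from run to run.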

Implicit reinforcement is also hard. The current model reinforces only explicit recall calls. If a memory influences the agent through a summarized context window or retrieval bundle without being explicitly recalled, it can decay even while it still matters. A retrieval-aware version could reinforce memories when they appear in selected context, but that couples the model more tightly to the retrieval layer.
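
A retrieval-aware variant might look like the sketch below: after the retrieval layer selects context, every stored memory that appears in the selection has its decay clock reset. This is a hypothetical illustration, not a feature of the library. The `last_used` mapping, the function name, and the example keys are all assumptions.

```python
def reinforce_selected(last_used: dict, selected_keys, now: int) -> list:
    """Reset the decay clock for stored memories present in the selected
    context. Returns the keys that were reinforced, in selection order."""
    reinforced = [key for key in selected_keys if key in last_used]
    for key in reinforced:
        last_used[key] = now
    return reinforced


# key -> tick of last reinforcement
last_used = {"shipping_address": 0, "api_endpoint": 0, "old_price": 0}

# Keys the retrieval layer placed into the prompt at tick 7 (hypothetical).
retrieval_bundle = ["api_endpoint", "shipping_address", "not_a_memory"]

touched = reinforce_selected(last_used, retrieval_bundle, now=7)
print(touched)    # ['api_endpoint', 'shipping_address']
print(last_used)  # 'old_price' keeps its original clock
```

The coupling cost is visible even here: the memory layer now has to know which keys the retrieval layer selected, and a noisy retriever would keep irrelevant memories alive.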

Small half-lives amplify noise. With a half-life of 3 ticks, skipping a memory for two turns drops its confidence to about 63% (2^(-2/3) ≈ 0.63); with a half-life of 10, the same gap leaves about 87%. In real agents, step timing can vary: some turns are brief, others involve long tool calls. Ticks are useful for deterministic testing, but production systems need calibration.
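
To see how much a short gap costs at different settings, the decay formula can be evaluated directly. This is a sketch of the arithmetic only; the half-lives compared are arbitrary examples.

```python
def confidence(elapsed: float, half_life: float) -> float:
    """Confidence remaining after `elapsed` ticks without a recall."""
    return 2.0 ** (-elapsed / half_life)


SKIPPED_TICKS = 2
for half_life in (3, 10, 30):
    remaining = confidence(SKIPPED_TICKS, half_life)
    print(f"half_life={half_life:>2}: {remaining:.2f} after {SKIPPED_TICKS} skipped ticks")
```

This prints roughly 0.63, 0.87, and 0.95. The same two-tick gap costs over a third of the confidence at a half-life of 3 and about 5% at a half-life of 30.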

What to Check Before Using It

Measure baseline recall patterns first. Before choosing half-lives, instrument which memories the agent actually uses over representative runs. The distribution is usually uneven: a few memories are touched repeatedly, while many are touched once or never.
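
A baseline can be as simple as counting recalls per key from a log. The sketch below assumes the agent's recall calls have already been recorded as a flat list of keys; the log format, function name, and example keys are assumptions for illustration.

```python
from collections import Counter


def recall_histogram(recall_log, stored_keys):
    """Return (recall counts sorted by frequency, keys never recalled)."""
    counts = Counter(recall_log)
    never_recalled = sorted(k for k in stored_keys if counts[k] == 0)
    return counts.most_common(), never_recalled


# Hypothetical log from one representative run.
recall_log = [
    "user_preference", "api_endpoint", "user_preference",
    "user_preference", "api_endpoint",
]
stored_keys = {"user_preference", "api_endpoint", "old_price", "draft_title"}

by_frequency, never_recalled = recall_histogram(recall_log, stored_keys)
print(by_frequency)    # [('user_preference', 3), ('api_endpoint', 2)]
print(never_recalled)  # ['draft_title', 'old_price']
```

Keys that are never recalled are candidates for short half-lives. Frequently recalled keys indicate the gap lengths the half-life needs to survive.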

Use fading() as a monitoring signal. Memories approaching the threshold are the ones to inspect. If important context is routinely fading, either the half-life is too short or the agent’s recall path is incomplete.

Use peek for audits and recall for decisions. Debug tooling should not keep memory alive by observing it. Reinforcement should come from use.

What This Does Not Prove

This is a mechanism demo, not a universal result. One run with 50 facts and 30 ticks shows the behavior clearly, but it does not prove that decay improves every memory-backed agent.

The next test should put the same mechanism inside a retrieval or RAG workflow and compare it against a non-decaying baseline: retrieval quality, token cost, stale-context errors, and useful-memory loss. That is where the design becomes an evaluation rather than a clean demonstration.

Still, the first result is enough to make the point. A memory system can forget on purpose, deterministically, and without waiting for an operator to clean it up.

Memory Half-Life is available at github.com/stack-research/memory-half-life.