We gave an agent 50 memories and let them decay. After 30 ticks, it retained 8 — exactly the ones it was still using.
We stored 50 facts in an agent’s memory, each with a half-life of 10 ticks. We ran 30 ticks of a task loop in which the agent recalled only 8 of them. At the end, those 8 were still at working confidence. The other 42 were gone — expired automatically, no cleanup code, no manual pruning.
The agent’s context stayed small, retrieval stayed fast, and nothing it forgot was relevant to what it was doing.
This is Memory Half-Life, a Python library built on EntropyOS. Every memory has a confidence score that decays exponentially. Use a memory and its clock resets. Ignore it and it fades. Drop below the threshold and it’s gone.
The problem with keeping everything
Agent memory systems almost always work the same way: store everything, keep it forever, deal with the mess later. The usual fix is a bigger context window, better retrieval, or periodic summarization.
We’ve seen three things go wrong with that approach:
Retrieval degrades. More memories means more candidates, more false positives, and more stale context bleeding into decisions. An agent that remembers everything retrieves worse than one that remembers what’s relevant.
Cost grows linearly. Every stored memory costs storage, retrieval compute, and — if it ends up in a prompt — tokens. Memory that isn’t useful anymore is pure overhead.
Stale facts cause silent errors. An agent that remembers a user’s old address, a deprecated API endpoint, or yesterday’s price as current will act on outdated information. There’s no flag that says “this used to be true.” It just sits there, indistinguishable from current knowledge, until it causes a problem.
All three of the usual fixes treat memory as something to manage. None of them treat it as something that should expire.
The decay model
The math is simple:
confidence = 2^(-elapsed / half_life)
A memory with a half-life of 10 ticks starts at 100% confidence. After 10 ticks, 50%. After 20, 25%. When it drops below the threshold (default 10%), EntropyOS expires it automatically.
Ticks aren’t wall-clock time. One tick is one conversation turn, one agent step, one loop iteration — whatever discrete unit makes sense for your system. This keeps the math deterministic and testable.
The key behavior: recalling a memory resets its decay clock. Memories the agent actively uses stay alive. Memories it doesn’t touch fade and die. Relevance is measured by use, not by a heuristic.
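In code, the model is only a few lines. This is a sketch of the formula and threshold check, not Memory Half-Life’s actual internals; the function names are illustrative:

```python
def confidence(elapsed_ticks: float, half_life: float) -> float:
    """confidence = 2^(-elapsed / half_life)"""
    return 2 ** (-elapsed_ticks / half_life)

def is_expired(elapsed_ticks: float, half_life: float, threshold: float = 0.1) -> bool:
    # Below the threshold (default 10%), the memory is pruned.
    return confidence(elapsed_ticks, half_life) < threshold

# With a half-life of 10 ticks: 1.0 at tick 0, 0.5 at tick 10, 0.25 at tick 20.
# A recall resets elapsed_ticks to 0, restoring full confidence.
```

Because ticks are discrete and the formula is pure, the same sequence of stores, recalls, and ticks always produces the same confidences — which is what makes the behavior testable.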
The API
Six methods. That’s the whole surface:
```python
from memory_half_life import MemoryEngine

engine = MemoryEngine()

# Store a memory that decays
engine.store("user_preference", "dark mode", half_life=10)

# Recall it — resets the decay clock
engine.recall("user_preference")  # → "dark mode", confidence refreshed

# Peek without refreshing — observation only
engine.peek("user_preference")  # → "dark mode", clock unchanged

# Advance time
engine.tick(5)

# Check what's fading
engine.fading()  # → memories below warning threshold

# Explicit delete
engine.forget("user_preference")
```
`recall` vs `peek` is the important distinction. `recall` says “I’m using this, keep it alive.” `peek` says “I’m looking at this but that doesn’t mean it matters.” This gives the system control over what counts as reinforcement.
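The distinction is easy to see in a toy implementation. This sketch is illustrative only; the real engine delegates expiry to EntropyOS, and none of these internals are the library’s:

```python
class ToyMemoryEngine:
    """Toy sketch of recall-vs-peek semantics. Not the library's code."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.clock = 0
        self._items = {}  # key -> (value, half_life, last_refreshed_tick)

    def store(self, key, value, half_life):
        self._items[key] = (value, half_life, self.clock)

    def _confidence(self, key):
        _value, half_life, refreshed = self._items[key]
        return 2 ** (-(self.clock - refreshed) / half_life)

    def recall(self, key):
        # "I'm using this, keep it alive": reset the decay clock.
        value, half_life, _ = self._items[key]
        self._items[key] = (value, half_life, self.clock)
        return value

    def peek(self, key):
        # Observation only: the clock is untouched.
        return self._items[key][0]

    def tick(self, n=1):
        self.clock += n
        # Prune anything that has decayed below threshold.
        for key in [k for k in self._items if self._confidence(k) < self.threshold]:
            del self._items[key]
```

Note that `peek` never writes back a refreshed timestamp — which is exactly why it is safe to call from monitoring code.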
Running it
Here’s what a 30-tick run looks like with 50 stored memories, 8 of which get recalled during the loop:
```
Tick 0:  50 memories, avg confidence 1.00
Tick 5:  50 memories, avg confidence 0.71
Tick 10: 50 memories, avg confidence 0.50, 8 refreshed to 1.00
Tick 15: 42 fading, avg confidence 0.35 | 8 active, avg confidence 0.71
Tick 20: 34 remaining (16 expired), 8 active at 0.50+
Tick 25: 14 remaining (36 expired), 8 active at 1.00
Tick 30: 8 remaining (42 expired), all active, avg confidence 0.71
```
The 42 unused memories didn’t get garbage collected. They didn’t get summarized. They decayed below threshold and EntropyOS removed them. The 8 that the agent kept recalling never dropped below working confidence.
How EntropyOS fits in
Memory Half-Life doesn’t implement its own expiry logic. It delegates to EntropyOS’s TTLStore — the same time-to-live key-value store described in Software That Expires.
The memory engine calculates a TTL from the half-life and threshold, hands it to EntropyOS, and lets the runtime handle expiry. When the agent calls recall, the engine touches the key in the TTL store, resetting the clock. When the engine calls tick, EntropyOS evaluates all stored state and prunes what’s expired.
This separation is deliberate. EntropyOS handles the mechanics of time-aware state — tick evaluation, deterministic expiry, serialization. Memory Half-Life handles the domain logic. Neither reimplements the other’s job.
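The conversion itself is one line of algebra: confidence reaches the threshold when elapsed = half_life × log2(1/threshold). A sketch of that mapping (only the arithmetic is shown; the TTLStore handoff is omitted, and `ttl_from` is an illustrative name, not the engine’s API):

```python
import math

def ttl_from(half_life: float, threshold: float = 0.1) -> float:
    """Ticks until 2^(-elapsed / half_life) falls to `threshold`.

    Solving 2^(-ttl / half_life) = threshold for ttl gives:
        ttl = half_life * log2(1 / threshold)
    """
    return half_life * math.log2(1.0 / threshold)
```

Each `recall` re-arms this TTL from zero, which is how “recently used” and “still alive” end up meaning the same thing.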
The hard parts
Choosing half-lives is not obvious. Set them too short and the agent forgets useful context mid-task. Set them too long and you’re back to unbounded accumulation. We don’t have a universal formula for this. Start with half-lives that roughly match your expected task duration and adjust based on what the agent is dropping too early.
Implicit vs explicit reinforcement. Right now, the agent has to call recall for a memory to stay alive. If the agent uses information from a memory without explicitly recalling it — say, through a summarized context window — the memory decays even though it’s still contributing. This is a real limitation. A retrieval-aware decay model that reinforces memories when they appear in retrieved context would be more accurate, but also more coupled to the retrieval layer. We went with the simpler model first.
Small half-lives amplify noise. With a half-life of 3 ticks, a memory the agent skips for two turns is already down to 63% confidence, and it expires entirely within ten. In agents with variable step timing — some turns take seconds, others take minutes — this can cause useful memories to fade during a long operation. Ticks are the right abstraction for deterministic testing, but mapping them to real agent behavior requires some calibration.
Three things to check if you try it
Baseline your agent’s recall patterns first. Before picking half-lives, instrument which memories your agent actually uses over a representative run. The distribution is usually lopsided — a small set gets recalled constantly and the rest are touched once or never. Set half-lives based on the observed gap between “active” and “stale” memories.
Use `fading()` as a monitoring signal. Memories approaching the threshold are worth watching. If the agent is routinely letting important context decay, the half-life is too short or the agent’s recall logic needs work. Surface the fading list in your agent’s debug output.

Use `peek` for auditing, `recall` for use. If you’re building a dashboard or debug tool that reads agent memory, use `peek` — otherwise your monitoring tool will keep memories alive artificially. `recall` should only happen in the agent’s actual decision path.
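For the first check — baselining recall patterns — a thin wrapper around the engine is enough to collect counts. Everything here is illustrative; `RecallAudit` is not part of the library:

```python
from collections import Counter

class RecallAudit:
    """Wraps an engine-like object and counts recalls per key.

    Hypothetical helper for instrumentation, not a library API.
    """

    def __init__(self, engine):
        self.engine = engine
        self.counts = Counter()

    def recall(self, key):
        self.counts[key] += 1
        return self.engine.recall(key)

    def report(self, min_active: int = 3):
        # Split recalled keys into "active" (recalled often) and "rarely used".
        # Keys never recalled at all won't appear in either set.
        active = {k for k, n in self.counts.items() if n >= min_active}
        return active, set(self.counts) - active
```

Run it over a representative session, then set half-lives long enough to keep the active set alive and short enough that the rest expires.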
Memory Half-Life is open source at stack-research/memory-half-life. It runs on top of any EntropyOS runtime. The code is straightforward — the interesting part is watching a system that forgets on purpose and noticing how little of what it drops was ever going to be useful again.