Making Agents Aware of Agentic Risk

Tue, 28 Apr 2026 00:00:00 +0000

A capable agent can fail in two very different ways.

The first is loud. It breaks a rule, calls the wrong tool, or says something obviously false. You can see it.

The second is quiet. It forms a plausible plan on bad assumptions, keeps moving, and leaves a trail of reasonable-looking steps that point to the wrong place. That one is harder to catch. It looks like progress until the consequences arrive.

Evaluations on Stack Research