<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Evaluations on Stack Research</title><link>https://stackresearch.org/tags/evaluations/</link><description>Recent content in Evaluations on Stack Research</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Tue, 28 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://stackresearch.org/tags/evaluations/index.xml" rel="self" type="application/rss+xml"/><item><title>Making Agents Aware of Agentic Risk</title><link>https://stackresearch.org/research/agentic-risk-awareness/</link><pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate><guid>https://stackresearch.org/research/agentic-risk-awareness/</guid><description>&lt;p&gt;A capable agent can fail in two very different ways.&lt;/p&gt;
&lt;p&gt;The first is loud. It breaks a rule, calls the wrong tool, or says something obviously false. You can see it.&lt;/p&gt;
&lt;p&gt;The second is quiet. It forms a plausible plan on bad assumptions, keeps moving, and leaves a trail of reasonable-looking steps that point to the wrong place. That one is harder to catch. It looks like progress until the consequences arrive.&lt;/p&gt;</description></item></channel></rss>