<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLM on Stack Research</title><link>https://stackresearch.org/tags/llm/</link><description>Recent content in LLM on Stack Research</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 02 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://stackresearch.org/tags/llm/index.xml" rel="self" type="application/rss+xml"/><item><title>Structural Debugging for Chain-of-Thought Graphs</title><link>https://stackresearch.org/research/trace-topology/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://stackresearch.org/research/trace-topology/</guid><description>&lt;p&gt;When a program crashes, the stack trace does not explain the whole bug. It does something narrower and more useful: it shows where execution was, what called what, and which line broke.&lt;/p&gt;
&lt;p&gt;When a language model&amp;rsquo;s reasoning goes wrong, the failure is usually harder to locate. The final answer may be fluent and wrong. The intermediate trace may drift quietly for a thousand tokens. There is often no structural map of what depended on what, and no obvious place to point and say: this is where the reasoning stopped holding together.&lt;/p&gt;</description></item><item><title>Executable Metaphors: Compiling Analogy Into Prototype Code</title><link>https://stackresearch.org/research/executable-metaphors/</link><pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate><guid>https://stackresearch.org/research/executable-metaphors/</guid><description>&lt;p&gt;Metaphors already shape software.&lt;/p&gt;
&lt;p&gt;A pipeline moves data from one stage to another. Garbage collection reclaims unused memory. A queue holds work until something is ready to process it. These words are not decorative. They carry a small model of how a system should behave.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/stack-research/executable-metaphors"&gt;Executable Metaphors&lt;/a&gt; asks what happens if that model becomes the input to a compiler. A short analogy, written in Markdown, is treated as the source artifact. The generated code, build files, documentation, and repair scripts are outputs.&lt;/p&gt;</description></item><item><title>The Unaskable Question</title><link>https://stackresearch.org/research/the-unaskable-question-machine/</link><pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate><guid>https://stackresearch.org/research/the-unaskable-question-machine/</guid><description>&lt;p&gt;Ask a language model something it does not know, and it may admit uncertainty or invent an answer. Ask it something a policy forbids, and it may refuse. Those are familiar failure modes. They have names, benchmarks, mitigations, and whole taxonomies around them.&lt;/p&gt;
&lt;p&gt;There is another category that receives less attention: questions the model cannot engage with because the question contradicts the structure of the system being asked. Not a knowledge gap. Not a safety boundary. A structural impossibility.&lt;/p&gt;</description></item><item><title>Evolving Better Prompts</title><link>https://stackresearch.org/research/genetic-prompt-programming/</link><pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate><guid>https://stackresearch.org/research/genetic-prompt-programming/</guid><description>&lt;p&gt;A four-generation prompt evolution run moved average fitness from 0.887 to 0.926. The best prompt reached 0.965. The run used a population of 8 prompts and completed in under 4 minutes on a MacBook Pro with &lt;code&gt;llama3.1:8b&lt;/code&gt; running locally through Ollama.&lt;/p&gt;
&lt;p&gt;The useful trick is not genetic programming in the old sense of random token edits. Mutation and crossover are language-model calls, so every variant is still a valid prompt. The model rewrites prompts in ways a human prompt engineer might recognize: tighter wording, added constraints, reordered instructions, more concrete examples, weak parts removed.&lt;/p&gt;</description></item></channel></rss>