Earn the Right to Touch Production

Stack Research

An agent that can produce a good plan is not the same as an agent that should be allowed to run it.

After every agent incident, the postmortem asks the same three questions:

  • What changed?
  • Who changed it?
  • How fast can we undo it?

The answers are usually bad. Not because the model failed, but because the system gave execution rights to something that hadn’t earned them.

We hand an agent a prompt, an identity, and a toolchain, then act surprised when a coherent sentence becomes an irreversible production action. That’s not an AI error. That’s a governance error.

Authority Is Not a Toggle

The default is binary: the agent can act, or it can’t. That’s not enough.

Authority should be a ladder:

  1. Observe. Read state, detect drift, explain what it sees.
  2. Recommend. Propose a plan and expected impact. Don’t execute.
  3. Simulate. Run the plan against adversarial branches. Surface failure modes.
  4. Execute. Act inside bounded policies with tested rollback paths.

The avoidable incidents happen when someone jumps from step 1 to step 4 because a demo looked good.
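The ladder can be made mechanical rather than cultural. A minimal sketch, assuming hypothetical evidence labels (`observed_baseline`, `reviewed_plan`, `branch_survival`) that your own system would define: promotion moves one rung at a time, and only when the evidence for that rung exists.

```python
from enum import IntEnum

class Authority(IntEnum):
    OBSERVE = 1
    RECOMMEND = 2
    SIMULATE = 3
    EXECUTE = 4

# Evidence required to hold each rung (names are illustrative, not a standard).
REQUIRED = {
    Authority.RECOMMEND: {"observed_baseline"},
    Authority.SIMULATE: {"observed_baseline", "reviewed_plan"},
    Authority.EXECUTE: {"observed_baseline", "reviewed_plan", "branch_survival"},
}

def promote(current: Authority, evidence: set[str]) -> Authority:
    """Advance one rung only if the evidence for the next rung is present."""
    nxt = Authority(min(current + 1, Authority.EXECUTE))
    if REQUIRED.get(nxt, set()) <= evidence:
        return nxt
    return current  # no jumping from Observe to Execute on a good demo
```

Because `promote` only ever moves one rung, an agent with nothing but a baseline observation can reach Recommend, but never Execute, no matter how good the plan looks.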

Capability Is Not Eligibility

An agent can be impressive and still have no business touching production.

Capability: can it produce a useful plan? Eligibility: should it run that plan, in this environment, right now?

Different questions, different controls. If capability is model quality, eligibility is system quality.

What to Require Before Execution

Before any high-impact action, check five things. If any one is missing, block the action.

  • Scoped permission. The action maps to an explicit identity scope, not inherited convenience access.
  • Fresh state. The decision is based on a current, verified snapshot — not stale context.
  • Survival under pressure. The plan holds up across hostile counterfactual branches.
  • Tested rollback. A reversal path exists with known cost and kill switches.
  • Full lineage. Logs show the policy owner, executing identity, and approval chain.

No evidence, no execution. That one rule turns autonomy from “trust me” into something auditable.
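As a preflight gate, the five checks fit in a few lines. A sketch under assumed names (the `ActionContext` fields and the 30-second freshness budget are illustrative, not prescriptive): every check must pass, and the gate reports exactly which evidence is missing so the block is auditable.

```python
from dataclasses import dataclass

MAX_STATE_AGE_SECONDS = 30.0  # assumed freshness budget; tune per system

@dataclass
class ActionContext:
    scoped_permission: bool    # explicit identity scope for this action
    state_age_seconds: float   # age of the state snapshot the plan is based on
    branch_survival: bool      # plan survived adversarial counterfactual branches
    rollback_tested: bool      # reversal path exists with known cost
    lineage_complete: bool     # policy owner, identity, and approval chain logged

def may_execute(ctx: ActionContext) -> tuple[bool, list[str]]:
    """Return (allowed, missing evidence). One miss blocks execution."""
    missing = []
    if not ctx.scoped_permission:
        missing.append("scoped permission")
    if ctx.state_age_seconds > MAX_STATE_AGE_SECONDS:
        missing.append("fresh state")
    if not ctx.branch_survival:
        missing.append("survival under pressure")
    if not ctx.rollback_tested:
        missing.append("tested rollback")
    if not ctx.lineage_complete:
        missing.append("full lineage")
    return (not missing, missing)
```

The returned `missing` list is the audit trail: a denial always names the absent evidence, which is what separates "no evidence, no execution" from an opaque permission error.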

Prompting Won’t Save You

Better prompts improve phrasing and consistency. They don’t create containment.

A perfect instruction can still run under the wrong credential. A careful chain-of-thought can still execute on stale state. A safe-looking tool call can still mutate downstream systems.

When execution rights are loose, better models just increase confidence in failure.

A One-Quarter Plan

This is engineering work, not a research project.

  1. Month 1 — Boundaries. Inventory tool actions, classify impact, enforce per-tool identity scopes. Deny by default for anything high-risk.
  2. Month 2 — Simulation. Add counterfactual branch tests for high-impact workflows. Track branch survival as a release gate.
  3. Month 3 — Reversibility. Ship rollback playbooks, hard stop policies, and identity lineage logs tied to every automated action.

Track two things: unsafe actions blocked before execution, and time to recovery when execution fails. If those numbers don’t move, your governance is theater.
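Both numbers fall out of the same event log. A minimal sketch, assuming a hypothetical event shape where each automated action records whether it was blocked pre-execution and, for failed executions, the minutes to recovery:

```python
import statistics

def governance_scorecard(events: list[dict]) -> dict:
    """Summarize the two metrics that show whether governance is real.

    Assumed event shape: {"blocked": bool} for pre-execution blocks,
    plus "recovery_minutes": float on events where execution failed.
    """
    blocked = sum(1 for e in events if e["blocked"])
    recoveries = [e["recovery_minutes"] for e in events if "recovery_minutes" in e]
    return {
        "unsafe_actions_blocked": blocked,
        "median_time_to_recovery_min": statistics.median(recoveries) if recoveries else None,
    }
```

If the first number stays at zero and the second never shrinks quarter over quarter, the controls are not gating anything: that is the "theater" test made concrete.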

The Point

Airplanes, payment rails, and industrial control systems all separate “can do” from “allowed to do.” Agent systems should work the same way.

The goal isn’t autonomy that acts because it can. It’s autonomy that acts because it proved it should.