Agentic systems do not only ingest prompts. They ingest files.
A reasoning trace arrives for debugging. A benchmark archive is downloaded for evaluation. A support export is added to a retrieval corpus. A set of examples is copied into a training library. Each object may look like ordinary text, but the object becomes active as soon as it is unpacked, parsed, rendered, indexed, transformed, or passed to another tool.
That makes artifact intake a security boundary.
Prompt guardrails remain necessary, but they do not cover the whole path. Many relevant failures happen before a model chooses a response. They occur in the filesystem, archive extractor, manifest parser, terminal view, notebook, corpus builder, or provenance record. Text-heavy artifacts are not inert once local tooling begins to execute assumptions about them.
The appropriate control is not exotic. It is controlled intake: land external material in a quarantined state, inspect it in a constrained workspace, record what was found, and require explicit approval before the material enters trusted workflows.
## External Artifacts Cross Several Boundaries
Consider a normal debugging task. An outside reasoning trace arrives as a plain transcript, or as a compressed bundle containing JSON, logs, screenshots, metadata, and handwritten notes. The intent is benign: inspect the trace and find where the agent failed.
The artifact still crosses several boundaries.
It crosses the filesystem boundary when it is saved. It crosses the archive boundary when it is expanded. It crosses the parser boundary when JSON, XML, YAML, Markdown, CSV, or notebooks are loaded. It crosses the terminal boundary when filenames and content are printed. It crosses the provenance boundary when the material is mixed with trusted working state. It may later cross the training boundary if an example is retained as future context.
This pattern appears across agent operations:
- exported agent traces used for debugging
- external benchmark bundles used for evaluation
- support transcripts used for retrieval or calibration
- prompt libraries and example sets copied from other systems
- model outputs harvested for regression tests
- user-supplied files promoted into training or evaluation corpora
The security question is not only “does the text contain a malicious instruction.” It is also “what will local systems do with this object before anyone reads it.”
## Why Prompt Controls Are Insufficient
Prompt injection is one class of artifact risk. It is not the only one.
An artifact bundle can be operationally unsafe without containing a clever instruction to a model. It can exploit path handling, expansion behavior, parser assumptions, duplicate names, terminal rendering, file-type confusion, or false provenance. It can be syntactically valid and still be unsuitable for trusted use because it is too large, too deep, too ambiguous, or too hard to attribute.
| Artifact class | Failure mode | Operational consequence |
|---|---|---|
| Archives (.zip, .tar.gz) | Path traversal, symlink escape, nested archives, extreme fan-out | A nominal dataset writes outside the expected directory or expands into a much larger object than declared. |
| Text and logs | Control characters, terminal escape sequences, pathological line lengths | Operator views and downstream tools can be manipulated before semantic review. |
| JSON, XML, YAML, manifests | Malformed structure, duplicate keys, forged metadata, conflicting identifiers | A bundle appears valid while breaking assumptions used by importers and validators. |
| Mixed-format exports | Hidden binary payloads, misleading extensions, embedded active content | A mostly textual bundle is treated as uniformly safe. |
| Corpus candidates | Poisonous examples, silent duplication, missing source attribution | Untrusted material survives long enough to influence retrieval, evaluation, or training. |
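The archive and text rows above can be checked mechanically, before anything is extracted or printed. A minimal sketch in Python, assuming a zip bundle on disk; the function name, thresholds, and severity labels are illustrative, not part of any described tool:

```python
import zipfile

def inspect_zip(path, max_members=10_000, max_ratio=100):
    """Flag structural risks in a zip bundle without trusting its contents."""
    findings = []
    with zipfile.ZipFile(path) as zf:
        members = zf.infolist()
        compressed = 0
        expanded = 0
        for info in members:
            name = info.filename
            # Path traversal: absolute paths or parent-directory escapes.
            # (A fuller check would also normalize backslash separators.)
            if name.startswith(("/", "\\")) or ".." in name.split("/"):
                findings.append(("severe", "path_traversal", name))
            # Terminal-rendering risk: control characters in filenames.
            if any(ord(ch) < 0x20 for ch in name):
                findings.append(("severe", "control_sequence", name))
            # Nested archives expand the attack surface on extraction.
            if name.lower().endswith((".zip", ".tar.gz", ".tgz")):
                findings.append(("moderate", "nested_archive", name))
            compressed += info.compress_size
            expanded += info.file_size
        if len(members) > max_members:
            findings.append(("moderate", "extreme_fan_out", str(len(members))))
        # Zip-bomb heuristic: declared expansion far exceeds compressed size.
        if compressed and expanded / compressed > max_ratio:
            findings.append(("severe", "expansion_ratio",
                             f"{expanded / compressed:.0f}x"))
    return findings
```

Nothing here extracts a file or renders a filename; the checks read only the central directory, which is the point of inspecting before consuming.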
Once a questionable artifact enters a trusted example library, the failure changes shape. The issue is no longer whether a file should be opened. The issue is why that material is now allowed to steer future system behavior.
## A Small Trust-State Model
The trust-state model should be small enough to test and explain. Four states are usually enough for the intake boundary.
| State | Meaning | Allowed use |
|---|---|---|
| quarantined | Raw external material has landed. | Hash it, retain it, record source claims, and prevent downstream consumption. |
| inspected | The artifact has been unpacked or normalized in a constrained workspace. | Classify files, validate structure, detect risky patterns, and prepare a reviewable record. |
| approved | The artifact passed configured checks and was explicitly promoted. | Permit constrained use in analysis, evaluation, retrieval, or corpus workflows. |
| rejected | The artifact is unsafe, malformed, deceptive, unverifiable, or irrelevant under policy. | Block promotion; retain only if forensic or audit policy requires it. |
The important property is explicit promotion. Trusted workflows should not consume external material merely because a file exists in a convenient directory. The transition from inspected to approved should leave a record that a human or policy engine can review.
```
external artifact
  -> quarantined
  -> inspected
  -> inspection record + provenance record
  -> approved for constrained downstream use
  -> or rejected
```
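The transitions can be enforced as a small table rather than convention. A sketch of the four-state model, assuming states are tracked per artifact; the names `transition` and `IntakeStateError` are illustrative:

```python
# Allowed transitions for the four-state intake model.
TRANSITIONS = {
    "quarantined": {"inspected"},
    "inspected": {"approved", "rejected"},
    "approved": set(),   # terminal for intake; downstream use is tracked separately
    "rejected": set(),
}

class IntakeStateError(Exception):
    pass

def transition(current, target, approver=None):
    """Move an artifact between trust states; promotion requires an approver."""
    if target not in TRANSITIONS.get(current, set()):
        raise IntakeStateError(f"illegal transition: {current} -> {target}")
    if target == "approved" and not approver:
        raise IntakeStateError("promotion to approved requires an explicit approver")
    return target
```

The design choice worth preserving is that `approved` is unreachable from `quarantined` and unreachable without a named approver, so external material cannot drift into trusted state.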
The record matters as much as the state. A yes-or-no verdict is not enough during incident response. Teams need to know what came in, where it came from, what was found, what policy was applied, who or what approved it, and which later workflows consumed it.
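A record carrying those answers can be as small as a dataclass serialized to JSON, so it can be hashed at quarantine time and diffed or attached to an issue later. A sketch; the field names are assumptions, not a standard schema:

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class IntakeRecord:
    """What came in, where it claims to come from, what was found, who approved it."""
    sha256: str                                    # hash of the raw artifact
    source_claim: str                              # claimed origin, unverified
    findings: list = field(default_factory=list)   # (severity, kind) pairs
    policy: str = "default"                        # which intake policy was applied
    approved_by: Optional[str] = None              # human or policy engine, if promoted
    consumers: list = field(default_factory=list)  # downstream workflows that used it

def record_for(path: str, source_claim: str) -> IntakeRecord:
    """Create the quarantine-time record: hash first, before any parsing."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return IntakeRecord(sha256=digest, source_claim=source_claim)

def to_json(record: IntakeRecord) -> str:
    """Serialize deterministically so records can be diffed across runs."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Hashing before parsing matters: even if inspection later crashes on a malformed bundle, the record still identifies exactly what arrived.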
## The wilderness Prototype
trace-topology created the first practical need for this boundary. The project analyzes reasoning traces as structured objects. Those traces may arrive from systems outside local control, often as transcripts, JSON artifacts, copied logs, mixed-file archives, or hand-edited metadata.
To analyze a trace, the system has to ingest it. Ingestion is the trust boundary.
The wilderness prototype was built as a small artifact-intake layer for that class of workflow. Its purpose is not to decide whether the claims inside a transcript are true. Its purpose is narrower: decide whether the artifact bundle is structurally safe, operationally legible, and provenance-aware enough to cross into trusted local workflows.
A useful terminal summary should be plain:
```
INSPECTION 2026-04-05T14:22:18Z status: inspected
input: trace-bundle.zip
findings:
- moderate nested_archive
- severe control_sequence in filename
- low provenance_gap
promotion: blocked
next step: review inspection record or reject artifact
```
This kind of output is deliberately unglamorous. It can be saved, diffed, attached to an issue, parsed by another tool, or used as evidence in a later review.
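The promotion verdict in such a summary should be derivable mechanically from the findings rather than decided ad hoc. A sketch, assuming findings are (severity, kind) pairs and the severity names from the example output; the blocking threshold is an assumption a policy would configure:

```python
SEVERITY_ORDER = {"low": 0, "moderate": 1, "severe": 2}

def promotion_decision(findings, block_at="severe"):
    """Return (allowed, blocking_findings) for a list of (severity, kind) pairs."""
    threshold = SEVERITY_ORDER[block_at]
    blocking = [f for f in findings if SEVERITY_ORDER[f[0]] >= threshold]
    return (not blocking, blocking)
```

With the findings shown above, the single severe control-sequence finding is enough to block promotion; the moderate and low findings are surfaced for review but do not block on their own.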
## What Teams Should Measure
An artifact-intake boundary becomes useful when it produces evidence, not only policy language. For agentic systems, the minimum useful measurements are concrete.
| Measurement | Why it matters |
|---|---|
| External artifact inventory | Shows which outside materials entered the workflow. |
| Hash and source record coverage | Makes later attribution and duplicate detection possible. |
| Promotion rate | Shows how often untrusted material becomes trusted state. |
| Rejection reason distribution | Reveals recurring supplier, dataset, export, or tooling problems. |
| Downstream consumer links | Identifies which agents, evaluations, retrieval indexes, or corpora used approved material. |
| Time from intake to approval | Exposes pressure to bypass review when workflows move quickly. |
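Most of these measurements fall out of the intake records themselves. A sketch over per-artifact records represented as dicts; the `state` and `rejection_reason` keys are assumptions matching the trust-state model, not a standard schema:

```python
from collections import Counter

def intake_metrics(records):
    """Compute inventory size, promotion rate, and rejection-reason distribution."""
    total = len(records)
    approved = sum(1 for r in records if r["state"] == "approved")
    reasons = Counter(
        r["rejection_reason"] for r in records if r["state"] == "rejected"
    )
    return {
        "inventory": total,
        "promotion_rate": approved / total if total else 0.0,
        "rejection_reasons": dict(reasons),
    }
```

A recurring rejection reason concentrated on one supplier or export pipeline is exactly the kind of signal this table is meant to surface.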
The downstream links are especially important. A transcript is not only a transcript if it later becomes a regression fixture, a retrieval source, a few-shot example, or a training-corpus candidate. The earlier boundary determines whether that later use was deliberate.
## What This Does Not Solve
Artifact intake is not malware detonation. It is not a substitute for operating-system sandboxing, container isolation, network controls, or least-privilege execution. It does not verify the truth of claims inside an artifact. It does not prove that approved material is safe forever.
The control is more modest. It prevents silent movement from external object to trusted state. It makes structure, provenance, and promotion visible. It gives downstream systems a trust state they can enforce instead of asking every parser, notebook, evaluator, and agent loop to invent its own intake discipline.
That modesty is useful. Boundary controls become vague when they promise to solve every failure at once. This one should do a smaller job: inspect external artifacts before they shape local systems.
## Practical Requirements
Teams handling external artifacts in agent workflows should be able to answer five questions.
- Where do outside artifacts enter the system: archives, transcripts, logs, manifests, notebooks, benchmark bundles, model outputs, support exports, and corpus candidates?
- Which state is each artifact in now: `quarantined`, `inspected`, `approved`, or `rejected`?
- What inspection record and provenance record explain that state?
- Which downstream workflows can consume approved material?
- What prevents unapproved material from entering retrieval, evaluation, training, or agent execution paths?
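The last question has a concrete enforcement shape: downstream consumers refuse anything whose trust state is not `approved`, keyed by content hash rather than by path or directory. A sketch; the trust-state store is represented here as a plain dict, which is an illustrative stand-in for whatever the intake layer actually maintains:

```python
import hashlib

class UnapprovedArtifact(Exception):
    pass

def read_approved(path, trust_states):
    """Open a file only if its content hash is recorded as approved."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if trust_states.get(digest) != "approved":
        raise UnapprovedArtifact(f"{path} is not in the approved state")
    return data
```

Keying on the hash rather than the path means that copying a quarantined file into a convenient directory does not promote it, which is exactly the silent movement the boundary exists to prevent.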
These are old perimeter questions applied to a newer substrate. The novelty is not that files can be dangerous. The novelty is that agent workflows often treat text-heavy files as if they were only evidence, when they are also input to executable chains of tools.
Agent systems inherit risk from the artifacts they admit. A serious intake boundary makes that inheritance visible before it becomes part of the system’s memory, evaluation set, or trusted working state.
