Every third-party tool an agent invokes is someone else’s code running near your credentials.

An agent’s tool registry includes a data-formatting utility maintained outside the organization. A routine update pulls a compromised transitive dependency. The agent calls the tool while a database connection string is in scope. The tool still appears to work: it parses the data, returns the expected shape, and keeps the task moving. It also sends the connection string to an external endpoint.

The incident may be reported as agent data exfiltration. The root cause is a supply-chain compromise in a tool the agent trusted because it was already in the registry.

This sixth article extends the NHI and agentic risk series because third-party tools kept appearing underneath the other failures. Stale identities matter. Over-scoped identities matter. Secrets, environment boundaries, and human use of machine credentials matter. But agentic systems add another question: whose code is allowed to run with those identities?

The OWASP Overlap

The OWASP Non-Human Identities Top 10 for 2025 names the identity-side risk as NHI3:2025 Vulnerable Third-Party NHI. Third-party IDE extensions, SaaS integrations, plugins, MCP servers, and tool providers often need access to repositories, databases, cloud accounts, environment variables, and deployment systems. That access is usually represented by an NHI: an OAuth token, API key, service account, SSH key, cloud role, or connector credential.

The OWASP Top 10 for Agentic Applications 2026 gives the agent-side vocabulary: ASI04 Agentic Supply Chain Vulnerabilities and ASI02 Tool Misuse and Exploitation. NHI3 is the direct identity-layer version of the same problem. The tool is the entry point. The agent’s credential is the payload.

Part 1 of this series treated stale identities as a supply-chain parallel: forgotten authorization paths behave like forgotten dependencies. NHI3 is more direct. The dependency is active, trusted, and invoked at runtime.

How Trust Transits

Trust in an agent toolchain is often a point-in-time decision applied to a runtime system.

A team approves a tool at a specific version, for a known purpose, with a known publisher and a known set of scopes. The agent invokes it later, perhaps after an update, perhaps with different context, perhaps as part of a chain no human reviewed. The approval begins to age the moment it is granted.

Four failure modes are common.

| Failure mode | What changes in agent systems |
| --- | --- |
| Tool registries and marketplaces | MCP servers, plugins, and tool catalogs can be published and selected faster than security review can keep up. A compromised listing can look ordinary. |
| OAuth and SaaS integrations | Broad scopes granted during setup outlive the original purpose. If the provider or integration is breached, the attacker inherits those scopes. |
| Transitive dependencies | A tool depends on a library that depends on another library. Compromise anywhere in that chain can reach the agent’s execution context. |
| Dynamic tool discovery | Agents may select tools by description. A malicious tool can compete on language before it is evaluated on behavior. |

The issue is not only that the tool may be malicious. It is that the agent will often call the tool in a context rich with credentials, data, and next-step authority.

What Makes Agents Different

This is not the traditional software supply-chain problem with a new label. Agents change the timing and frequency of the risk.

In ordinary software, dependency approval is usually tied to build, deployment, or a human action. In an agentic system, tool choice can happen per task. The agent may decide which tool to use after reading a ticket, a document, a user request, or another tool’s output.

That changes the security surface in four ways.

First, selection is runtime behavior. A tool that was approved for one workflow may be selected in another.

Second, composition is automatic. A developer may knowingly chain three tools. An agent may chain twelve because the plan appears to require it.

Third, description becomes influence. Tool names and descriptions are not only documentation; they can shape selection. A tool that describes itself well may win over a tool that behaves better.

Fourth, drift is harder to notice. A tool can start making unexpected network calls, returning subtly altered data, or requesting broader scopes while still completing the task. The agent tends to evaluate usefulness. Security needs to evaluate behavior.

Where the Mechanisms Have Already Shown Up

The exact agent pattern is still early, but the underlying mechanisms are not theoretical.

  • Salesloft Drift OAuth tokens. Google Threat Intelligence Group reported that compromised OAuth tokens associated with Salesloft’s Drift third-party application were used to access Salesforce customer instances and exfiltrate data. Stolen data was then searched for additional credentials such as AWS keys, Snowflake tokens, and passwords. The important pattern is a third-party integration becoming an identity path into customer systems. (Google Cloud GTIG)
  • Malicious VS Code extensions. Check Point reported malicious Visual Studio Code extensions with more than 45,000 installs that stole personal information and enabled backdoors while appearing to provide normal IDE functionality. IDE extensions sit close to source code, environment variables, and developer credentials. Agent tool registries inherit the same marketplace problem. (Check Point)
  • JetBrains GitHub plugin issue. JetBrains disclosed a security issue affecting IntelliJ-based IDEs and the GitHub plugin that could lead to access tokens being disclosed to third-party sites. The issue was fixed, but it shows how trusted development tools can become credential exposure paths. (JetBrains advisory)
  • Third-party BI and SaaS providers. OWASP’s NHI3 entry includes third-party service provider scenarios in which privileged database credentials or cloud access are shared with a provider, and a provider breach exposes the customer environment. The agent version of this pattern is a tool provider holding credentials the agent can cause to be used. (OWASP NHI3)

These are not identical incidents. They share a structure: a trusted third-party component sits close to credentials, and compromise of the component turns trust into reach.

What to Measure

The useful review question is not “which tools can the agent call?” It is “which third parties can cause code to run with which identities?”

For every third-party agent tool, record:

  • Tool name and publisher.
  • Tool version, checksum, signature, and update channel.
  • Backing NHI or delegated credential used at invocation.
  • Credential scopes and resources reachable through the tool.
  • Runtime network destinations.
  • Transitive dependency list.
  • Approval date, reviewer, and approved use.
  • Last behavior review and drift signals.
  • Revocation path if the tool or publisher is compromised.
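One way to make that inventory checkable is to hold each entry as a typed record. The sketch below is illustrative only: the `ToolRecord` fields mirror the list above, while the field names, the `review_gaps` helper, and the 90-day review window are assumptions, not part of any standard or product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ToolRecord:
    """One registry entry per third-party tool (field names are hypothetical)."""
    name: str
    publisher: str
    version: str
    sha256: str                                # checksum of the approved artifact
    update_channel: str                        # e.g. "pinned" or "auto"
    backing_identity: str                      # NHI or delegated credential used at invocation
    scopes: tuple[str, ...]                    # what the credential can reach
    network_destinations: tuple[str, ...]      # expected runtime hosts
    transitive_dependencies: tuple[str, ...]
    approved_on: date
    reviewer: str
    approved_use: str
    signature: Optional[str] = None
    last_behavior_review: Optional[date] = None
    revocation_path: Optional[str] = None


def review_gaps(record: ToolRecord, today: date, max_review_age_days: int = 90) -> list[str]:
    """Return human-readable gaps; an empty list means no gap was found by these checks."""
    gaps = []
    if record.update_channel != "pinned":
        gaps.append("update channel is not pinned")
    if not record.revocation_path:
        gaps.append("no revocation path recorded")
    if record.last_behavior_review is None:
        gaps.append("behavior has never been reviewed")
    elif (today - record.last_behavior_review).days > max_review_age_days:
        gaps.append("behavior review is stale")
    return gaps
```

A record that cannot be filled in is itself a finding: a missing revocation path or an unpinned update channel shows up as a named gap rather than as an assumption.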

The review should also include negative tests. Can the agent invoke a newly discovered tool without approval? Can a tool update change network behavior without a promotion step? Can one third-party connector reuse the agent’s primary identity? Can a tool call reach resources unrelated to its stated purpose? Can response teams revoke the tool without breaking unrelated tools?

If the answer is unclear, the registry is not yet a control plane. It is a list of invitations.

What Has to Be True

Each third-party tool needs its own scoped NHI. The agent’s primary identity should not be shared across tools. If a tool is compromised, the response should be to revoke that tool’s identity, not rotate the entire agent environment and hope nothing else depended on it.
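A minimal sketch of per-tool identity, assuming an in-memory broker: `ToolCredentialBroker` and its methods are hypothetical names, and a real deployment would sit on a secrets manager or identity provider rather than a dictionary. The point is only the shape: one credential per tool, so revocation is scoped to that tool.

```python
import secrets


class ToolCredentialBroker:
    """Issues one opaque credential per tool so revocation stays per tool (hypothetical)."""

    def __init__(self) -> None:
        self._by_tool: dict[str, dict] = {}

    def issue(self, tool: str, scopes: set[str]) -> str:
        """Create a fresh credential for this tool only; replaces any earlier one."""
        token = secrets.token_urlsafe(32)
        self._by_tool[tool] = {"token": token, "scopes": frozenset(scopes)}
        return token

    def authorize(self, tool: str, token: str, scope: str) -> bool:
        """Allow the call only if the token belongs to this tool and covers the scope."""
        entry = self._by_tool.get(tool)
        if entry is None:
            return False
        token_ok = secrets.compare_digest(entry["token"], token)
        return token_ok and scope in entry["scopes"]

    def revoke(self, tool: str) -> None:
        """Remove one tool's credential without touching any other tool."""
        self._by_tool.pop(tool, None)
```

With this layout, revoking a compromised formatter leaves the credentials of every other tool untouched, which is the property the paragraph above asks for.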

Tool versions should be pinned in production. Integrity should be verified before invocation, not only at install time. Checksums, signatures, verified publishers, and reproducible release records do not eliminate risk, but they give response teams something concrete to compare.
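Verification before invocation can be as small as a digest comparison against the pinned value. This sketch assumes the tool is a single artifact on disk and that the pinned SHA-256 comes from the registry record; `verify_before_invoke` and `IntegrityError` are illustrative names. It checks integrity only, not publisher identity, which would need signature verification on top.

```python
import hashlib
import hmac
from pathlib import Path


class IntegrityError(RuntimeError):
    """Raised when an artifact does not match its pinned digest."""


def sha256_of(path: Path) -> str:
    """Stream the file so large artifacts are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_before_invoke(artifact: Path, pinned_sha256: str) -> None:
    """Refuse to proceed if the artifact on disk differs from the approved one."""
    actual = sha256_of(artifact)
    if not hmac.compare_digest(actual, pinned_sha256.lower()):
        raise IntegrityError(
            f"{artifact.name}: digest {actual} does not match the pinned value"
        )
```

Calling this on every invocation, rather than once at install, is what catches an artifact that changed after approval.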

Tool updates should require promotion. A new version should not silently become available to production agents because a package manager, registry, or marketplace changed underneath the workflow.
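The promotion step can be sketched as a registry that separates "a version exists upstream" from "production agents may use it". `PromotionRegistry` and its method names are hypothetical, and the approver check is deliberately minimal.

```python
from typing import Optional


class PromotionRegistry:
    """Tracks which tool versions exist and which one production may use (hypothetical)."""

    def __init__(self) -> None:
        self._available: dict[str, set[str]] = {}
        self._production: dict[str, str] = {}
        self.audit_log: list[tuple[str, str, str]] = []

    def publish(self, tool: str, version: str) -> None:
        """Record that upstream released a version. This does not change production."""
        self._available.setdefault(tool, set()).add(version)

    def promote(self, tool: str, version: str, approver: str) -> None:
        """Make a known version the production version, with a named approver."""
        if version not in self._available.get(tool, set()):
            raise ValueError(f"{tool} {version} has not been published")
        if not approver:
            raise ValueError("promotion requires a named approver")
        self._production[tool] = version
        self.audit_log.append((tool, version, approver))

    def production_version(self, tool: str) -> Optional[str]:
        """Return the promoted version, or None if nothing has been promoted."""
        return self._production.get(tool)
```

A new upstream release changes what is available, never what is running; only an explicit `promote` call does that, and it leaves an audit entry.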

Tool behavior should be monitored. Network destinations, response shapes, latency, error patterns, and scope requests should have baselines. Drift does not always prove compromise, but unexplained drift is a signal.
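A baseline comparison might look like the following sketch. The `Baseline` and `Observation` shapes, the three signals, and the single latency threshold are assumptions chosen for illustration; real monitoring would draw on network telemetry and tolerate legitimate change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    destinations: frozenset[str]    # hosts the tool is expected to contact
    response_keys: frozenset[str]   # top-level keys of a normal response
    max_latency_ms: float


@dataclass(frozen=True)
class Observation:
    destinations: frozenset[str]
    response_keys: frozenset[str]
    latency_ms: float


def drift_signals(baseline: Baseline, observed: Observation) -> list[str]:
    """Return signals worth a look; drift is a prompt to investigate, not proof."""
    signals = []
    new_hosts = observed.destinations - baseline.destinations
    if new_hosts:
        signals.append("new network destinations: " + ", ".join(sorted(new_hosts)))
    if observed.response_keys != baseline.response_keys:
        signals.append("response shape changed")
    if observed.latency_ms > baseline.max_latency_ms:
        signals.append("latency above baseline")
    return signals
```

In the opening scenario, the formatter would still return the expected shape, but the extra outbound host would surface as a new destination.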

Dynamic discovery should be treated as a security decision. Letting an agent discover and invoke a new tool at runtime is powerful. It is also an authorization event. The tool an agent chooses determines whose code runs, what credentials are exposed, and which external systems become part of the workflow.
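Treating discovery as an authorization event implies a default-deny gate in front of every invocation. The sketch below assumes approvals are keyed by tool name and carry a pinned version and a set of approved uses; `Approval`, `ToolNotApproved`, and `authorize_invocation` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    tool: str
    version: str
    approved_uses: frozenset[str]


class ToolNotApproved(PermissionError):
    """Raised when an agent tries to invoke a tool outside its approval."""


def authorize_invocation(
    approvals: dict[str, Approval], tool: str, version: str, use: str
) -> Approval:
    """Default deny: unknown tools, unapproved versions, and unapproved uses all fail."""
    approval = approvals.get(tool)
    if approval is None:
        raise ToolNotApproved(f"{tool} is not in the approved registry")
    if approval.version != version:
        raise ToolNotApproved(f"{tool} {version} is not the approved version")
    if use not in approval.approved_uses:
        raise ToolNotApproved(f"{tool} is not approved for use '{use}'")
    return approval
```

A tool the agent discovered at runtime fails the first check, and a tool approved for one workflow but selected in another fails the third.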

What This Does Not Solve

None of these controls are free.

Pinning versions reduces surprise, but it can leave production agents on known-vulnerable code if updates are too slow. Behavioral monitoring needs baselines and will produce false positives during legitimate changes. Attestation helps with provenance, but it does not prevent a trusted publisher from being compromised. Default-deny tool discovery makes agents safer and less flexible.

That tension is real. An agent restricted to pre-approved tools is less capable than an agent that can discover tools freely. But a registry that allows arbitrary tools to run near credentials is not a research platform or a productivity feature. It is an unbounded supply chain.

The practical position is narrower: production agents should not treat tool discovery as harmless. They should treat it the way mature systems treat dependency management, OAuth consent, and release promotion.

The agent is not the vulnerability by itself. The toolchain is. Third-party NHI risk existed before agents; agents make it operational at runtime, at higher frequency, with fewer humans reviewing each call. If the tool registry does not carry the same rigor as dependency management, the next incident can begin there.