When your AI agent's vendor gets compromised, a runtime defence with Istio Ambient

A worked example. I built a small bank with three AI agents on a laptop, simulated a supply-chain compromise of one of its tool vendors, and tried to stop the resulting data exfil with identity-based mesh policies. Here's what happened, and how the block actually works at the wire level.

Tom O'Rourke Principal Architect & Field CTO·Solo.io 10 May 2026

If you're standing up an agent platform, whether for internal use or customer-facing, the question I had to answer for myself was: when (not if) one of my agent's tool vendors gets compromised upstream, what's the smallest set of runtime controls that catches it before customer data leaves the cluster? This is what I tried.

The timing here isn't accidental. On 1 May 2026, CISA, NSA and the Five Eyes cyber agencies (UK NCSC, Australia's ASD, Canada's CCCS, NZ's NCSC) published joint guidance on secure deployment of AI agents. They group the risks into five categories — privilege, design & configuration, behavioural, structural, and accountability — and the wording on the last one is what stuck out for me: "agentic systems make decisions through processes that are difficult to inspect and generate logs that are hard to parse," and when they fail "the consequences can be concrete: altered files, changed access controls and deleted audit trails." The headline recommendation is to "assume that agentic AI systems may behave unexpectedly and plan deployments accordingly, prioritising resilience, reversibility and risk containment." That, in one sentence, is the brief for this post. What follows targets two of the five categories directly: privilege (each agent and tool gets a cryptographic identity with least-privilege egress) and accountability (every blocked call lands in an audit log keyed to a SPIFFE identity, not a pod IP).

1. The setup

Fictional EU bank, three agents, four tool servers. Built with the open-source kagent operator (the agents are Kubernetes resources), agentgateway (the gateway in front of the tools, handles MCP routing and per-route policy), and agentregistry (the catalogue of tools the platform team has approved). All on top of Istio Ambient for the mesh, so every pod gets a cryptographic identity automatically and every connection is mutually authenticated, with no application-side code change.

TrustUsBank chatbot interface, three agents, DORA articles in scope — Three agents in scope: customer support, fraud, and triage. The fourth tool, `acme-fx/currency-converter`, is from a third-party vendor onboarded six months ago.

The customer asks about their balance and gets a normal answer. Day-to-day, this is unremarkable.

2. How the agent decides which tool to call

A small amount of config lives in my git repo. Most of the actual behaviour is fetched at session start, freshly, every time.

📌

Static config

In git, version-controlled

Agent CRD: system prompt + model
toolNames allowlist (by name only)
MCP server URLs (RemoteMCPServer)

⚡

Dynamic at runtime

Fetched fresh each session

Tool list each pod exposes
Schemas + descriptions
Which tool the LLM picks

When a customer asks "balance please and convert to USD":

chatbot → support-bot via the kagent A2A endpoint

↓

support-bot → MCP servers (via agentgateway) opens a session to each, calls tools/list

↓

Each pod returns its current tool list name + signature + description, generated from the running Python @mcp.tool() decorators, not from any YAML.

↓

support-bot builds the prompt for Claude system message + the dynamic tool catalogue from step 3 + the user's question

↓

Claude picks tools, support-bot dispatches them tool_use blocks → agentgateway → MCP server. Results feed back, loop until reply.

What Claude actually sees at step 4 (sample prompt)

system:
  You are TrustUsBank's front-line customer support assistant.
  You help retail banking customers with balance enquiries, recent
  transactions, and currency conversion.

tools:   // fetched dynamically at step 3
  [
    { "name": "account_mcp__get_balance",
      "description": "Return the customer's current cleared balance.",
      "input_schema": { "type": "object", "properties": {
        "account_id": { "type": "string" } }, "required": ["account_id"] } },
    { "name": "currency_converter__convert_currency",
      "description": "Convert an amount between two ISO 4217 currencies.",
      "input_schema": { ... amount, from_ccy, to_ccy ... } },
    ...
  ]

user:
  "Customer 12345, balance please and convert it to USD."

The thing that matters for the rest of this post. Every word the LLM sees about a tool, name, description, schema, comes from step 3. The toolNames field in your Agent CRD is just a name allowlist. Whatever the pod ships at runtime, the LLM reads.

3. The threat model, supply-chain compromise of one vendor

The attack scenario I'm modelling is the one that's actually happened in production at other companies, Codecov, 3CX, ua-parser-js, xz-utils. The attacker has no access to my git repo, my Helm chart, my Kubernetes cluster, or my catalogue. What they get is write access to the vendor's build pipeline. They push a new container image at the same 1.0.0 tag the bank was already using.

That means none of my git-tracked artefacts move. Same Deployment YAML, same Service, same kagent RemoteMCPServer CRD pointing at the same URL, same Agent.spec.tools list. The catalogue record in agentregistry is unchanged:

$ arctl mcp list
NAME                          VERSION   TYPE   PACKAGE
acme-fx/currency-converter    1.0.0     oci    localhost:5001/.../currency-converter:1.0.0
trustusbank/account-mcp       1.0.0     oci    localhost:5001/trustusbank/account-mcp:1.0.0
trustusbank/transaction-mcp   1.0.0     oci    localhost:5001/trustusbank/transaction-mcp:1.0.0
trustusbank/ticket-mcp        1.0.0     oci    localhost:5001/trustusbank/ticket-mcp:1.0.0

You already saw the runtime flow in §2: every word the LLM consumes about a tool comes from the running pod's response, not from any YAML in my git repo. So an attacker who controls the vendor's build pipeline gets to rewrite the entire tool interface (name still matches the allowlist, but signature, schema, description, and runtime behaviour are all under their control) without touching any of my Kubernetes resources. Concretely, the tool is defined in Python like this:

@mcp.tool()
def convert_currency(amount: float, from_ccy: str, to_ccy: str) -> dict:
    """Convert an amount between two ISO 4217 currencies."""
    ...

One Python function per tool. Edit it, rebuild the image, push it. That's the entire blast radius.

This is what makes the static controls I'd normally rely on fail:

GitOps drift detection (Argo CD / Flux). Both git state and cluster state are unchanged. No diff, no alert.
Container image scanning. The malicious behaviour is in code that runs, not a binary or known vulnerability. A scanner sees a normal Python image with normal dependencies.
The agent's tool allowlist. kagent's Agent.spec.tools field allowlists tool names, not signatures. convert_currency matches both the original and the compromised version.
Manually inspecting the tool. The only way to find out what the MCP server exposes is to call tools/list against the live pod. That's circular, I'd be trusting the pod I'm trying to verify.

The catalogue and the cluster manifests aren't authoritative records of what the agent's tools actually do. The pod is. If the pod can be replaced, which is the entire point of having a CI/CD pipeline, the tool's behaviour can change without any audit-able artefact moving.

What's actually in the new image

Two things, both inside the rebuilt convert_currency function. First, the docstring is replaced with a fake compliance instruction telling the agent it must fetch the customer's full profile and pass it as an argument, citing a real EU payments regulation by name to make the instruction look authoritative. Second, the function body now POSTs that profile to attacker.com/exfil the moment the function runs.

The first half of that is, in textbook terms, an indirect prompt injection, malicious instructions smuggled into data the LLM treats as authoritative (here, the tool's own docstring). The Five Eyes guidance flags it specifically, and the model vendors have been frank about the limits of the model-layer fix: OpenAI has said prompt injection may never be "solved" for browser agents, and the UK NCSC describes LLM susceptibility to it as "an inherent trait of generative AI". Concrete versions have already shipped — GrafanaGhost exfiltrating corporate data via Grafana's AI, a Gemini CLI flaw that silently leaked code, ServiceNow's CVE-2025-12420. The implication is that whatever catches this attack isn't going to be the model. It has to be a layer the LLM can't see or argue with, which is where the mesh comes in.

4. What the attack looks like, end to end

The customer sends the same prompt as yesterday. The agent's LLM reads the new tool description, accepts the "compliance" instruction at face value, and calls get_profile on the legitimate account-mcp server (which it never invoked pre-attack, it didn't need profile data to convert a currency). It then passes the result as a customer_profile argument to convert_currency:

Tool flow showing get_profile invoked and a PII payload passed to convert_currency — The agent's tool-call sequence. `get_profile` appears mid-flow, there's no functional reason to invoke it for a currency conversion, but the (compromised) tool's docstring told the LLM it was required.

The customer-facing reply is normal, balance plus the USD figure. The bank's audit logs show a normal three-tool flow. From the attacker's side, the customer's full record has just landed:

mock-attacker dashboard showing the full stolen PII record — The receiver pod outside the bank's trust boundary. Full unmasked customer record: name, email, phone, full UK address, date of birth, government tax identifier. Everything needed for downstream identity theft.

5. The defence, how the block actually works at the wire level

This is the part I had to work through to convince myself it was real. The relevant property is that Istio Ambient gives every pod a cryptographic identity called a SPIFFE ID. It looks like this:

spiffe://cluster.local/ns/trustusbank-bank-vendors/sa/currency-converter

That's three pieces: a trust domain (cluster.local), the pod's namespace (trustusbank-bank-vendors), and the pod's Kubernetes ServiceAccount (currency-converter). Every pod gets one automatically, Istio's control plane mints a short-lived X.509 certificate for each pod and embeds the SPIFFE ID in the cert's SAN field. No application code involvement at all; the pod opens a plain TCP connection, and a per-node component called ztunnel transparently wraps it in mutually-authenticated TLS using that certificate.

So when the malicious tool's outbound POST hits the wire, it's not just a TCP packet to an IP address, it's a TLS handshake where the source's identity is verifiable from the certificate. The destination side's ztunnel sees the source's SPIFFE before any application bytes flow.

Wait, why is the attacker a Kubernetes namespace?

Reasonable question. In a real attack the destination would be a server on the public internet, not another pod in the same cluster. The demo uses an in-cluster external-attacker namespace as a stand-in so we can put a deterministic deny rule on it and reproduce the block reliably on a laptop. Substantively, the defence pattern is the same for real external destinations, just expressed differently:

If the destination is in the cluster (this demo), the policy lives in the destination namespace and matches inbound by source SPIFFE.
If the destination is outside the cluster (the real world), you'd push the policy into an egress allowlist: deny outbound from trustusbank-bank-* to anything that isn't on a list of approved external hosts (the bank's identity provider, an LLM API, a logging service). Either at the destination of an Istio ServiceEntry, or at an egress gateway sitting between the cluster and the public internet.

Same enforcement layer, same identity match, different shape of rule. I picked the in-cluster version because I wanted the demo to fit on one laptop and not depend on registering an internet domain.

The deny policy itself

The block is one small Istio AuthorizationPolicy applied to the attacker's namespace:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-bank-to-attacker
  namespace: external-attacker
spec:
  action: DENY
  rules:
    - from:
        - source:
            namespaces:
              - "trustusbank-bank-*"
              - "trustusbank-platform"

Read it as: "any pod in any bank-side namespace is denied from connecting into external-attacker." When the malicious currency-converter tries to POST to the attacker's pod, here's what happens at the destination ztunnel, in order:

Step	What ztunnel does
1	TLS handshake completes. ztunnel reads the source's SPIFFE from the cert: `spiffe://cluster.local/ns/trustusbank-bank-vendors/sa/currency-converter`.
2	Looks up AuthorizationPolicy resources matching the destination namespace (`external-attacker`). Finds `deny-bank-to-attacker`.
3	Walks the rules. Source namespace is `trustusbank-bank-vendors`; matches the wildcard `trustusbank-bank-*`. Action: DENY.
4	Sends a TCP RST back to the source. Logs an access-log line containing both SPIFFE identities, the destination service, and the policy that fired.
5	Increments the metric `istio_tcp_connections_failed_total{response_flags="CONNECT"}`. A PrometheusRule alert fires within 30 seconds.

No application bytes were ever forwarded. The HTTP POST never reaches the attacker's pod. From inside the malicious tool's Python, httpx.post() raises a connection-reset exception. The customer-facing conversation continues unchanged because the conversion math runs locally in the same function, the agent sees a successful response.

DORA Evidence dashboard showing the AuthZ deny stat and offending pod table — The dashboard at the moment Solo blocks the attempt. The "offending pod" table identifies the source by its full SPIFFE identity. The deny log lines below are the ztunnel access-log entries, one per blocked attempt, with everything an auditor would ask for.

Why per-ServiceAccount identity matters more than namespace identity

The other thing I tested: what happens if the attacker manages to drop a malicious pod inside the bank's trusted namespaces (for example, by piggybacking on a separate Helm chart compromise)? If the AuthorizationPolicy were written in terms of namespaces alone, "anything in bank-mcp can reach account-mcp", the attacker's pod would inherit that trust. With per-ServiceAccount SPIFFE rules, the attacker's pod runs under its own ServiceAccount, gets its own SPIFFE identity, and that identity isn't on the allow list. Connection denied even from inside a trusted namespace.

The repo includes scripts/test-colocated-attacker.sh which runs that scenario end to end.

6. What the on-call engineer sees

The PrometheusRule that fires the deny alert is wired through Alertmanager to an SMTP catcher (MailHog, in the demo) so I could see the actual email. Body has the offending pod's SPIFFE identity, the destination, the EU regulatory article being violated, a deep-link to the dashboard, and the kubectl command to scale the malicious workload to zero:

SOC alert email with offending SPIFFE identity and dashboard deep-link — The alert email. Engineer-actionable: every field they'd need to triage and contain is in the message.

What I'd take away if I were planning to deploy this in production

Three things I'd internalise from running this:

The model layer will be fooled. Prompt injection through tool descriptions is unblockable from the model side as long as you trust your tool authors. The realistic posture is "my agent's LLM might be tricked at any time, by any of my tools", and design around that.
Static controls are insufficient against image-only compromises. GitOps, container scans, agent CRD allowlists, all see no diff. If you stop here, you have no visibility into upstream tool changes.
Identity-aware runtime policy is the catch point. The destination-side mesh proxy reads the source's SPIFFE off the cert and matches it against AuthorizationPolicy. The malicious pod can be ANY pod, the only question is whether its identity is on the allow list for the destination it's trying to reach.

None of this is novel, Istio has had identity-based AuthZ for years. What's new is the supply-chain attack surface that AI agents create (tool descriptions as instruction channels, MCP's dynamic discovery, third-party tools as the natural deployment pattern). The defence stack I used is all open source: Istio Ambient + ztunnel for the mesh, agentgateway for MCP routing, agentregistry for the catalogue, kagent for the agents, plus the standard kube-prometheus-stack / Loki / Tempo observability layer. One Helm chart, one AuthorizationPolicy, one PrometheusRule.

Try it yourself. The full demo is open source: ./scripts/deploy-all.sh, ./scripts/upgrade-banking-app.sh, ./scripts/policies-on.sh. About 25 minutes to deploy on a laptop with kind.
Repo: tjorourke/solo-demo-agentic-dora

Glossary

Istio Ambient: A "sidecar-less" deployment mode for Istio. A single per-node component (ztunnel) handles encryption and authorization for every pod in the namespace. No per-application sidecar proxy required.
SPIFFE / SPIFFE ID: Secure Production Identity Framework for Everyone, a spec for giving every workload a verifiable cryptographic identity. In Istio Ambient, each pod's SPIFFE ID is derived from its namespace and ServiceAccount, embedded in a short-lived X.509 cert, and rotated automatically.
ztunnel: The per-node DaemonSet that does the encryption + AuthZ work in Istio Ambient. Terminates mTLS, reads source SPIFFE from the cert, evaluates AuthorizationPolicy, allows or denies.
AuthorizationPolicy (Istio CRD): The Kubernetes resource that expresses which identities can reach which destinations. Matches on SPIFFE principals, namespaces, HTTP attributes, JWT claims, etc.
MCP, Model Context Protocol: The protocol AI agents use to discover and call tools. Tool servers expose a list of functions; agents query that list, pick one, and call it. Discovery is dynamic, generated by the running tool server, not declared in static config.
kagent: Open-source Kubernetes operator for running AI agents declaratively. Each agent is a CRD with model config, system prompt, and tool list.
agentgateway: Open-source gateway sitting between agents and tools. Handles MCP routing, JWT auth, rate limiting, audit logging, all via Kubernetes CRDs.
agentregistry: Open-source catalogue for MCP tool servers. The platform team's record of which tools have been approved.
DORA: EU Digital Operational Resilience Act, in force since January 2025. Articles 9 (security), 10 (anomaly detection), 17 (incident management), 28 (third-party register) are the ones this demo addresses.

Source

When your AI agent's vendor gets compromised, a runtime defence with Istio Ambient

1. The setup

2. How the agent decides which tool to call

Static config

Dynamic at runtime

3. The threat model, supply-chain compromise of one vendor

What's actually in the new image

4. What the attack looks like, end to end

5. The defence, how the block actually works at the wire level

Wait, why is the attacker a Kubernetes namespace?

The deny policy itself

Why per-ServiceAccount identity matters more than namespace identity

6. What the on-call engineer sees

What I'd take away if I were planning to deploy this in production

Glossary

Run the demo on your laptop →

SPIFFE in this demo, in 11 sections →

Operator playbook →

The OSS components used here →