Recently, an attacker filed a support ticket on a Supabase-backed application. The ticket contained hidden instructions directing a Cursor IDE agent — connected to the database via MCP and running with the service_role key — to read a private credentials table and paste its contents back into the ticket thread. The agent complied. No database permissions were violated. The model did exactly what an attacker told it to do, in a setting where the model's judgment was the only thing standing between the attacker and a credentials table.
The Supabase incident is the case study that Mostafa Ibrahim builds his recent Towards Data Science piece around, and his framework — four attack surfaces (prompt, tool, memory, planning loop) each requiring its own defense — is the cleanest map of agent security terrain currently in circulation. Architects deploying agents at scale should read it. What the piece does not press on, because it is written for architects and security engineers, is the budget question one layer up: every defense Ibrahim recommends is an execution-layer control, and most enterprises are still funding agent safety as if it were a model-layer problem. That allocation is the conversation worth having with a CIO.
Model-layer safety is probabilistic. Execution-layer safety is deterministic. Most enterprises are currently paying full price for the first and treating the second as an afterthought. That allocation should reverse.
The distinction matters because agentic AI has changed what "safety" means in production. When the system was a chatbot, safety meant the model refused to say harmful things, and the probability that it would refuse correctly was the relevant metric. When the system is an agent that reads documents, writes to databases, calls APIs, and remembers what it did yesterday, safety means the system cannot take an action it was not authorized to take. Those are different problems, and they live in different layers — only one of which gives a CIO something they can put in front of a regulator.
Probabilistic controls
Model-layer controls are the ones built into the model itself: refusal training, constitutional fine-tuning, system prompt hardening, alignment work. These reduce the likelihood that a model will do the wrong thing. They are real, they are valuable, and the labs investing in them are doing important work. But they are probabilistic by construction. Ibrahim cites Stanford research showing that fine-tuning attacks bypassed safety filters in 72 percent of Claude Haiku cases and 57 percent of GPT-4o cases — failures both vendors have acknowledged. A 99.7 percent refusal rate is excellent in a research benchmark and unacceptable in a financial services agent that handles ten thousand transactions a day, because that residual three-tenths of a percent is now thirty failures a day, and the failures are not random. They cluster in exactly the adversarial conditions an attacker is selecting for.
The Supabase incident is what that limit looks like in production. The agent did not malfunction. It read text, treated instructions inside that text as commands, and executed them through a credential it was authorized to hold. Every step was within the model's normal operating envelope. The point is not that the model layer is broken; the point is that it is not designed to carry deterministic guarantees. Asking it to do so is a category error.
Deterministic controls
Execution-layer controls operate outside the model. They are enforced by the platform that the agent runs on, not by the agent's own reasoning. A row-level access policy in the data layer does not ask the agent nicely to respect it; the policy returns no rows. A tool gateway that requires a signed approval for any write operation does not depend on the agent's judgment; the call fails at the boundary. A provenance log that records every memory write does not trust the agent to be honest about what it did; it records what happened.
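A minimal sketch of that tool-gateway pattern, in TypeScript, under stated assumptions: the `ToolCall` shape, the `authorize` function, and the HMAC-signed approval are illustrative names, not any vendor's API. Reads pass; writes fail closed unless a human-issued signature covers the exact payload.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical shape of a tool call as it arrives at the gateway.
interface ToolCall {
  tool: string;                 // e.g. "db.query"
  action: "read" | "write";
  payload: string;              // serialized arguments
  approval?: { signature: string; approver: string };
}

const APPROVAL_KEY = process.env.APPROVAL_SIGNING_KEY ?? "";

// True only if a human-issued approval signature covers this exact tool and payload.
function approvalIsValid(call: ToolCall): boolean {
  if (!call.approval || !APPROVAL_KEY) return false;
  const expected = createHmac("sha256", APPROVAL_KEY)
    .update(`${call.tool}:${call.payload}`)
    .digest();
  const given = Buffer.from(call.approval.signature, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// The gateway decision: reads pass, writes fail closed unless approved.
// The agent's reasoning never enters into it.
export function authorize(call: ToolCall): { allowed: boolean; reason: string } {
  if (call.action === "read") return { allowed: true, reason: "read-only call" };
  if (approvalIsValid(call)) return { allowed: true, reason: `approved by ${call.approval!.approver}` };
  return { allowed: false, reason: "write requires a signed approval; rejected at the boundary" };
}
```

The design choice that matters is where the check runs: in the gateway process, not the agent loop, so a hijacked agent can argue with the policy but cannot skip it.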
This is what Ibrahim's four surfaces look like when defended deterministically. A scoped credential at the tool surface makes the Supabase credentials-table read fail at the connection. A read-only MCP configuration makes the write-back fail at the gateway. Provenance tracking and trust-weighted retrieval at the memory surface make poisoned entries identifiable rather than silently authoritative. Reasoning logs at the planning loop surface make goal hijacking detectable before the cascade Ibrahim describes — the 87 percent downstream contamination from a single compromised orchestrator — has time to propagate. None of these defenses depend on the model recognizing the attack. They depend on the platform refusing to execute the action regardless of what the model decides.
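A sketch of what the memory-surface defenses could look like, assuming a hypothetical `MemoryEntry` record; the source categories and trust weights are illustrative assumptions, not a published scheme.

```typescript
// Hypothetical memory record carrying provenance alongside the content.
interface MemoryEntry {
  text: string;
  source: "human_operator" | "verified_tool" | "agent_inference" | "external_content";
  writtenBy: string;        // which agent or user performed the write
  writtenAt: string;        // ISO timestamp, kept for audit
}

// Assumed trust weights; external content (tickets, web pages) ranks lowest,
// because that is where injected instructions arrive.
const TRUST: Record<MemoryEntry["source"], number> = {
  human_operator: 1.0,
  verified_tool: 0.8,
  agent_inference: 0.5,
  external_content: 0.2,
};

// Retrieval combines relevance with provenance instead of trusting relevance alone.
// `relevance` would come from the embedding store; here it is a plain score.
function rank(entries: { entry: MemoryEntry; relevance: number }[]) {
  return entries
    .map(({ entry, relevance }) => ({ entry, score: relevance * TRUST[entry.source] }))
    .sort((a, b) => b.score - a.score);
}

// Every write is logged before it lands, so a poisoned entry can be traced
// back to the document and agent that introduced it.
function writeMemory(log: MemoryEntry[], entry: MemoryEntry): void {
  console.info(`[memory-write] source=${entry.source} by=${entry.writtenBy} at=${entry.writtenAt}`);
  log.push(entry);
}
```

Retrieval that multiplies relevance by provenance does not make a poisoned ticket invisible; it makes the entry identifiable and low-ranked, and the write log records who put it there.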
These controls are deterministic in the engineering sense: given the same inputs, they produce the same outputs, and their behavior is auditable. They also map cleanly onto existing enterprise governance frameworks. A CIO who can demonstrate to an auditor that no agent can write to the general ledger without a human approval, because the approval requirement is enforced by the platform and not by the agent's promise to behave, is having a fundamentally different conversation than one who is presenting refusal-rate statistics.
Why the allocation matters
Every enterprise deploying agents at scale will end up with both layers in production. The question is where the investment goes, and where the accountability sits.
The current pattern is to delegate safety to the model vendor and treat the platform as a passive runtime. This is comfortable because it externalizes the problem, but it is also the configuration producing the incidents Ibrahim catalogs and the industry reports increasingly document — Gravitee's 2026 survey found 88 percent of organizations reported confirmed or suspected agent security incidents in the past year, against only 14.4 percent of agentic systems going live with full security approval. The gap between those numbers is execution-layer debt.
The reallocation is straightforward in principle: treat the model layer as a probabilistic input to the system, and put deterministic controls everywhere the system can take consequential action. The substrate for this is not new. WSO2's API gateway and identity stack have been mediating service-to-service calls with scoped tokens and per-action policy for years; Cloudera's SDX, Atlas, and Ranger were built to enforce that same separation inside the data layer. Both predate the current agent wave, and both fit the agent problem because they were already enforcing this separation before the problem had a name.
What CIOs should ask
Three questions clarify where an organization actually sits.
First: if an agent in your environment attempted an unauthorized action right now, would the action be blocked by the platform or by the model declining to attempt it? If the answer is the model, you are paying for probabilistic safety and calling it sufficient.
Second: can you produce, for any decision an agent made in the last thirty days, an auditable trace of what it read, what it remembered, what tools it called, and what intermediate reasoning led to the final action? If not, the execution layer is not instrumented, and the model layer cannot fill that gap regardless of how well it is trained.
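A sketch of the trace record that question presumes, with hypothetical field names; the essential property is that the platform appends these events, rather than asking the agent to narrate its own behavior.

```typescript
// Hypothetical per-step trace event: what the agent read, remembered,
// called, and concluded, appended by the platform rather than the agent.
interface AgentTraceEvent {
  runId: string;
  step: number;
  kind: "retrieval" | "memory_read" | "memory_write" | "tool_call" | "reasoning" | "final_action";
  detail: Record<string, unknown>;   // e.g. document id, tool name and arguments, decision summary
  timestamp: string;
}

// Append-only log keyed by run; retention and tamper protection would live
// in the platform's log store, not shown here.
const traceLog: AgentTraceEvent[] = [];

function record(event: Omit<AgentTraceEvent, "timestamp">): void {
  traceLog.push({ ...event, timestamp: new Date().toISOString() });
}

// Answering the auditor's question becomes a filter over the log,
// not a forensics project.
function traceForRun(runId: string): AgentTraceEvent[] {
  return traceLog.filter((e) => e.runId === runId).sort((a, b) => a.step - b.step);
}
```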
Third: when you scale from one agent to fifty, does each one bring its own safety story, or do they all inherit deterministic controls from the platform they run on? The first answer does not scale. The second one does.
Model-layer safety is necessary. It is not where the bulk of the safety budget belongs. The deterministic controls — the ones that work the same way every time, that an auditor can verify, that a regulator can accept — live one layer down, in the platform. That is where the budget should shift, and that is the conversation worth having before the next agent goes into production.