Why Agent Governance That Passes Pilots Fails Audit

Most enterprise architects have stopped treating the system prompt as where governance lives. That conclusion is no longer controversial. The interesting question — the one most current answers do not fully resolve — is where exactly governance does live in an agentic system, and whether the answer a given enterprise has today covers the actual surface area or only the parts that were easiest to address first.

That distinction matters because the failure mode is consistent. Most internal governance designs cover two or three surfaces by design, one more by accident, and miss at least one entirely. The gaps usually sit in the same places, and they tend to surface at audit, during an adversarial test, or when an agent crosses a boundary the original design did not anticipate.

A recent IBM Research paper, Governance by Construction for Generalist Agents, offers a useful taxonomy for the agent-behavior surface: five structurally distinct intercept points, each suited to a different category of governance problem. The taxonomy is worth walking through not as a clever architecture to admire, but as a checklist a CIO or chief architect can run against their own system in an afternoon. For each intercept point, ask three questions: does a deterministic control sit there, what does it inspect, and can it be bypassed by an action whose wording differs from its effect?

The Five Intercept Points and Their Gaps

Intent Guard sits before the agent reasons about a request. Its job is to hard-block a class of action that must never proceed — bulk deletes, PII exports, anything categorically out of bounds — by terminating the run before the agent ever sees the instruction.

Most enterprises have something here, usually built as a keyword or topic filter at the request layer. The gap-surfacing question: does the filter only match on request wording, or does it also evaluate against semantic intent? Wording-based filters are bypassed by paraphrase. Semantic filters require an embedding step and a threshold, which most homegrown filters do not have. This is the most commonly present checkpoint and the most commonly incomplete one.
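To make the wording-versus-intent distinction concrete, here is a minimal Python sketch contrasting a keyword filter with a similarity-threshold filter. The `CONCEPTS` lexicon is a toy stand-in, assumed purely for illustration, for a real embedding model; the blocked exemplars, the 0.8 threshold, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Wording-based filter: matches literal phrases only.
BLOCKED_PHRASES = ("bulk delete", "export pii")

# Toy stand-in for an embedding model (hypothetical): map words to concepts.
CONCEPTS = {
    "delete": "destroy", "wipe": "destroy", "purge": "destroy", "erase": "destroy",
    "bulk": "scope_all", "all": "scope_all", "every": "scope_all", "entire": "scope_all",
    "customer": "records", "customers": "records", "row": "records",
    "rows": "records", "records": "records", "table": "records",
    "export": "exfiltrate", "download": "exfiltrate", "send": "exfiltrate",
    "pii": "personal", "ssn": "personal", "personal": "personal",
}


def embed(text: str) -> Counter:
    """Turn text into a sparse concept-count vector."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Exemplars of categorically blocked intents, and a hypothetical threshold.
BLOCKED_INTENTS = [embed("bulk delete all customer records"), embed("export pii records")]
THRESHOLD = 0.8


def keyword_guard(request: str) -> bool:
    """True if the request should be blocked, judged on wording alone."""
    text = request.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


def semantic_guard(request: str) -> bool:
    """True if the request is close enough to any blocked intent."""
    vec = embed(request)
    return any(cosine(vec, intent) >= THRESHOLD for intent in BLOCKED_INTENTS)
```

With these toy vectors, the paraphrase "please wipe every row in the customers table" passes `keyword_guard` but scores about 0.9 against the bulk-delete exemplar, so `semantic_guard` blocks it. A real deployment would need to tune the threshold against both paraphrased attacks and benign requests.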

Playbook sits inside the agent's reasoning step, but as enforced structure rather than open instruction. When a request matches a known workflow, the Playbook injects the required sequence of steps — verify identity before granting access, check sanctions before opening the account, pull the coverage record before mapping the term. The agent stops improvising the order of operations.

Most enterprises do not have this checkpoint explicitly. They have it implicitly, embedded in carefully constructed prompts that describe the required sequence. The gap-surfacing question: is the sequence enforced by a structured policy the agent cannot deviate from, or by an instruction the agent can interpret loosely under adversarial input? For workflows where the sequence of checks is itself the compliance requirement, the answer determines whether a regulator would accept the design.
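A minimal sketch of the difference, assuming a workflow has already been matched to a playbook: the sequence lives in a data structure that a runtime enforcer checks before each step, so an out-of-order step fails no matter how persuasively the agent was asked to skip it. `Playbook`, `PlaybookViolation`, and the step names are hypothetical and not taken from the IBM paper.

```python
from typing import List, Optional


class PlaybookViolation(Exception):
    """Raised when the agent attempts a step out of the required order."""


class Playbook:
    """An enforced step sequence for one matched workflow (hypothetical)."""

    def __init__(self, name: str, steps: List[str]):
        self.name = name
        self.steps = list(steps)
        self.completed: List[str] = []

    def next_required(self) -> Optional[str]:
        """The only step the runtime will currently permit, or None when finished."""
        if len(self.completed) < len(self.steps):
            return self.steps[len(self.completed)]
        return None

    def record(self, step: str) -> None:
        """Called by the runtime before each step executes; rejects deviations."""
        expected = self.next_required()
        if step != expected:
            raise PlaybookViolation(
                f"{self.name}: expected {expected!r}, agent attempted {step!r}"
            )
        self.completed.append(step)

    @property
    def done(self) -> bool:
        return self.next_required() is None


# Usage: the agent tries to open the account before the checks have run.
account = Playbook("open_account", ["verify_identity", "check_sanctions", "open_account"])
try:
    account.record("open_account")
except PlaybookViolation as err:
    print(err)  # open_account: expected 'verify_identity', agent attempted 'open_account'
```

The prompt can still describe the workflow to the agent; the point is that the order of checks no longer depends on the agent honoring that description.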

Tool Guide sits at the tool-call boundary. Just before a tool is invoked, this checkpoint enriches the tool's description with contextual requirements — pagination rules, jurisdiction-specific constraints, data-handling notes. Multiple guides can apply at once, additively, and the enrichment reverts after the call.
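The additive-then-revert behavior can be sketched with a context manager. This is an illustration under assumed names (`Tool`, `GUIDES`, `guided`), not the paper's implementation: each guide is a predicate on the call context paired with a note that is appended to the tool description only for the duration of the call.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str  # static text written at integration time


# Hypothetical guides: (predicate on the call context, note to append).
GUIDES = [
    (lambda ctx: ctx.get("jurisdiction") == "EU",
     "Do not move records of EU residents outside the EU region."),
    (lambda ctx: ctx.get("classification") == "confidential",
     "Redact account numbers in results."),
    (lambda ctx: True,
     "Paginate: request at most 100 rows per call."),
]


@contextmanager
def guided(tool: Tool, ctx: dict):
    """Append every applicable guide for this call, then restore the original text."""
    original = tool.description
    notes = [note for applies, note in GUIDES if applies(ctx)]
    tool.description = original + "".join(f"\n- {note}" for note in notes)
    try:
        yield tool
    finally:
        tool.description = original  # enrichment reverts after the call
```

The `finally` clause is what makes the enrichment revert even if the tool call raises, so one context's constraints never leak into the next call.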

Most enterprises do not have this layer at all. Tool descriptions are static, written at integration time. The gap-surfacing question: when the same tool is called from two different contexts — different jurisdictions, different data classifications, different user roles — does its behavior differ, and is that difference enforced at the tool boundary or trusted to the agent's reasoning? When the difference is trusted to the reasoning, it is one prompt injection away from being absent.

Tool Approval is the human gate, and it is architecturally the most interesting of the five. It sits after the agent has generated its tool calls. It does not inspect the request wording. It inspects the actual generated code — the calls the agent has decided to make. If a sensitive tool is being called, execution pauses and waits for human confirmation.

Many enterprises that have a human-in-the-loop layer have it at the request level, not the code level. The gap-surfacing question: when a destructive action would be taken, does the approval gate inspect what the user asked for, or what the agent has decided to do? The two are not the same, and the gap between them is exactly the surface that paraphrased adversarial inputs exploit. A request-level gate can be slipped. A code-level gate cannot, because what gets inspected is the generated invocation, not the wording that produced it.
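One way to inspect what the agent has decided to do, sketched under the assumption that the agent emits Python code for its tool calls: parse the generated code with the standard `ast` module and flag any reference to a sensitive tool, whatever the request said. `SENSITIVE_TOOLS`, `requires_approval`, and `gate` are hypothetical names, and the `approve` callback stands in for the human reviewer.

```python
import ast

# Hypothetical names of tools that must never run without human sign-off.
SENSITIVE_TOOLS = {"delete_records", "export_table", "transfer_funds"}


def referenced_names(code: str) -> set:
    """Collect every bare name and attribute name the generated code references."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names


def requires_approval(generated_code: str) -> list:
    """Sensitive tools the agent has decided to use, independent of request wording."""
    return sorted(referenced_names(generated_code) & SENSITIVE_TOOLS)


def gate(generated_code: str, approve) -> bool:
    """Return True if execution may proceed; `approve` stands in for the human reviewer."""
    flagged = requires_approval(generated_code)
    if not flagged:
        return True
    return bool(approve(flagged))
```

Scanning every name reference rather than only direct calls also catches simple aliasing such as `f = tools.delete_records`. A name scan is still a simplification: a dynamic lookup with a computed string would evade it, which is one reason a production gate might instead hook the tool dispatcher itself.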

This is where defense in depth becomes real rather than nominal. Intent Guard inspects wording. Tool Approval inspects generated code. Two checkpoints checking different aspects of the same action is depth. Two checkpoints checking the same thing twice is redundancy.

Output Formatter sits at the response boundary. It enforces the shape of what leaves the system — a structured schema, a redacted template, a controlled set of fields.

Many enterprises do this in their application layer, downstream of the agent. The gap-surfacing question: is the output shape enforced before the response leaves the agent, or after it reaches the application? When enforcement is downstream, the agent has already produced — and possibly logged — content the application then filters out. For audit purposes, what the agent produced is the artifact, not what the application delivered. The two should not diverge.
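A minimal sketch of enforcing shape inside the agent, assuming a flat dictionary response: the formatter runs before anything is logged, so the audit record and the delivered response are the same artifact. The field names, the account-number pattern, and `audit_log` are hypothetical.

```python
import json
import re

# Hypothetical response schema and redaction rule.
ALLOWED_FIELDS = ("case_id", "status", "summary")
ACCOUNT_NUMBER = re.compile(r"\b\d{8,12}\b")

audit_log: list = []


def format_output(raw: dict) -> dict:
    """Keep only schema fields and redact account numbers in free text."""
    shaped = {name: raw.get(name, "") for name in ALLOWED_FIELDS}
    shaped["summary"] = ACCOUNT_NUMBER.sub("[REDACTED]", str(shaped["summary"]))
    return shaped


def respond(raw: dict) -> dict:
    """Shape first, then log, then return: the logged and delivered artifacts match."""
    shaped = format_output(raw)
    audit_log.append(json.dumps(shaped, sort_keys=True))
    return shaped
```

Reversing the first two lines of `respond` would reproduce the downstream pattern: the log would hold content the caller never received.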

Two Surfaces Outside the Checkpoint Model

The five intercept points are a strong taxonomy of where governance acts on the agent's behavior. They are not a complete taxonomy of where governance lives in an agentic system. Two surfaces sit outside the checkpoint model, and many internal answers under-address both. These are not additional checkpoints. They are different in kind — substrate concerns rather than execution concerns — and the difference is sharper than it first appears.

Every one of the five intercept points sits inside the agent's runtime. Even Tool Approval, the most external of them, inspects code the agent generated. The inference process is upstream of every check in the model. That means every checkpoint's input is, at some level, a product of the model — and an adversarial input that corrupts the inference corrupts what the checkpoints see. The substrate surfaces below are different. Their decisions do not depend on the inference path at all.

Access enforced outside the inference path. The five intercept points take the agent's tools and data access as given. They do not address whether the call the agent is about to make (the actual API invocation, with its identity, its scopes, and its target resource) is permitted at the network boundary. In production, that decision belongs to an API gateway: a deterministic enforcement plane that evaluates the call itself against policy and returns allow or deny. The gateway does not see the prompt. It does not see the agent's reasoning. It does not care what the inference decided. It evaluates identity, context, scope, rate, and resource against rules that hold regardless of what the model is doing.

This is the architectural property that distinguishes a gateway from the checkpoint model: it is fully independent of the inference path. If the inference is compromised, jailbroken, prompt-injected, or simply mistaken, the gateway's decision is unchanged. It governs the act, not the agent's intention to act. The checkpoints govern the agent. The gateway governs what the agent is permitted to reach, on the wire, regardless of how the agent arrived at the request.
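The independence property shows up directly in a gateway's inputs. In this minimal sketch (the policy table, identities, scopes, and rate limit are all hypothetical), a call carries identity, scopes, method, and resource, and nothing else; there is no field through which a prompt or the agent's reasoning could reach the decision. Unmatched calls are denied by default.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ApiCall:
    identity: str
    scopes: frozenset
    method: str
    resource: str
    # No prompt, reasoning, or tool-description field: the gateway never sees them.


# Hypothetical policy: (identity, method, resource prefix) -> required scope.
POLICY = {
    ("agent-claims", "GET", "/claims/"): "claims:read",
    ("agent-claims", "POST", "/claims/"): "claims:write",
}
RATE_LIMIT = 3  # hypothetical: calls per identity per time window


@dataclass
class Gateway:
    counts: dict = field(default_factory=dict)

    def evaluate(self, call: ApiCall, window: int) -> str:
        """Deterministic allow/deny on the call itself; unmatched calls are denied."""
        key = (call.identity, window)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] > RATE_LIMIT:
            return "deny: rate"
        for (identity, method, prefix), scope in POLICY.items():
            if (call.identity, call.method) == (identity, method) and call.resource.startswith(prefix):
                return "allow" if scope in call.scopes else "deny: scope"
        return "deny: no matching rule"
```

Because the prompt is not an input to `evaluate`, no prompt change, tool-description change, or context manipulation can alter its result; only the call itself and the policy can.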

Many internal answers approach access as a configuration concern handled at agent setup — role assignments, scoped tokens, allowlists baked into the integration. The gap-surfacing question: when the agent's inference is the only thing that has changed, does anything about what it can reach change with it? If the answer is no, access is being governed at the boundary, where it belongs. If the answer is yes — if a prompt change, a tool description change, or a context manipulation can shift what the agent successfully calls — then access is being governed inside the inference path, and the strongest property of an enterprise governance posture has been given up without anyone choosing to give it up.

Event flow as a governance surface. Most architects are alert to the more visible risks in the data path — synthetic injection, prompt-borne content reaching downstream systems — and those risks are not the gap worth surfacing here. The quieter gap is that the event flow is rarely treated as a governance surface in its own right. Real enterprise agents emit and react to events, transform data, and produce derived state. Each transformation is a governable act. The question of what transformation occurred, authorized under what policy, traceable to what decision, has to be answerable on the substrate where the transformation happened.

Two specific weaknesses tend to show up. The first is lineage that cannot be reconstructed cleanly under audit: the chain from raw event to derived outcome exists somewhere in the system, but assembling it requires forensic work across multiple platforms, and the authorizing decision at each step is logged in some places and inferred in others. The second is how thoroughly the substrate has been considered: whether the event substrate was designed as a governance surface, or treated as a platform concern from which lineage and policy enforcement are somehow expected to follow. Many current answers are closer to the second. The gap does not surface in pilots. It surfaces when an auditor asks how a particular derived value came to be what it is, and the answer requires reconstruction rather than retrieval.
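A minimal sketch of lineage designed in rather than inherited, assuming every transformation is emitted through a single store: emission is refused unless the record names its authorizing policy and decision, and tracing a derived value becomes a lookup. `LineageStore`, the record fields, and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LineageRecord:
    event_id: str
    parents: tuple      # event_ids this record was derived from
    transform: str      # what was done
    policy_id: str      # policy that authorized it
    decision_id: str    # the specific authorizing decision


class LineageStore:
    def __init__(self) -> None:
        self.records: Dict[str, LineageRecord] = {}

    def emit(self, event_id: str, parents: List[str], transform: str,
             policy_id: str, decision_id: str) -> None:
        """Record a transformation; refuse it if authorization or parents are missing."""
        if not policy_id or not decision_id:
            raise ValueError(f"{event_id}: no authorizing policy or decision")
        missing = [p for p in parents if p not in self.records]
        if missing:
            raise ValueError(f"{event_id}: unknown parents {missing}")
        self.records[event_id] = LineageRecord(
            event_id, tuple(parents), transform, policy_id, decision_id)

    def trace(self, event_id: str) -> List[LineageRecord]:
        """Return every record that contributed to event_id, ancestors first."""
        seen, order = set(), []

        def visit(eid: str) -> None:
            if eid in seen:
                return
            seen.add(eid)
            for parent in self.records[eid].parents:
                visit(parent)
            order.append(self.records[eid])

        visit(event_id)
        return order
```

Answering the auditor's question is then a call to `trace` on the derived value, with the authorizing policy and decision already attached to each hop.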

What ties the two substrate surfaces together is the property the checkpoint model does not have. A gateway evaluates a call without consulting the inference. An event substrate, if designed as a governance surface, records a transformation and its authorizing policy without consulting the inference. The strongest governance primitives in an agentic system are the ones that do not depend on the inference path being trustworthy. Many current answers concentrate governance inside the inference path, where it is weakest, and leave the surfaces outside it under-engineered.

The Design Problem Itself

The diagnostic in the previous sections (walking the seven surfaces and naming the control at each one) is the necessary first step. It is not the whole exercise. Many enterprises that run the diagnostic discover what they expected to discover: gaps in specific surfaces, some addressable in the near term, others requiring deeper architectural work. That is useful, and it justifies the time spent.

The harder finding, and the one that tends to surface only when the diagnostic is run rigorously, is that the policies across the seven surfaces do not tell a consistent story. Intent Guard blocks a category of action that Tool Approval, configured separately, would have permitted. A Playbook enforces a sequence that the gateway's scope policy does not require. The event substrate records lineage for transformations the checkpoint layer has already permitted, but the authorizing decisions are recorded in a different system on a different schema. Each surface has a control. The controls were designed by different teams, at different times, against different threat models. The result is a collection of point controls that share a system but not a posture.
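Coherence can be checked mechanically once every surface states its posture in a shared vocabulary, which is itself an assumption most current systems would need to build toward. In this sketch (posture labels, surface names, and action names are hypothetical), `find_conflicts` reports actions on which independently configured surfaces disagree or stay silent, and `derive` shows the alternative: every surface's configuration generated from one posture table.

```python
# Hypothetical shared vocabulary for posture decisions.
POSTURES = {"never", "human_approval", "allowed"}
SURFACES = ("intent_guard", "playbook", "tool_approval", "gateway")


def find_conflicts(configs: dict) -> dict:
    """configs maps surface -> {action: posture}; report disagreements and silences."""
    actions = set()
    for rules in configs.values():
        actions.update(rules.keys())
    conflicts = {}
    for action in sorted(actions):
        views = {surface: rules.get(action, "unspecified") for surface, rules in configs.items()}
        if len(set(views.values())) > 1:
            conflicts[action] = views
    return conflicts


def derive(posture: dict, surfaces=SURFACES) -> dict:
    """Generate every surface's configuration from one posture table."""
    unknown = set(posture.values()) - POSTURES
    if unknown:
        raise ValueError(f"unknown posture values: {sorted(unknown)}")
    return {surface: dict(posture) for surface in surfaces}
```

Run against separately maintained configurations, `find_conflicts` surfaces exactly the Intent Guard versus Tool Approval mismatch described above; run against derived ones, it has nothing to report.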

This is the meta-observation worth ending on. The seven surfaces are not seven independent design problems. They are one design problem with seven enforcement points. The governance, and the policies that drive it, have to be designed coherently across the whole plane while the production version is being planned, before each surface is built out independently and the inconsistencies harden into the system. The IBM paper is a useful contribution because it makes five of the surfaces precise. It does not, and is not trying to, address the design problem itself. That design problem is what determines whether a regulated workload survives audit, and it is the work many current answers have not yet done.

The honest read of what governance by construction earns and where its limits lie — the token cost, the places where probabilistic mechanisms still sit inside the system, the boundaries of a checkpoint-only posture — is the next conversation in this series.

Where does governance sit in your agentic system today?

The Practitioner's Briefing