OPA for AI Agent Action Approval: Policy-as-Code for Agent Guardrails

Key takeaways

Agent action approval is a policy problem, not a prompt problem. A rule written into the system prompt is a policy enforced by the same component being attacked — the security equivalent of asking the suspect to handle the evidence.
System prompts fail as a policy surface for four reasons: they are jailbreakable, they produce no structured decision log, they are not version-controlled in any way that survives a SOC 2 audit, and they have no central change-management surface.
Open Policy Agent is the policy decision point; the agent runtime is the enforcement point. Before any tool executes, the runtime sends OPA a structured input (principal, agent, tool, args, context) and OPA returns allow/deny plus reasons in sub-millisecond Rego evaluation.
The Rego patterns that ship by default: role-to-tool allowlist, principal-of-record match, scope intersection (never union), cost gates with separation-of-duties on the approver, and resource-class gates that distinguish staging from production.
OPA does not defend against prompt injection or harmful generation — those are model-level controls. OPA defends against the action that follows. The model can think whatever it wants; the action does not happen unless policy approves it.

Figure: The policy evaluation flow. Every tool call passes the PDP, the runtime enforces, the audit log remembers. The agent proposes a tool call; the runtime PEP builds a structured input; OPA evaluates it against a versioned Rego bundle; the decision fans out to allow, require approval, or deny; the runtime enforces; every hop emits an audit event joined on trace_id; and signed bundles distribute from git through CI to an OCI registry to sidecars in each agent pod. Below: the structured input contract, the rules that ship by default, and how signed bundles reach the sidecars.

The wrong question to ask the model

The first time an enterprise architecture review picks apart an agent system, the question that lands hardest is rarely about the model. It is about the action. Who decided this agent could trigger a refund? Where is that decision written down? If we change the rule next week, what is the change-management surface? Show me the audit trail of every action this agent has attempted, including the ones we blocked.

The honest answer in most agent deployments we see is that the rule lives in the system prompt. A paragraph somewhere between "you are a helpful assistant" and the tool schemas that says, in English, "do not refund more than $500 without human approval." That is not policy. That is a suggestion delivered to a probabilistic system that has been jailbroken in production at least twice.

Agent action approval is a policy problem, not a prompt problem. And the security industry has had the right tool for policy problems for nearly a decade. Open Policy Agent was built to decouple policy from application code, became a CNCF graduated project on the strength of Kubernetes admission control and microservice authorisation, and turns out to fit the AI agent action-approval problem almost perfectly. Every tool call is a policy decision. Every action is a candidate for allow or deny. The decision logic needs to live somewhere auditable, versioned, and reviewable by people who do not speak Python. That is a Rego file in a git repository, evaluated by an OPA sidecar.

Why prompts are the wrong place for policy

There are four reasons the system prompt fails as a policy surface, and each is a failure an auditor will notice.

First, the model can be talked out of its prompt. Prompt injection is no longer a research curiosity. It sits at the top of the OWASP Top 10 for LLM Applications because the attack surface is the entire input stream, not a parser you control. A policy written into the system prompt is a policy enforced by the same component being attacked. That is the security equivalent of asking the suspect to handle the evidence.

Second, there is no audit trail of why an action was approved. When the model decides "yes, refund this customer," there is no structured record of which rule fired, which inputs were considered, or which condition was satisfied. There is a generation, and a tool call, and a log line that says the tool call happened. None of those are decision logs.

Third, prompts are not version-controlled in any sense that survives a SOC 2 audit. They get edited in product reviews, A/B tested by growth teams, and rolled back when conversion drops. "What rule did this agent operate under on March 14th" is not a question the prompt repository can answer.

Fourth, there is no central change-management surface. If five agents share a policy ("never write to the production database without human approval"), the policy lives in five system prompts and drifts in three of them within a quarter. Policy that does not have a single owner is policy that is silently inconsistent.

OPA addresses each of these directly. Policy lives in Rego files. Rego files live in git. Git has version control, code review, and signed tags. OPA emits structured decision logs that say which rule fired and with what input. Policy is centralised and distributed as versioned bundles. The model never gets a vote on whether the action is allowed.

What OPA actually does

OPA is a general-purpose policy engine. It accepts a structured JSON input, evaluates it against a Rego policy, and returns a structured decision. That is the whole product. Everything else is integration.

The architectural separation matters. OPA is the policy decision point. Your agent runtime is the policy enforcement point. The agent runtime sends OPA a request that describes what is about to happen. OPA decides. The agent runtime enforces. The decision and the enforcement are logged separately, by different components, into the same audit stream.

Rego is the policy language. It is declarative, it composes, and it is designed for fast evaluation. Typical policy decisions evaluate in well under a millisecond. The language is not Python and not JavaScript, and people who try to write it as either of those have a bad time for the first week and a productive decade afterwards.

For distribution, OPA pulls policy bundles. A bundle is a tarball of Rego files plus optional data, served from an HTTP endpoint, an S3 bucket, or an OCI registry. OCI bundles matter here: they mean the policy that governs your AI agents can be a signed, versioned artefact in the same registry as your container images, with the same supply-chain controls.

The integration point with an agent is simple. Before the agent executes any tool, the runtime calls OPA with a structured input describing the principal, the agent, the tool, the arguments, and the context. OPA returns an allow/deny decision and a reason. The runtime enforces the decision.
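A minimal sketch of that call from a Python runtime, assuming OPA's REST Data API is reachable at a local sidecar address and the policy package is `agent`. The address, the `query_opa` name, the timeout, and the injectable `post` parameter are illustrative assumptions, not details from the design described here.

```python
import json
import urllib.request
from typing import Any, Callable

# Assumed sidecar address. OPA's Data API serves a package at /v1/data/<path>.
OPA_URL = "http://127.0.0.1:8181/v1/data/agent"


def http_post(url: str, body: bytes, timeout: float) -> bytes:
    """POST a JSON body and return the raw response bytes."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def query_opa(
    policy_input: dict[str, Any],
    post: Callable[[str, bytes, float], bytes] = http_post,
    timeout: float = 0.25,
) -> dict[str, Any]:
    """Send the structured input to OPA and return the `result` document.

    The caller wraps the input as {"input": ...}; OPA replies {"result": ...}.
    A missing "result" key means the queried path was undefined, which this
    sketch maps to an empty dict so callers can treat it as "not allowed".
    """
    body = json.dumps({"input": policy_input}).encode("utf-8")
    raw = post(OPA_URL, body, timeout)
    return json.loads(raw).get("result", {})
```

Passing the transport in as a parameter keeps the function testable without a running OPA. Exceptions from the transport are deliberately not caught here, so the caller decides what an unreachable policy engine means.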

The Rego patterns for agent controls

Concrete examples follow. The structure assumes a single decision at data.agent.allow and a parallel data.agent.deny set that carries the deny rationale, which is the pattern that audits most cleanly.

Allowlist of tools per agent role

The first control is the most boring and the most important. An agent has a role. A role has a list of tools it is permitted to invoke. Anything outside that list is denied.

package agent

# Enables the `if`, `in`, and `contains` keywords on OPA releases before 1.0.
import rego.v1

default allow := false

# Each role maps to the set of tools it is allowed to invoke.
role_tools := {
  "customer_support": {"lookup_order", "issue_refund", "send_email"},
  "data_analyst":     {"run_query", "render_chart"},
  "sre_triage":       {"read_logs", "describe_incident", "page_oncall"},
}

# An unknown role makes the lookup undefined, so the default (false) applies.
allow if {
  input.tool in role_tools[input.agent.role]
}

This is the floor. An agent acting outside its declared role surface fails at the first gate.

Principal-of-record check

The action's effective principal must match the request's authenticated user. This is the rule that stops an agent from quietly escalating to a service account when a human's session is what kicked off the request.

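package agent

import rego.v1

# True only when the action would run as the user who authenticated the request.
principal_matches if {
  input.action.principal == input.request.authenticated_user
}

# A missing principal leaves principal_matches undefined, which also denies.
deny contains "principal mismatch" if {
  not principal_matches
}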

This is the rule that catches the most subtle agent identity bugs and the most subtle abuse paths.

Scope intersection, not union

When an agent acts on behalf of a user, the effective scope is the intersection of the agent's permitted scopes and the user's scopes. Not the union. The instinct to take the union is the same instinct that produces confused-deputy bugs.

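package agent

import rego.v1

# Hypothetical tool-to-scope mapping; the original leaves tool_required_scope undefined.
tool_required_scope := {"lookup_order": "read:order", "issue_refund": "write:refund", "send_email": "send:email"}

# Input scopes arrive as JSON arrays; convert to sets before intersecting with `&`.
effective_scopes := {s | some s in input.agent.scopes} & {s | some s in input.request.user.scopes}

allow if {
  required_scope := tool_required_scope[input.tool]
  required_scope in effective_scopes
}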

If the agent has db:write and the user does not, the action does not get db:write. If the user has db:write and the agent does not, the action still does not get db:write. Intersection.
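The difference is easy to see with plain sets. The sketch below is a worked illustration in Python, not part of the policy; the scope values and function names are made up for the example.

```python
def effective_scopes(agent: set[str], user: set[str]) -> set[str]:
    """Scopes usable on the user's behalf: only those held by BOTH parties."""
    return agent & user


def union_scopes(agent: set[str], user: set[str]) -> set[str]:
    """The anti-pattern: either party's scope is enough (confused deputy)."""
    return agent | user


# Illustrative values: a broadly scoped agent acting for a read-only user.
agent_scopes = {"read:order", "write:refund", "send:email"}
user_scopes = {"read:order"}

safe = effective_scopes(agent_scopes, user_scopes)
unsafe = union_scopes(agent_scopes, user_scopes)

# Under intersection the read-only user cannot trigger a refund via the agent.
print("write:refund" in safe)    # False
# Under union the user silently inherits the agent's refund scope.
print("write:refund" in unsafe)  # True
```

With the union, a read-only user gains write:refund simply by routing the request through the agent. That is the confused-deputy bug the intersection prevents.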

Cost gates

Some actions cost money. Some cost a lot of money. Policy can allow cheap actions outright, require human approval for a middle band, and refuse anything above a hard ceiling.

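package agent

import rego.v1

default allow := false

# Up to $5: allowed without approval.
allow if {
  input.action.estimated_cost_usd <= 5
}

# Above $5 and up to $500: allowed only with approval from someone other than the principal.
allow if {
  input.action.estimated_cost_usd > 5
  input.action.estimated_cost_usd <= 500
  input.context.human_approval.granted == true
  input.context.human_approval.approver != input.action.principal
}

# Above $500: never allowed, with an explicit reason for the decision log.
deny contains "cost exceeds policy ceiling" if {
  input.action.estimated_cost_usd > 500
}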

Note the separation-of-duties check: the approver cannot be the principal. This is the rule that stops "approve your own refund" patterns when the agent is acting as a user with approval authority.

Resource-class gates

The same tool can be safe against staging and catastrophic against production. The policy distinguishes.

package agent

import rego.v1

production_resources := {"prod-db-primary", "prod-payments-api", "prod-billing-queue"}

requires_human_approval if {
  input.action.resource in production_resources
}

# Reads against production pass; writes need an approval on record.
deny contains "production write without human approval" if {
  requires_human_approval
  input.action.write == true
  not input.context.human_approval.granted
}

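These are not the only patterns, but they are the ones we ship by default on every new agent. The composition matters more than any single rule. Each policy is a function of structured input; together they form a defence-in-depth surface where an action must pass every relevant rule to be approved. One Rego detail matters when the snippets are combined: multiple bodies of the same rule are alternatives (a logical OR), and a package may declare only one default for a given rule. In a combined policy, give each check its own rule name and let a single final allow require all of them, together with an empty deny set.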

The structured input contract

The reason OPA scales as a policy surface is that the agent runtime sends it a structured input and nothing else. The contract is small and stable. The Rego is what evolves.

A representative input:

{
  "trace_id": "01H8...",
  "request": {
    "authenticated_user": "u_482911",
    "user": { "scopes": ["read:order", "write:refund"] }
  },
  "agent": {
    "id": "agent.support.v3",
    "role": "customer_support",
    "scopes": ["read:order", "write:refund", "send:email"]
  },
  "tool": "issue_refund",
  "args": { "order_id": "o_99831", "amount_usd": 142.50 },
  "action": {
    "principal": "u_482911",
    "estimated_cost_usd": 142.50,
    "resource": "prod-payments-api",
    "write": true
  },
  "context": {
    "principal_action_count_last_hour": 4,
    "human_approval": { "granted": false }
  }
}

The agent runtime is responsible for constructing this object faithfully. The policy is responsible for the decision. Neither component carries logic that belongs to the other, which is the entire point.
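One way the runtime side of that contract could look, sketched in Python. The `AgentIdentity` and `ProposedAction` types and the `build_policy_input` function are hypothetical names introduced for illustration; only the shape of the returned object comes from the representative input above.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AgentIdentity:
    id: str
    role: str
    scopes: list[str]


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    args: dict[str, Any]
    principal: str             # identity the action would run as
    estimated_cost_usd: float
    resource: str
    write: bool


def build_policy_input(
    trace_id: str,
    authenticated_user: str,
    user_scopes: list[str],
    agent: AgentIdentity,
    action: ProposedAction,
    context: dict[str, Any],
) -> dict[str, Any]:
    """Assemble the policy input. No decision logic lives here."""
    return {
        "trace_id": trace_id,
        "request": {
            "authenticated_user": authenticated_user,
            "user": {"scopes": list(user_scopes)},
        },
        "agent": {"id": agent.id, "role": agent.role, "scopes": list(agent.scopes)},
        "tool": action.tool,
        "args": dict(action.args),
        "action": {
            "principal": action.principal,
            "estimated_cost_usd": action.estimated_cost_usd,
            "resource": action.resource,
            "write": action.write,
        },
        "context": dict(context),
    }
```

The builder only copies facts the runtime already knows. In particular, `authenticated_user` should come from the session, never from anything the model generated, or the principal-of-record check has nothing trustworthy to compare against.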

The architecture pattern that survives audit

The reference architecture is short and worth committing to muscle memory.

The agent generates a tool call. The agent runtime intercepts the call and constructs the policy input from the principal, the tool, the arguments, the agent's identity, and the runtime context. The runtime sends the input to OPA. OPA runs in one of two shapes: a sidecar in the same pod as the agent runtime, or a clustered service behind a load balancer with regional failover. OPA evaluates the policy, returns a decision and a list of reasons, and writes a decision log entry.

The runtime enforces. Allowed actions proceed to the tool. Denied actions return a structured refusal to the agent loop and never touch the real system. The denial reason is logged to the same audit store that holds the agent's other actions. After the tool executes, the outcome is logged with the same trace ID so the decision and its consequence can be reconstructed end-to-end.
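The enforcement step might be sketched as follows. This assumes the decision document carries `allow` and `deny` keys as in the Rego patterns, and treats an action as approved only when `allow` is true and `deny` is empty. The `enforce` function, the event field names, and the `run_tool` and `audit` callables are hypothetical.

```python
from typing import Any, Callable


def enforce(
    policy_input: dict[str, Any],
    decision: dict[str, Any],
    run_tool: Callable[[str, dict[str, Any]], Any],
    audit: Callable[[dict[str, Any]], None],
) -> dict[str, Any]:
    """Apply a policy decision: run the tool or return a structured refusal."""
    trace_id = policy_input["trace_id"]
    tool = policy_input["tool"]
    reasons = sorted(decision.get("deny", []))
    # Assumption: approval needs an explicit allow AND no deny reasons.
    allowed = decision.get("allow") is True and not reasons

    audit({"event": "policy_decision", "trace_id": trace_id, "tool": tool,
           "allowed": allowed, "reasons": reasons})

    if not allowed:
        # The refusal goes back to the agent loop; the tool is never invoked.
        return {"status": "refused", "trace_id": trace_id,
                "reasons": reasons or ["not allowed by policy"]}

    outcome = run_tool(tool, policy_input["args"])
    # Same trace ID on the outcome, so decision and consequence can be joined.
    audit({"event": "tool_outcome", "trace_id": trace_id, "tool": tool,
           "outcome": outcome})
    return {"status": "executed", "trace_id": trace_id, "outcome": outcome}
```

Writing the decision event before the tool runs means a crash mid-execution still leaves a record that the action was approved.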

The decision logs are the load-bearing artefact. OPA emits structured JSON describing which rule fired, what input was considered, and what verdict was reached. Those logs land in the same audit pipeline as everything else the agent does. When someone asks "why was this action allowed on March 14th," the answer is a query against the log store with a trace ID.

For high availability, sidecar OPA gives you local low-latency evaluation with no network hop, and each sidecar keeps its policy current by polling the OCI registry for new bundles. The agent runtime has a circuit breaker: if OPA is unreachable, the runtime defaults to deny. There is no graceful degradation here. A policy decision point that fails open is not a policy decision point.
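A minimal sketch of a fail-closed circuit breaker around the policy query. The class name, failure threshold, cooldown, and deny reason are assumptions chosen for illustration; the one property taken from the text is that every failure path ends in deny.

```python
import time
from typing import Any, Callable, Optional


def _deny_unavailable() -> dict[str, Any]:
    return {"allow": False, "deny": ["policy engine unavailable"]}


class FailClosedBreaker:
    """Wraps a policy query; any failure or open breaker yields a deny."""

    def __init__(
        self,
        query: Callable[[dict[str, Any]], dict[str, Any]],
        max_failures: int = 3,
        cooldown_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query = query
        self._max_failures = max_failures
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def decide(self, policy_input: dict[str, Any]) -> dict[str, Any]:
        now = self._clock()
        if self._opened_at is not None:
            if now - self._opened_at < self._cooldown_s:
                return _deny_unavailable()  # open: deny without calling OPA
            self._opened_at = None          # cooldown over: allow one probe
        try:
            result = self._query(policy_input)
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = now       # (re)open the breaker
            return _deny_unavailable()
        self._failures = 0
        # Anything short of an explicit allow=True is normalised to a deny.
        if not isinstance(result, dict) or result.get("allow") is not True:
            reasons = result.get("deny", []) if isinstance(result, dict) else []
            return {"allow": False, "deny": list(reasons)}
        return result
```

While the breaker is open the runtime stops sending requests to an unhealthy sidecar, but the agent still receives a deny for every call. The breaker only changes how quickly the deny arrives, never whether it arrives.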

What OPA does not solve

OPA is not a defence against prompt injection. It is not a defence against the model generating harmful content. It is not a defence against the model lying to itself or to the user. Those are model-level concerns and they need model-level controls: input filters, output classifiers, system-prompt hardening, and the kind of red-teaming that catches the failures classifiers miss.

OPA is a defence against the action that follows. If a jailbroken model decides to call delete_database, OPA looks at the structured input, sees that the agent's role does not permit delete_database, denies the call, and logs the denial with the trace ID. The model can think whatever it wants. The action does not happen.

This division of labour is the right one. Model-level controls reduce the probability of bad intent. Action-level controls reduce the probability that bad intent reaches a real system. Both layers fail in different ways and at different rates. Defence-in-depth is the architecture.

Production considerations

Performance is a non-issue at the scales agent systems actually run. Rego evaluation for the policies above is sub-millisecond on standard hardware. The bottleneck in your agent loop is the model, not the policy engine. If OPA latency shows up on the same scale as inference latency, something is misconfigured.

High availability follows two patterns. The sidecar pattern co-locates OPA with each agent runtime instance. Each sidecar pulls policy bundles independently from the registry and caches them locally. There is no network call to a remote OPA service in the hot path. Failover is per-pod. The clustered pattern runs OPA as a fleet behind a load balancer, with bundle distribution to each node. Use the sidecar pattern unless you have a specific reason not to.

Audit trail integration is where most teams underinvest. OPA's decision logs need to land in the same log store as the agent's tool-call audit trail, with the same trace ID schema. If they land in different stores, an investigator cannot reconstruct the sequence of events without doing manual joins, which means in practice no one reconstructs it.
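To make the join concrete, here is a sketch of reconstructing a per-trace timeline once both streams sit in one store. It assumes each OPA decision log entry carries the original `input` (and therefore `trace_id`), a `result`, and an RFC 3339 UTC `timestamp`. The tool-event shape and the `timelines_by_trace` name are hypothetical.

```python
from collections import defaultdict
from typing import Any


def timelines_by_trace(
    decision_logs: list[dict[str, Any]],
    tool_events: list[dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Group decisions and tool outcomes by trace ID, ordered by timestamp."""
    timelines: dict[str, list[dict[str, Any]]] = defaultdict(list)

    for entry in decision_logs:
        trace_id = entry.get("input", {}).get("trace_id")
        if trace_id is None:
            continue  # unjoinable entry; a real pipeline should flag these
        timelines[trace_id].append(
            {"ts": entry["timestamp"], "kind": "decision", "detail": entry.get("result")}
        )

    for event in tool_events:
        timelines[event["trace_id"]].append(
            {"ts": event["timestamp"], "kind": "tool_outcome", "detail": event.get("outcome")}
        )

    # Assumption: timestamps share one UTC format, so string order is time order.
    return {tid: sorted(evts, key=lambda e: e["ts"]) for tid, evts in timelines.items()}
```

If the two streams use different trace ID schemas or timestamp formats, this function cannot be written without guesswork, and that is the underinvestment the paragraph above describes.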

Policy testing is a discipline of its own. Rego has a test framework. Every rule we ship has a deny test, an allow test, and a regression test against the input shape that broke us last time. Policy changes go through code review like any other code. Bundles are signed before they enter the OCI registry. Production OPA instances verify signatures on bundle pull.

What this teaches us about enterprise scaling

The pattern that emerges from running OPA in front of agents is the pattern that emerges from running policy-as-code anywhere. Decisions become artefacts. Artefacts become reviewable. Reviewability becomes the precondition for letting agents touch anything that matters.

Enterprise AI scaling is not a model problem. The frontier models we have now are already powerful enough to do real damage in production. The constraint is the surface around the model: the identity story, the action-approval story, the audit story, the change-management story. OPA is one piece of that surface. It is the piece that turns "do not refund more than $500 without approval" from a sentence in a system prompt into a Rego rule in a git repository with a signed bundle in an OCI registry and a decision log in the audit store.

The agent does not get to decide what is allowed. The policy does. That is the only architecture that scales past the pilot.

FAQs

Why isn't the system prompt good enough as a policy surface?

Four reasons, each a failure an auditor will notice. The model can be talked out of its prompt; prompt injection is no longer a research curiosity. There is no structured record of which rule fired or which input was considered. Prompts are not version-controlled in any sense that survives a SOC 2 audit. And policy that lives in five system prompts drifts in three of them within a quarter.

Does OPA add latency to every tool call?

Effectively no. Rego evaluation for the patterns above is sub-millisecond on standard hardware, and the sidecar deployment co-locates OPA with the agent runtime so there is no network hop in the hot path. The bottleneck in your agent loop is the model, not the policy engine. If OPA latency shows up on the same scale as inference latency, something is misconfigured.

Sidecar or clustered OPA?

Sidecar unless you have a specific reason not to. Each sidecar pulls signed policy bundles independently from an OCI registry, caches them locally, and there is no network call to a remote service in the hot path. Failover is per-pod. The runtime has a circuit breaker — if OPA is unreachable, the runtime defaults to deny. A policy decision point that fails open is not a policy decision point.

Does OPA stop prompt injection?

No. Prompt injection is a model-level concern that needs model-level controls: input filters, output classifiers, system-prompt hardening, and red-teaming. OPA defends against the action that follows. If a jailbroken model decides to call delete_database, OPA sees the agent's role does not permit it, denies the call, and logs the denial with the trace ID. This is defence-in-depth: model-level controls reduce the probability of bad intent; action-level controls reduce the probability that bad intent reaches a real system.

How do we test Rego policies?

Rego has a test framework. Every rule we ship has a deny test, an allow test, and a regression test against the input shape that broke us last time. Policy changes go through code review like any other code. Bundles are signed before they enter the OCI registry, and production OPA instances verify signatures on bundle pull. Decision logs land in the same audit store as the agent's tool-call audit trail with the same trace ID schema, so an investigator can reconstruct the sequence without manual joins.

How to engage

If you are putting agents into production and the action-approval story is still living in a system prompt, that is the conversation to have. Talk to us at creativeminds.dev/contact.