Compliance as Code in 2026: What Actually Reaches Production vs the Policy-as-Code Buzz

A compliance officer at a fintech in Amsterdam walked me through her SOC 2 audit response last quarter. Two engineers had spent three weeks pulling screenshots out of Datadog, Wiz, and the change-management tool — one screenshot per control, dated, captioned, dropped into a shared folder. She had a Rego library sitting in a GitHub repository called policy. The library was beautifully written. The auditor had asked, gently, whether she had any evidence of the controls actually firing. She did not. The policy ran in audit-only mode. The screenshots were the evidence. Compliance-as-code was the slide deck.

This is the gap between the pitch and the production. The pitch — encode the rules, run them in CI, let the engine produce the audit artefact as a side effect — has been sold by every cloud-security vendor since 2023. The Wiz Academy explainer from April 2026 is the representative version. Definitional, benefit-focused, almost silent on what the engineering actually looks like.

The engineering reality is messier. Most organisations claiming compliance-as-code run Open Policy Agent in advisory mode, surface a Rego library on GitHub, and still hand-produce the audit evidence at quarter-end. The library is real. The enforcement is not. The auditor still asks for the screenshot. The line between organisations that have shipped compliance-as-code and organisations that have bought the marketing is not the policy library. It is the evidence pipeline that hangs off the end of it.

Key takeaways

Compliance-as-code claims fail in four predictable places: policies run in audit-only mode, drift alerts fire into a queue nobody reads, the CI override is two clicks and engineers click it weekly, and the auditor still wants screenshots because the policy run output is not in a form they can query.
Enforcement has to land at five layers — pre-commit, CI gate, admission control, runtime, and evidence generation. Most programmes cover one or two.
Four tool stacks ship in production: OPA + Rego (most flexible, steepest curve), Kyverno (Kubernetes-native, lightest syntax), Cedar (AWS-aligned, authorisation-shaped), and cloud-native admission (AWS Config + SCPs, Azure Policy, GCP Org Policy: bundled with the platform or cheap to run, less expressive). Each has a clear best-fit layer.
The hard part is the evidence pipeline, not the policy library. Auditors accept compliance-as-code in place of screenshots when output is structured, tamper-evident, and queryable by audit period. The pattern that ships is S3 partitioned by date and control, catalogued in Glue, governed by Lake Formation, and queried through Athena.
Override discipline is the second hard part. Every override is a ticket with a sunset date and a named owner. Override volume is the leading indicator of policy quality — high counts mean the policy is wrong, not that engineers are reckless.

Figure: the five layers where compliance-as-code has to enforce. Pre-commit (Conftest checks against Terraform plan output); CI gate (Checkov, Trivy IaC, Conftest blocking PR merge); admission control (Kyverno, OPA Gatekeeper, or Validating Admission Policy on every Kubernetes apply); runtime (Falco for runtime policy, CSPM for continuous configuration evaluation); evidence generation (every policy run emits a structured record into S3, indexed by Lake Formation or the Glue catalogue, queryable through Athena for the audit period). A horizontal band underneath shows a single Rego source of truth in git distributing signed bundles to each enforcement point.

Caption: Compliance-as-code enforces at five layers, not one. The library on the left is the easy part. The evidence pipeline on the right is what turns audit response from three weeks into sixty minutes.

The Promise on the Slide

Pick your controls — SOC 2 CC6.1, ISO 27001 A.8.9, the relevant paragraphs of EU AI Act Article 9, the risk-management measures under NIS2. Express each one as policy code. Run the policy in the pipeline. Watch the engine emit machine-readable evidence as a side effect of the build. Replace the screenshot. Reclaim the quarter where two engineers stop shipping product and start writing evidence narratives.

That is the slide. Then there is the morning after the slide.

What most teams ship is narrower than the slide. The Rego library lives in a repository called policy. CI runs it in audit-only mode and posts a comment on the pull request. The library never blocks a deploy, because nobody on the team is yet confident enough in the policy quality to make it blocking. When the audit arrives, somebody assembles a slide deck describing what the policy would have caught if it had been enforced. That is not policy-as-code. That is a library. The gap between the two is where every honest compliance-as-code conversation actually lives.

Five Locks on Five Doors

Think of compliance-as-code as a building with five doors a determined visitor could walk through. A working programme has a lock on each door. A failing programme has one elaborate lock on the front and chains-and-tape on the rest, then tells the auditor every door is locked.

The first door is the developer's keyboard. Conftest runs against the developer's local Terraform plan, Helm values, or raw YAML before the change ever reaches a pull request. The point of this layer is not to block — hooks are easy to skip and the determined engineer will skip them — but to surface the violation in the editor, where it is cheap to fix, rather than in code review three days later, where it is expensive.

The second door is the CI gate. This is where blocking actually happens. Checkov against Terraform and Kubernetes manifests with a published rule library. Trivy IaC across the same surfaces with a different library. Conftest running custom Rego against any structured input the team needs to govern. The gate fires before merge, and the gate either blocks or it does not. An advisory-only CI gate is a comment generator, not compliance-as-code.
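A minimal sketch of the difference between an advisory gate and a blocking one, assuming the input is the JSON that terraform show -json produces for a saved plan. The rule (RDS storage must be encrypted), the function names, and the advisory flag are illustrative assumptions, not the behaviour of Checkov, Trivy, or Conftest.

```python
import sys


def find_violations(plan: dict) -> list[str]:
    """Return addresses of planned aws_db_instance resources without storage encryption."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_db_instance":
            continue
        change = rc.get("change", {})
        if change.get("actions") == ["delete"]:
            continue  # a pure deletion has no "after" state to evaluate
        after = change.get("after") or {}
        if not after.get("storage_encrypted", False):
            violations.append(rc.get("address", "<unknown>"))
    return violations


def gate(plan: dict, advisory: bool = False) -> int:
    """Return a process exit code: nonzero blocks the merge, zero lets it through."""
    violations = find_violations(plan)
    for address in violations:
        print(f"DENY {address}: storage_encrypted must be true", file=sys.stderr)
    # In advisory mode the violation is printed but the exit code stays 0,
    # which is exactly the "comment generator" behaviour described above.
    if violations and not advisory:
        return 1
    return 0
```

In a pipeline step, the return value of gate would be passed to sys.exit so that a violation fails the job. The only difference between the two modes is one boolean, which is why the advisory setting tends to outlive its intended trial period.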

The third door is admission control. Kyverno, OPA Gatekeeper, or Kubernetes' native Validating Admission Policy intercepts every apply before it lands in etcd. This is the layer that catches the deploy bypassing CI through a manual kubectl apply on a Friday evening. It is also the first layer that evaluates the object actually being submitted to the cluster rather than the state proposed in a pull request. Programmes that skip admission control discover, at the worst possible moment, that the CI gate never covered the operations team's emergency-fix workflow.
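To make the admission decision concrete, here is a sketch of the core of a validating webhook, assuming it is registered for Pod resources only and receives a standard admission.k8s.io/v1 AdmissionReview. The rule (no privileged containers) and the function name are illustrative; Kyverno and Gatekeeper express the same decision declaratively rather than in Python.

```python
def review(admission_review: dict) -> dict:
    """Build an AdmissionReview response that denies Pods with privileged containers."""
    request = admission_review["request"]
    pod_spec = (request.get("object") or {}).get("spec", {})
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])

    offenders = [
        c.get("name", "<unnamed>")
        for c in containers
        if (c.get("securityContext") or {}).get("privileged") is True
    ]

    # The response must echo the request uid so the API server can match it.
    response = {"uid": request["uid"], "allowed": not offenders}
    if offenders:
        response["status"] = {
            "code": 403,
            "message": "privileged containers are not allowed: " + ", ".join(offenders),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Because the decision is made on the object the API server received, it applies equally to a pipeline deploy and to a manual kubectl apply.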

The fourth door is runtime. The configuration that admission approved last week may have drifted since. A pod that started compliant may have been exec'd into and modified. Falco watches the syscall stream. CSPM tools — Wiz, Prisma Cloud, Defender for Cloud, the open-source alternatives — continuously evaluate the cloud resources themselves, not the manifests that produced them.
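Drift evaluation reduces to comparing what was approved with what is observed. The sketch below assumes both are available as nested dictionaries (for example, the configuration admitted last week and the configuration read back from the cloud API today). The function name and the MISSING marker are hypothetical; CSPM tools do this comparison against their own resource models.

```python
MISSING = "<missing>"  # marker for a key present on only one side


def detect_drift(approved: dict, observed: dict, path: str = "") -> dict:
    """Return {dotted.path: (approved_value, observed_value)} for every difference."""
    drift = {}
    for key in sorted(set(approved) | set(observed)):
        here = f"{path}.{key}" if path else key
        a = approved.get(key, MISSING)
        o = observed.get(key, MISSING)
        if isinstance(a, dict) and isinstance(o, dict):
            # Recurse so the report names the exact setting that changed.
            drift.update(detect_drift(a, o, here))
        elif a != o:
            drift[here] = (a, o)
    return drift
```

An empty result means the running resource still matches what was approved. A non-empty result is a finding that should land in the evidence store, not only in an alert queue.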

The fifth door is evidence generation, and it is the one that fails silently. Each of the other four emits a result. The result has to land somewhere structured, queryable, and tamper-evident. If the auditor cannot self-serve queries against the policy run history, the history is not evidence. It is data that needs a human to convert into evidence — which is the screenshot economy with extra steps.
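One way to picture a structured record is the sketch below. The field names are assumptions chosen for illustration, not a schema the article prescribes. The embedded SHA-256 digest only detects accidental or careless edits; it is not a substitute for write-once storage, because anyone who can rewrite the record can also recompute the digest.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    framework: str       # e.g. "soc2"
    control_id: str      # e.g. "cc6.1"
    layer: str           # "pre-commit", "ci", "admission", or "runtime"
    policy_id: str
    policy_version: str
    resource: str
    decision: str        # "allow", "deny", or "override"
    evaluated_at: str    # ISO 8601 timestamp in UTC


def _canonical(body: dict) -> str:
    # Sorted keys and fixed separators make the serialisation deterministic.
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


def to_json_line(record: EvidenceRecord) -> str:
    """Serialise a record as one JSON line with a digest of its own content."""
    body = asdict(record)
    body["sha256"] = hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest()
    return _canonical(body)


def verify(line: str) -> bool:
    """Recompute the digest and compare it with the one stored in the line."""
    body = json.loads(line)
    claimed = body.pop("sha256", None)
    return claimed == hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest()
```

One record per policy run, one JSON object per line, is a shape that query engines over object storage can generally read without a custom parser.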

The pattern that works puts the same lock on every door, sourced from one keyring. Pre-commit runs the same Rego as CI. CI's pass becomes a precondition for the admission controller's allow. The admission decision lands in the same evidence store as the runtime monitor's alerts. One source of policy, five enforcement surfaces, one evidence pipeline.

Choosing Your Locksmith

Vendor pitches make every policy tool sound interchangeable. They are not. Each one fits a different lock.

OPA with Rego is the universal locksmith — the most flexible engine and the steepest learning curve. It fits anywhere the input is structured JSON and the decision is allow, deny, or require-approval. Kubernetes admission through Gatekeeper. Microservice authorisation through an OPA sidecar. Agent action approval. Conftest-driven IaC checks. Envoy external authz. The downside is real: Rego policies that survive the first month and fail in the third are usually written by an engineer who treated Rego as Python.

Kyverno is the Kubernetes-native locksmith. The syntax is YAML, the same shape operators already think in. Policies look like the resources they govern. The learning curve is measured in hours, not weeks. Kyverno handles mutation and resource generation as well as validation in the same policy syntax; Gatekeeper supports mutation too, but through separate mutator resources rather than inside the Rego constraint. It fits Kubernetes-only enforcement surfaces. Outside the cluster it does not stretch, and teams that adopt Kyverno for admission usually adopt OPA separately for everything else.

Cedar, the policy language AWS open-sourced in 2023, is the authorisation specialist. It is designed around principals, actions, resources, and conditions — narrower than Rego but easier to reason about for the use case it targets. It fits application-layer authorisation in AWS-aligned stacks. It does not yet fit the broader compliance-as-code surface, because the IaC-scanning ecosystem remains Rego-shaped.
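The principal, action, resource, and condition shape can be illustrated with a small Python model. This is not Cedar syntax and not the Cedar SDK; the class and function names are hypothetical. What it does mirror is Cedar's evaluation rule: a request is denied unless some permit policy matches, and any matching forbid policy overrides every permit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    principal: str
    action: str
    resource: str
    context: dict


@dataclass(frozen=True)
class Policy:
    effect: str                          # "permit" or "forbid"
    matches: Callable[[Request], bool]   # scope and conditions in one predicate


def is_authorized(request: Request, policies: list[Policy]) -> bool:
    """Default deny; forbid overrides permit."""
    matched = [p.effect for p in policies if p.matches(request)]
    if "forbid" in matched:
        return False
    return "permit" in matched
```

The narrowness is the point: with only two effects and one combining rule, a reviewer can reason about the outcome of a policy set without tracing arbitrary logic.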

Cloud-native admission is the locksmith people forget to evaluate. AWS Config rules, Service Control Policies, GCP Organisation Policy, Azure Policy: every major cloud has a built-in engine that runs against its own control plane. Most carry no separate charge; AWS Config is the exception, billed by configuration items recorded and rule evaluations. These engines are less expressive than OPA or Kyverno, they are written in proprietary languages with their own quirks, and they do not extend to Kubernetes cleanly. For organisation-wide invariants (no public S3 buckets in this account, ever) they are sometimes the right answer, because the preventive ones, SCPs and organisation policies in particular, are enforced by the control plane itself and cannot be bypassed by anyone with console access in the governed account.

Pick the locksmith that fits the door. Cloud-native admission for organisation-level guardrails. OPA for cross-cutting policy. Kyverno for Kubernetes cluster policy. Cedar for application-layer authorisation in AWS-native stacks. Pretending one tool fits every door is the failure mode that produces the marketing-grade library.

Where the Audits Actually Fail

Four phrases recur in the conversations we have with organisations that believed they had shipped compliance-as-code.

"We have a Rego policy library" usually means the library was written for a conference talk and never made the transition into the CI gate as a blocking control. It runs in audit-only mode and the team has learned to scroll past the comments in code review. The library is real. The enforcement is not.

"Compliance evidence is automated" usually means the policy enforces the configuration but the auditor still asks for a screenshot, because the policy run output is in a log format the auditor cannot query. The screenshot economy persists because the team built a library and not a pipeline.

"Drift detection is in place" usually means a CSPM tool fires alerts into the same queue as fifty other cloud findings. On-call marks low-severity items acknowledged. Drift in non-production ages out at thirty days. Detection works. Response does not.

"Policy-as-code is in CI" sometimes means the override is two clicks — a PR label, an environment variable, a Slack-bot approval. Engineers click it weekly because the policy is wrong often enough that working around it is faster than waiting for a fix. High override volume is the leading indicator that the policy quality is the problem, not engineer behaviour. The closed-loop fix is to count overrides, treat them as bug reports against the policy, and require a sunset date on every accepted exception.

A clean library with any of these gaps still produces a failing audit. The maturity of compliance-as-code is measured by how the organisation closes those gaps, not by the policy language it picked.

The Version That Actually Works

The version that survives production has four properties.

A single source of policy truth lives in git, with tests, change-management metadata, and signed tags. Every enforcement point pulls from that repository. When the policy changes, it changes once, and the change reaches every layer through signed bundle distribution.
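The principle behind signed distribution is that an enforcement point refuses any policy it cannot verify. The sketch below is a simplified stand-in that uses a shared-secret HMAC from the Python standard library. OPA's own bundle signing works differently (a signed JWT carried in the bundle's .signatures.json file), so treat the function names and the key handling here as assumptions for illustration only.

```python
import hashlib
import hmac


def sign_bundle(bundle: bytes, key: bytes) -> str:
    """Produce a hex signature over the raw bundle bytes."""
    return hmac.new(key, bundle, hashlib.sha256).hexdigest()


def load_bundle(bundle: bytes, signature: str, key: bytes) -> bytes:
    """Return the bundle only if its signature verifies; otherwise refuse to load."""
    expected = sign_bundle(bundle, key)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bundle signature mismatch; refusing to load policy")
    return bundle
```

A failed verification should keep the previously loaded policy in force rather than fall back to no policy at all.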

Layered enforcement applies the same rule at every meaningful door. Production databases must have encryption at rest: that rule runs as a Terraform plan check at pre-commit, a Checkov rule in CI, an admission controller rule when the database custom resource is applied to the cluster, and a CSPM evaluation against the running RDS instance. One door is one bypass away from a violation. Four doors are meaningfully harder.

An evidence pipeline catches every policy run, at every layer, and lands a structured record in storage. The default we ship is S3, partitioned by date and control ID, with object-lock enabled, registered in the Glue catalogue for schema management, governed by Lake Formation, and exposed through Athena. The auditor receives an Athena workgroup, saved queries against the audit period, and the ability to self-serve evidence for every sampled control.

Override discipline is the fourth and most overlooked property. Overrides will happen. The mature pattern is not to forbid them — it is to make them visible, accountable, and time-bound. Every override becomes a ticket with a named owner, a justification, and a sunset date. Override count per policy becomes a metric. A policy with a high override count is a candidate for revision, not stricter enforcement. Without this discipline, layered enforcement rots inside three months.
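The override rules above are simple enough to sketch directly. The record fields follow the paragraph (ticket, named owner, justification, sunset date); the class name, function names, and the revision threshold are hypothetical and would be set by each team.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Override:
    ticket: str
    policy_id: str
    owner: str
    justification: str
    sunset: date


def expired(overrides: list[Override], today: date) -> list[Override]:
    """Overrides whose sunset date has passed and must be closed or renewed."""
    return [o for o in overrides if o.sunset < today]


def overrides_per_policy(overrides: list[Override]) -> Counter:
    """Override count per policy, the metric described above."""
    return Counter(o.policy_id for o in overrides)


def revision_candidates(overrides: list[Override], threshold: int) -> list[str]:
    """Policies overridden at least `threshold` times: revise these, do not tighten them."""
    counts = overrides_per_policy(overrides)
    return sorted(policy for policy, n in counts.items() if n >= threshold)
```

Running expired on a schedule turns the sunset date from a field on a ticket into something that actually ends the exception.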

Why the Regulator Is Leaning In

This matters in 2026 because the regulators are starting to ask for it by another name.

SOC 2 auditors are increasingly willing to accept compliance-as-code output as evidence in place of screenshots, provided the output is structured, queryable, and tamper-evident, which is exactly what the pipeline described above produces. The EU AI Act's Article 11 requires technical documentation for high-risk systems to be drawn up and kept up to date, an obligation that is far easier to meet with machine-readable evidence of conformity than with documents assembled by hand. NIS2's risk-management and incident-reporting obligations are easier to evidence with policy-as-code output than with manual records reconstructed from chat history.

The regulator is not asking for OPA by name. The regulator is asking for evidence that can be produced reliably, repeatedly, from a system that has not been edited since the audit period ended. The screenshot stack cannot produce that. A two-hundred-line Rego library that emits queryable, tamper-evident evidence across a full audit period is worth more than a two-thousand-line library that emits log lines into CloudWatch and stops there.

The Bridge Most Teams Have Not Built

Teams that have invested in the policy library typically have the right tools — OPA or Kyverno in the stack, Checkov in CI, Falco in the cluster — and the wrong outputs. The policy fires. The decision is logged. The log lands in CloudWatch, Datadog, or the SIEM, in a shape designed for security operations rather than audit response. When the audit arrives, somebody on the team builds a one-off query, exports to a spreadsheet, screenshots it, and emails it to the auditor. At audit time, the compliance-as-code programme has quietly become a screenshot factory with extra steps.

The pattern that closes the gap is mechanical. Every policy run, at every layer, archives into S3 under a key scheme that maps to the control framework — SOC 2 CC6.1 evidence under evidence/soc2/cc6.1/year=2026/month=06/day=22/, ISO 27001 A.8.9 under evidence/iso27001/a.8.9/.... The Glue catalogue indexes the schemas. Athena queries the partitions. Lake Formation governs access. The auditor receives an Athena workgroup and a query template per sampled control. Sixty minutes, not three weeks.
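The key scheme above can be generated mechanically. The sketch below reproduces the evidence/soc2/cc6.1/year=2026/month=06/day=22/ layout from the text; the function names, the .json suffix, and the run_id parameter are assumptions added for illustration.

```python
from datetime import date, timedelta


def evidence_prefix(framework: str, control_id: str, day: date) -> str:
    """Partition prefix for one control on one day."""
    return (
        f"evidence/{framework.lower()}/{control_id.lower()}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def evidence_key(framework: str, control_id: str, day: date, run_id: str) -> str:
    """Full object key for a single policy-run record."""
    return evidence_prefix(framework, control_id, day) + f"{run_id}.json"


def prefixes_for_period(framework: str, control_id: str, start: date, end: date) -> list[str]:
    """Every daily prefix an audit-period query for this control has to cover."""
    if end < start:
        raise ValueError("audit period ends before it starts")
    days = (end - start).days + 1
    return [evidence_prefix(framework, control_id, start + timedelta(days=i)) for i in range(days)]
```

Because the control identifier and the date are both in the path, a query for one sampled control over one audit period touches only the partitions it needs.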

This is the part that does not show up in vendor pitches because nobody is selling it as a product. It is integration work — the engagement where the compliance team and the data team sit in a room for two days and design the evidence schemas together. It is also the part that determines whether the audit becomes a query exercise or a reconstruction exercise.

Compliance-as-code is one of the rare practices where the engineering investment and the compliance investment are the same investment, the way the engine and the chassis of a car are the same purchase. Done correctly, the policy engineer is also producing the audit evidence; the CI gate engineer is also producing the change-control record; the CSPM analyst is also producing the continuous-monitoring evidence. Same people, same systems, same outputs, two audiences, no duplication.

Done incorrectly, the programme is an extra system bolted on top of the screenshot stack. The library exists. The screenshots still happen. The engineering investment was wasted, because it did not displace the manual work it was supposed to replace.

The policy library is table stakes. The evidence pipeline is the moat. Most claims of compliance-as-code stop at the table stakes, and that is the gap the next audit cycle will expose before the team realises the bar has moved.

FAQs

What is the honest test for "do we actually have compliance-as-code"?

Four questions. Does at least one policy block a deploy in CI, not just comment on it? Does an admission controller catch the deploy that bypasses CI? Is the policy run output in a structured store the auditor can query without a person in the middle? Are overrides tracked as tickets with sunset dates and named owners? Two or fewer yeses means the programme is a library, not policy-as-code in production.

OPA or Kyverno?

Kyverno if the enforcement surface is exclusively Kubernetes and the team thinks in YAML. OPA if there are policy surfaces outside Kubernetes — microservice authorisation, IaC scanning through Conftest, agent action approval, Envoy external authz. Most mature programmes run both, because they target different layers and the integration cost of running both is lower than the expressiveness cost of forcing one to do both jobs.

Where does Cedar fit in 2026?

Application-layer authorisation, particularly in AWS-aligned stacks where AWS Verified Permissions handles the heavy lifting. Cedar is purpose-built for principals, actions, resources, and conditions. It does not yet have the ecosystem reach to displace OPA at the IaC-scanning or Kubernetes-admission layers. Use Cedar where you would otherwise hand-roll an authorisation service. Do not use it as a general-purpose compliance-as-code engine yet.

Why is the evidence pipeline harder than the policy library?

The library is one team, one repository, a finite set of rules. The pipeline crosses team boundaries: the control framework (compliance), evidence schemas (audit), storage layer and access controls (data), integration at each enforcement layer (security engineering). Each policy run has to land in the right partition with the right schema, and the access control has to satisfy both engineering and audit reviewers. The library is built by one engineer in a quarter. The pipeline is a cross-functional programme.

Will auditors actually accept Athena queries as evidence?

Increasingly yes. The criterion auditors apply is whether evidence is structured, attributable to a control, dated, and tamper-evident. S3 with object-lock covers tamper-evidence; partitioning by date and control ID covers attribution and dating; Athena covers structured-and-queryable. The auditor still wants the methodology documented — what the policy says, how it runs, where output lands, how access to the evidence store is governed. With that document, queryable evidence has replaced the screenshot at every audit we have closed in 2026.

Companion Content

OPA for AI Agent Action Approval — the policy-as-code primitive applied to the agent surface
SOC 2 for AI Deployments: The Trust Service Criteria Reread for the Model Era — the compliance framing this article extends
SLSA Levels in Practice: What Organisations Actually Enforce — the supply-chain analogue: what reaches production versus what stays on the slide
The AI Act's August 2026 Enforcement Deadline — the regulatory tailwind for machine-readable conformity evidence
AI Bill of Materials: SBOM for AI Systems — the evidence shape for the AI surface specifically

How to engage

If your compliance-as-code programme is a policy library and the audit still produces three weeks of screenshots, the gap is the evidence pipeline, not the library. We design and ship the S3-and-Athena evidence stack that turns audit response into a query exercise, integrate it with the enforcement layers you already run, and leave behind the schemas, the queries, and the runbooks that make it sustainable. Talk to us at creativeminds.dev/contact.