Case Study: Compliance Automator — Building Audit-Grade AI for Regulated Markets in Public

An operator-grade case study from the CreativeMinds Development (cmdev) AI engineering practice. The companion open-source repository is live: github.com/Samueladewole/compliance-automator.

Key takeaways

A Tier 1 bank's compliance team spends roughly twelve days hand-assembling a regulator's evidence pack: CloudTrail extracts, control mapping, formatted PDF. The compliance-automator is built to cut that to about twelve minutes with a structured, citation-rich, audit-grade output.
Architecture composes prior cmdev patterns: air-gapped Bedrock deployment inside the customer VPC, Strands + AgentCore runtime, multi-model routing (Haiku for routing, Sonnet for synthesis, Cohere Embed v3 and Rerank v3 for retrieval), Guardrails wrapping every invocation, and event.interrupt() for human-in-the-loop sign-off on the evidence pack.
Open-source and deployed inside the customer's AWS account. There is no cmdev-hosted SaaS. CloudTrail Lake, Security Lake, IAM events, and the policy Knowledge Base stay within the customer's cryptographic boundary.
Four-week build roadmap with tagged weekly releases — Week 1 repo bootstrap and ADRs, Week 2 Strands agent + KB against real Bedrock, Week 3 Terraform and CDK deployable to a fresh account, Week 4 300-item golden set and audit-grade quality report.
The repo structure matters more than the architecture diagram. A CISO evaluating an open-source compliance tool spends their first ten minutes in the repository — quickstart, honest status table, runnable scaffold — and those minutes decide whether the conversation continues.

A letter on a Tuesday

It arrives by registered post, addressed to the company secretary. The subject line is polite — "Request for cybersecurity governance evidence under NCPS 2021 CNII obligations, Q1 2026." Inside, a numbered list. Privileged-access changes in production over the prior quarter. The approval trail for each. The IAM policy diffs that produced them. The audit log entries that confirm execution. The mapping back to regulatory controls. Response deadline: fourteen days.

The compliance team starts on Wednesday morning. By Friday they have a CloudTrail Lake extract for the right time window. By the middle of week two, they have a draft mapping of events to NDPA Section 39 and CBN CSAT control IDs. By the Friday afternoon of week two, an analyst is hand-formatting the pack into a PDF the regulator will accept. The team works the weekend. The pack ships on day twelve.

It will happen again next quarter. And the quarter after that. Every regulated enterprise in Africa and the EU now lives in a steady cadence of regulator queries — NDPA processing-record requests one month, CBN CSAT examination cycles the next, NMDPRA posture reviews, NIS2 Article 21 evidence requests, GDPR data-subject inquiries, sector-regulator one-off probes. The work is repetitive, time-pressured, expert-intensive, and a poor use of senior compliance time. It is closer to grinding flour by hand each morning than to building anything new.

The compliance-automator is our open-source answer. A regulator's evidence query goes in. A structured, citation-rich, audit-grade evidence pack comes out — in twelve minutes, not twelve days. This piece is the case study: what we are building, what the architecture looks like, what the repository contains today, what ships in four weeks, and which engineering decisions a CISO has to see before trusting the output.

What lands on the screen

The product is a side-by-side comparison portal. On the left pane sits the regulator's query — or a draft policy, contract, or operational document. On the right, the agent's response: non-compliant clauses highlighted, deep-linked to the exact page and section of the official government regulation, with the evidence pack auto-generated underneath.

Three queries we have heard from real compliance officers, each one the kind of request that eats a week.

"Show me all privileged-access changes in production for the past 90 days, with the approval trail." The agent walks CloudTrail Lake for IAM policy changes against production-scoped resources, joins them with the approval workflow records, maps each change to NDPA Section 39 and the equivalent CBN CSAT control, and returns a PDF evidence pack with citations the auditor can follow back to source.

"Review this draft third-party vendor contract for NIS2 Article 21 supply-chain compliance." The agent retrieves the relevant supply-chain clauses from NIS2, highlights mismatches in the draft, and deep-links each finding to the regulatory text.

"Produce the quarterly CSAT board-level evidence pack for Q1 2026." The agent runs the standing CSAT query set against the bank's evidence sources, formats the output to the regulator's preferred template, and signs the artefact with KMS so the chain of custody holds.
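To make the first query concrete, here is a minimal sketch of how the CloudTrail Lake step might be expressed. It is an illustration under stated assumptions, not the repository's implementation: the function name, the list of IAM event names, and the selected columns are our own choices, and the filter for production-scoped resources is left out because it depends on each customer's tagging scheme.

```python
import re
from datetime import datetime, timedelta, timezone

# IAM API calls that change who can do what. An illustrative subset, not exhaustive.
IAM_CHANGE_EVENTS = (
    "AttachRolePolicy",
    "AttachUserPolicy",
    "CreatePolicyVersion",
    "PutRolePolicy",
    "PutUserPolicy",
    "UpdateAssumeRolePolicy",
)

# CloudTrail Lake queries name the event data store ID in the FROM clause.
# The ID is interpolated into SQL, so reject anything that is not UUID-shaped.
_UUID_SHAPE = re.compile(r"[0-9a-fA-F-]{36}")


def build_privileged_change_query(
    event_data_store_id: str, now: datetime, days: int = 90
) -> str:
    """Build a CloudTrail Lake SQL statement for IAM changes in the last `days` days."""
    if not _UUID_SHAPE.fullmatch(event_data_store_id):
        raise ValueError("event data store ID does not look like a UUID")
    start = now - timedelta(days=days)
    fmt = "%Y-%m-%d %H:%M:%S"
    names = ", ".join(f"'{name}'" for name in IAM_CHANGE_EVENTS)
    return (
        "SELECT eventTime, eventName, userIdentity.arn, recipientAccountId "
        f"FROM {event_data_store_id} "
        "WHERE eventSource = 'iam.amazonaws.com' "
        f"AND eventName IN ({names}) "
        f"AND eventTime > '{start.strftime(fmt)}' "
        f"AND eventTime <= '{now.strftime(fmt)}' "
        "ORDER BY eventTime"
    )


if __name__ == "__main__":
    # Hypothetical event data store ID, for illustration only.
    print(
        build_privileged_change_query(
            "11111111-2222-3333-4444-555555555555",
            datetime(2026, 4, 1, tzinfo=timezone.utc),
        )
    )
```

In a deployment, the resulting statement would be passed to the CloudTrail `start_query` API (for example via boto3's `QueryStatement` parameter), and the rows joined with approval records in a later step.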

The architecture

Figure 1. Regulator query in, audit-grade evidence pack out, ~12 minutes, entirely inside the customer's VPC.

Diagram description: a regulator-style query enters a Strands agent on AgentCore Runtime inside the customer's VPC; Claude Haiku routes, Claude Sonnet synthesises; action tools query CloudTrail Lake, Security Lake, and a Bedrock Knowledge Base over the regulatory corpus; Guardrails wrap every call with PII filters (NIN, BVN), denied topics, and contextual grounding; event.interrupt() gates the final evidence pack for human sign-off; output is a signed PDF and citation index.

Every component is documented in docs/architecture.md. The architecture is not a new invention. It composes patterns from the prior cmdev articles, assembled the way a city is built from streets, plumbing, and zoning rather than from any single building.

The substrate is the air-gapped Bedrock pattern. The agent never sends customer data outside the customer's VPC. PrivateLink endpoints reach bedrock-runtime, bedrock-agent-runtime, KMS, S3, and Secrets Manager. Customer-managed keys encrypt every persistent artefact. CloudTrail data events forward to a separate Security OU account so a compromised workload cannot destroy its own audit trail.
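As a rough illustration of the endpoint list above, the sketch below derives the regional VPC endpoint service names and reports which ones a VPC is still missing. The helper names are hypothetical, and we assume interface (PrivateLink) endpoints for every service, including S3; the region used in the example is arbitrary.

```python
# Services the agent reaches over PrivateLink, as listed in the text.
PRIVATE_SERVICES = (
    "bedrock-runtime",
    "bedrock-agent-runtime",
    "kms",
    "s3",
    "secretsmanager",
)


def endpoint_service_names(region: str) -> list[str]:
    """Return the regional VPC endpoint service name for each required service."""
    return [f"com.amazonaws.{region}.{svc}" for svc in PRIVATE_SERVICES]


def missing_endpoints(region: str, existing: set[str]) -> list[str]:
    """List required endpoint services that are not yet present in the VPC."""
    return [name for name in endpoint_service_names(region) if name not in existing]


if __name__ == "__main__":
    # Hypothetical inventory of endpoints already created in the VPC.
    have = {"com.amazonaws.eu-west-1.kms", "com.amazonaws.eu-west-1.s3"}
    for name in missing_endpoints("eu-west-1", have):
        print("missing:", name)
```

A check like this could run as a pre-deployment gate, so a missing endpoint is caught before any Bedrock call would otherwise fall back to a public route.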

Sitting alongside is the eval-driven engineering pattern — a 300-item golden set spanning the three regulatory regimes, LLM-as-judge calibration against human SME labels, drift detection running in production. The agent harness is Strands on AgentCore, with hooks for audit, steering handlers for safety, and an event.interrupt() gate on the evidence-pack generation step so a human signs off before anything leaves the building.
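One common way to calibrate an LLM judge against human SME labels is to measure chance-corrected agreement. The sketch below computes Cohen's kappa over paired labels. It is a stand-in we chose for illustration: the source does not say which agreement statistic the eval harness uses, and the label values are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(human: Sequence[str], judge: Sequence[str]) -> float:
    """Chance-corrected agreement between human labels and LLM-judge labels."""
    if len(human) != len(judge) or not human:
        raise ValueError("label sequences must be non-empty and the same length")
    n = len(human)
    # Fraction of items where the judge matches the human label.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    h_counts, j_counts = Counter(human), Counter(judge)
    labels = set(h_counts) | set(j_counts)
    # Agreement expected if both raters labelled at random with their own base rates.
    expected = sum(h_counts[label] * j_counts[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human = ["pass", "pass", "fail", "fail"]
    judge = ["pass", "pass", "fail", "pass"]
    print(cohens_kappa(human, judge))
```

A team might set a minimum kappa on a held-out slice of the golden set before trusting the judge on unlabelled production traffic, and track the same number over time as one drift signal.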

Multi-model routing keeps the per-query cost predictable. Claude Haiku picks which evidence sources are relevant. Claude Sonnet synthesises the answer. Cohere Embed v3 and Rerank v3 handle retrieval. The Bedrock cost-optimisation pattern — cascade routing, prompt caching, top-K discipline — runs underneath. And security and observability are the Part 6 stack: Guardrails on every invocation with PII filters, a custom denied-topic for production-mutating actions without approval, and model invocation logs that double as the regulatory artefact.
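The routing split described above can be sketched without any model calls. In this illustration the small model and the large model are plain callables, so the control flow is visible on its own. Everything here is hypothetical scaffolding: the source names, the fallback to all sources when the router returns nothing, and the default top-K of 5 are our assumptions, not the repository's behaviour.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical identifiers for the evidence sources named in the article.
EVIDENCE_SOURCES = ("cloudtrail_lake", "security_lake", "regulatory_kb")

Router = Callable[[str], list[str]]
Synthesiser = Callable[[str, list[str], int], str]


@dataclass
class Answer:
    sources: list[str]
    text: str


def answer_query(
    query: str, route: Router, synthesise: Synthesiser, top_k: int = 5
) -> Answer:
    """Cheap model picks sources; the expensive model only sees those sources."""
    chosen = [s for s in route(query) if s in EVIDENCE_SOURCES]
    if not chosen:
        # Assumed fallback: if the router is unsure, consult every source.
        chosen = list(EVIDENCE_SOURCES)
    return Answer(sources=chosen, text=synthesise(query, chosen, top_k))


# Stand-ins for the routing and synthesis models, for local runs only.
def keyword_route(query: str) -> list[str]:
    return ["cloudtrail_lake"] if "privileged" in query else []


def fake_synthesise(query: str, sources: list[str], top_k: int) -> str:
    return f"{len(sources)} source(s), top_k={top_k}"


if __name__ == "__main__":
    print(answer_query("privileged-access changes", keyword_route, fake_synthesise))
```

Keeping the router and synthesiser behind plain callables also makes the cost path testable: the same function runs against stubs in CI and against real model clients in the customer account.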

What's already in the repository

The repository is live at github.com/Samueladewole/compliance-automator. You can clone it and run the scaffold tonight.

```shell
git clone https://github.com/Samueladewole/compliance-automator
cd compliance-automator
make install
make run     # returns a structurally valid scaffold evidence pack
```

Shipped already: a README that opens with quickstart and an honest status table, the MIT licence, a Python project skeleton wired with ruff and mypy and pytest, a runnable CLI in agent/cli.py that returns a valid-shape scaffold evidence pack so the end-to-end path is exercisable from day one, docs/architecture.md with the system overview and ADR index, docs/local-aws-setup.md covering Bedrock model access and region selection and cost expectations, and parallel Terraform and CDK roadmaps in their own README files. The folder structure stands ready for population — agent/tools/, agent/hooks/, agent/prompts/, eval/, data/regulations/, data/synthetic/.

What is building behind that scaffold: a working Strands agent in agent/pipeline.py wired to the five action tools, Terraform modules and CDK constructs that deploy the full air-gapped architecture, a 300-item evaluation golden set with LLM-as-judge and a signed monthly PDF for the regulator, a Next.js side-by-side comparison portal, and a public regulatory corpus drawn from NDPA 2023, the CBN CSAT extracts that are publicly available, EU NIS2 Article 21, and a NIST SP 800-53 subset. Synthetic CloudTrail and Security Lake data lets the end-to-end pipeline run without touching production.

Target ship date: end of June 2026. Watch the repo or creativeminds.dev/blog for milestone announcements.

Why we are building it in public

Three reasons, and a CISO recognises each one as the right shape of trust signal.

The first is that the architecture is the trust signal. A CISO no more buys a compliance system on the strength of a vendor's deck than a homeowner buys a safe by listening to the salesman knock on it. Both want to examine the locking mechanism. An open-source repository is the locking mechanism, fully visible and immediately auditable. Buying decisions move faster when the code is open because the buyer can answer their own questions.

The second is that the buyer's data never leaves the buyer's tenancy. The compliance-automator deploys inside the customer's AWS account using the air-gapped Bedrock pattern. There is no cmdev-hosted SaaS sitting in the data path. The customer's CloudTrail Lake, Security Lake, IAM events, and Knowledge Base of policies all stay within their own cryptographic boundary. Open source is what makes this credible — the customer can read every line that touches their data.

The third is that building in public compounds. Every commit, every architecture decision record, every eval-result publication is a signal that cmdev is doing real engineering, not theatre. The commit history is a continuous credibility surface — by the time a buying conversation reaches a deal review, the buyer has already evaluated us on the work itself. For a consulting practice, no marketing campaign matches that.

Four weeks, four tagged releases

The work is sequenced to ship a working end-to-end agent by end of June 2026:

| Week | Milestone | Repo signal |
| --- | --- | --- |
| Week 1 (this week) | Repo bootstrap, architecture documented, sample regulatory corpus ingested | Scaffold + first ADRs land |
| Week 2 | Strands agent + Knowledge Base wired against real Bedrock + Cohere; CloudTrail-query and retrieve-regulation tools shipped | `make run` returns a real evidence pack against synthetic data |
| Week 3 | Terraform modules and CDK constructs deployable to a fresh AWS account; air-gapped pattern validated | `terraform apply` produces a working deployment in a clean test account |
| Week 4 | 300-item golden set + LLM-as-judge eval harness + drift detection; PDF evidence-pack template | `make eval` produces the audit-grade quality report |

Each weekly milestone is a tagged release. Each ships with a short blog post in this series — what we built, what surprised us, what we engineered past.

What the bootstrap week already taught us

Three things surfaced in the first week that are worth flagging for the buying audience.

The repository structure matters more than the architecture diagram. A CISO evaluating an open-source compliance tool spends their first ten minutes inside the repo, not on the website. Those ten minutes have to convey three things — a quickstart that works, an honest status table showing what is shipped versus what is not, and a runnable scaffold so the end-to-end path is exercisable from day one. We rewrote the README three times in week one to land the structure that does not waste those minutes.

Shipping parallel Terraform and CDK costs more than either alone, but the cost is small and the trust signal is large. Most teams have a strong preference, and shipping both means meeting them where they live. Maintaining two infrastructure expressions of the same system is real work — we mitigate it by treating Terraform as the canonical source and CDK as the synthesised equivalent, with terraform plan snapshots committed against the CDK synth output as a regression check. It is the kind of decision that looks expensive on a slide and turns out to be cheap once you are inside the build.
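One possible shape for that regression check is a resource-count comparison between the two outputs. The sketch below assumes the Terraform side comes from `terraform show -json` on a saved plan (which lists planned resources under `resource_changes`) and the CDK side is the synthesised CloudFormation template (which lists them under `Resources`). The type mapping is a small hypothetical subset, and real stacks do not always map one-to-one, so treat this as a starting point rather than the project's actual check.

```python
from collections import Counter

# Hypothetical, partial mapping from Terraform resource types to CloudFormation types.
TF_TO_CFN = {
    "aws_kms_key": "AWS::KMS::Key",
    "aws_s3_bucket": "AWS::S3::Bucket",
    "aws_vpc_endpoint": "AWS::EC2::VPCEndpoint",
}


def parity_diff(tf_plan: dict, cfn_template: dict) -> dict[str, int]:
    """Return {cfn_type: terraform_count - cdk_count} for every mapped type that differs."""
    tf_counts = Counter(
        TF_TO_CFN[change["type"]]
        for change in tf_plan.get("resource_changes", [])
        if change["type"] in TF_TO_CFN
    )
    tracked = set(TF_TO_CFN.values())
    cfn_counts = Counter(
        resource["Type"]
        for resource in cfn_template.get("Resources", {}).values()
        if resource["Type"] in tracked  # ignores CDK metadata and unmapped types
    )
    return {
        cfn_type: tf_counts[cfn_type] - cfn_counts[cfn_type]
        for cfn_type in sorted(set(tf_counts) | set(cfn_counts))
        if tf_counts[cfn_type] != cfn_counts[cfn_type]
    }


if __name__ == "__main__":
    plan = {"resource_changes": [{"type": "aws_vpc_endpoint"}, {"type": "aws_vpc_endpoint"}]}
    template = {"Resources": {"Ep1": {"Type": "AWS::EC2::VPCEndpoint"}}}
    print(parity_diff(plan, template))
```

An empty result means the two expressions agree on the tracked resource types; a non-empty one names the type that drifted and by how much.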

The "case study" framing is itself slightly wrong. The compliance-automator is not a one-off engagement we documented after the fact. It is a reference implementation we and our customers can fork. This article is a snapshot of an evolving product, not a retrospective. The cmdev publishing voice is shifting to match — the reference architecture series and this case study compose as the front door of a working open-source practice, not as portfolio decoration.

If your next regulator letter arrived tomorrow, would your team ship the evidence pack on day twelve, or on minute twelve?

FAQs

Why open-source instead of a hosted SaaS?

The architecture is the trust signal. A CISO does not buy a compliance system based on a marketing deck — they read the architecture, ask whether the security properties hold under their threat model, and watch the deployment behave under load. An open-source repo makes the architecture fully visible and immediately auditable, and the buyer's data never leaves the buyer's tenancy.

How does the air-gapped Bedrock pattern apply here?

The agent deploys inside the customer's AWS account. PrivateLink endpoints reach bedrock-runtime, bedrock-agent-runtime, KMS, S3, and Secrets Manager. Customer-managed keys (CMKs) encrypt every persistent artefact. CloudTrail data events forward to a separate Security OU account. The customer's CloudTrail Lake, Security Lake, IAM events, and policy Knowledge Base stay within their cryptographic boundary, and there is no cmdev-hosted SaaS in the data path.

Which regulatory regimes does the compliance-automator cover?

NDPA 2023, CBN CSAT extracts where publicly available, EU NIS2 Article 21, and a NIST SP 800-53 subset in the public corpus. The architecture is regime-agnostic — the regulatory corpus is just another Knowledge Base. Customers ingesting their own internal control catalogue or sector-specific regulation (NMDPRA posture reviews, HIPAA-equivalent regional regimes) get the same pipeline against their content.

What stops the agent from generating wrong evidence with confidence?

Three layers. Guardrails with PII filters (NIN, BVN regex), denied-topic policies, and contextual grounding wrap every invocation. event.interrupt() gates the evidence-pack generation step for human-in-the-loop sign-off on the final artefact. And the eval harness runs against a 300-item golden set with LLM-as-judge calibration against human SME labels, with drift detection in production.
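To give a feel for the PII layer, here is a local sketch of the kind of pattern a regex filter for NIN and BVN values could use. It assumes both identifiers are written as 11 consecutive digits, which should be confirmed against the issuing authorities' formats before use. The placeholder text and function name are ours, and the sketch runs in plain Python rather than through Bedrock Guardrails.

```python
import re

# Assumption: NIN and BVN values appear as exactly 11 consecutive digits.
# The lookarounds stop the pattern from matching inside longer digit runs.
ELEVEN_DIGIT_ID = re.compile(r"(?<!\d)\d{11}(?!\d)")


def redact_ids(text: str) -> str:
    """Replace anything shaped like a NIN or BVN with a fixed placeholder."""
    return ELEVEN_DIGIT_ID.sub("[REDACTED-ID]", text)


if __name__ == "__main__":
    print(redact_ids("Customer BVN 22345678901 approved the change."))
```

A pattern this broad will also catch other 11-digit values, such as some phone numbers, so in practice it trades false positives for safety. That trade-off is one reason the human sign-off gate still sits after it.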

Why ship both Terraform modules and CDK constructs?

Most teams have a strong preference. Shipping both means meeting them where they are. The cost is small — maintaining two infrastructure expressions of the same system — and we mitigate it by treating Terraform as the canonical source and CDK as the synthesised equivalent, with terraform plan snapshots committed against the CDK synth output as a regression check. The trust signal of meeting the buyer in their toolchain is larger than the cost.

How to engage

Three concrete moves you can make if this is the shape of work your team needs:

Star the repo. github.com/Samueladewole/compliance-automator — the star count is the public signal of demand and helps the project compound. Watching gives you the milestone-release notifications without inbox noise.

Read the air-gapped Bedrock article and the eval harness article. They are the architectural substrate. If those resonate with your CISO and compliance leadership, the compliance-automator is the natural fit.

Email [email protected] for a deployment consultation. We engage with regulated enterprises in Africa and the EU on a four-phase model (diagnostic → foundation build → co-managed operations → optional MSSP). The compliance-automator is the substrate; the engagement is what makes it production-ready at your scale. Direct, no sales fluff.

Companion content

Live repo: github.com/Samueladewole/compliance-automator
Architecture substrate: Air-Gapped LLM Deployments on AWS Bedrock, Custom Evaluation Frameworks for Enterprise LLMs
Reference series: Amazon Bedrock for Production AI (8 parts)
Banking reference: AWS Architecture for Nigerian Banks (3 parts) and Hardening before AWS (3 parts)

Mayowa A. is CTO of CreativeMinds Development. CreativeMinds Development (cmdev) ships production AI for regulated enterprises across Africa and the EU.