The Blueprint for Air-Gapped LLM Deployments on AWS Bedrock

An operator-grade pattern from the CreativeMinds Development (cmdev) AI engineering practice. Companion to the Amazon Bedrock for Production AI series and the Hardening-before-AWS series.

The question that comes before any other

The meeting starts. The CISO of a Nigerian Tier 1 bank, or the head of digital at a European energy operator, or the IT leadership of a healthcare network in West Africa sits across the table. The first technical question is always the same. It is not "which model should we use?" It is not "how do agents work?" It is the question a homeowner asks before they buy a safe: can we put it inside the house, with no wire running out to the street?

"Can we deploy this without our data leaving our controlled boundary?"

The question matters because most AI deployment options on the market in 2026 fail it by construction. OpenAI's API and ChatGPT Enterprise process prompts on OpenAI's infrastructure. The Anthropic Claude API processes them on Anthropic's. The "AI features" baked into SaaS products typically route the prompts through the vendor's own tenancy. For a bank under CBN CSAT, for an EU energy operator under NIS2, for a Nigerian fintech under NDPA, for any defence-adjacent operator, "we will send the prompts to a third party for processing" is not a deployable architecture. The compliance officer says no before the CISO finishes evaluating it. The board never hears the pitch.

Amazon Bedrock answers this question, if it is configured correctly. The "if" carries the weight. A misconfigured Bedrock deployment looks superficially private but routes traffic through the public internet, holds data temporarily outside the customer's KMS boundary, or carries an IAM policy with a gap an examiner will drive a truck through. The pattern this article documents is the configuration we ship for regulated enterprise customers: the one that survives an examiner's first technical pass, a CISO's red-team review, and the operational reality of production traffic running at thousands of queries per second.

The diagram is the lead artefact. The regulatory mapping closes the loop. The friction points at the end are the ones we hit and engineered past in real deployments.

Key takeaways

"Can we deploy this without our data leaving our controlled boundary?" is the first technical question every regulated CISO asks. Most 2026 AI deployment options fail it because they route prompts to a third party's infrastructure.
The reference architecture is five layers: PrivateLink endpoints with no internet gateway, KMS customer-managed keys on every persistent artefact, federated identity with no static credentials, CloudTrail data events plus model invocation logging cross-forwarded to a Security OU account, and Nitro Enclaves for the highest-trust workloads.
The condition on `aws:SourceVpce` in IAM policies is the line that closes the loop — even a leaked credential cannot be used from outside the customer's VPC endpoint, because authority is bound to the network topology.
Five real frictions bite at deployment time: VPC endpoint quotas, KMS key policy size limits (use per-purpose CMKs), cold-start latency on the Bedrock path, cross-account log-forwarding delete-permission edge cases, and constant pressure to compromise the air-gap for "just one dependency" — fixed with an internal artefact mirror.
The audit trail is the deployable artefact. The architecture is plumbing; what converts a deployment into a regulatory asset is the CloudTrail data events, invocation logs, and X-Ray traces sealed against tampering and forwarded out of the workload account.

The reference architecture

Figure 1: PrivateLink + KMS CMK + federated identity + cross-account audit, the configuration regulated enterprises actually deploy.

Diagram description: air-gapped Bedrock deployment. The customer VPC has private subnets only and no internet gateway on workload subnets. Bedrock is reached through PrivateLink Interface VPC Endpoints to bedrock-runtime and bedrock-agent-runtime. KMS CMKs encrypt model invocation logs in S3 with Object Lock, Knowledge Base sources, the vector store, and secrets. IAM Identity Center federates human identity from the customer's IdP, and workloads use IRSA or IAM Roles with no long-lived keys. CloudTrail data events forward to a separate Security OU account the workload cannot delete from. Invocation logs capture prompt, response, and Guardrails trace under KMS CMK and Object Lock for the regulatory retention period.

Every line in that architecture is deliberate. Five layers, each with its own configuration discipline.

Layer 1 — A network with no doors to the outside

The commitment that makes air-gapped mean anything is that workload subnets have no internet gateway and no NAT path out. The Lambda functions, EKS pods, ECS tasks, and EC2 instances calling Bedrock cannot reach the public internet even if they wanted to. Traffic to Bedrock, KMS, S3, Secrets Manager, CloudWatch, and every other AWS service the workload touches runs through VPC endpoints: interface endpoints over PrivateLink, plus a gateway endpoint for S3. PrivateLink is the corridor, and the corridor only goes one place. It is a building where the windows do not open.

The VPC endpoints we deploy at minimum:

Service	Endpoint	Why
`bedrock-runtime`	`com.amazonaws.<region>.bedrock-runtime`	Model invocation (`InvokeModel`, `Converse`)
`bedrock-agent-runtime`	`com.amazonaws.<region>.bedrock-agent-runtime`	Agent invocation, knowledge-base retrieval
`bedrock`	`com.amazonaws.<region>.bedrock`	Model and Guardrails management
`bedrock-agent`	`com.amazonaws.<region>.bedrock-agent`	Agent and Knowledge Base management
`kms`	`com.amazonaws.<region>.kms`	Encryption key operations
`s3`	`com.amazonaws.<region>.s3` (gateway endpoint)	Model invocation log storage, KB source bucket
`secretsmanager`	`com.amazonaws.<region>.secretsmanager`	Credential retrieval
`logs`	`com.amazonaws.<region>.logs`	CloudWatch Logs ingestion
`monitoring`	`com.amazonaws.<region>.monitoring`	CloudWatch metrics
`sts`	`com.amazonaws.<region>.sts`	Workload identity assumption

Each endpoint sits in private subnets only, with a security group that allows ingress on port 443 from the workload subnets and nothing else. The endpoint policy goes further: on bedrock-runtime it permits only the bedrock:InvokeModel action (the same IAM action that authorises calls to the Converse API), and only against the specific model ARNs the workload is authorised to use.
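An endpoint policy of that shape can be generated rather than hand-written, which keeps the model list reviewable. The following is a minimal sketch, not the configuration cmdev ships: the function name and the example model ARN are hypothetical, and it assumes non-streaming calls only (streaming would also need `bedrock:InvokeModelWithResponseStream`).

```python
import json


def bedrock_runtime_endpoint_policy(model_arns):
    """Build a VPC endpoint policy that allows invocation of named models only."""
    if not model_arns or any("*" in arn for arn in model_arns):
        raise ValueError("list explicit model ARNs; a wildcard defeats the policy")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The endpoint policy filters traffic; the caller's IAM policy
                # still decides which principals may invoke at all.
                "Principal": "*",
                "Action": ["bedrock:InvokeModel"],
                "Resource": sorted(model_arns),
            }
        ],
    }


# Hypothetical ARN, for illustration only.
EXAMPLE_ARNS = ["arn:aws:bedrock:eu-central-1::foundation-model/example.model-a"]
POLICY_JSON = json.dumps(bedrock_runtime_endpoint_policy(EXAMPLE_ARNS), indent=2)
```

The resulting JSON string is what would be attached as the policy document on the bedrock-runtime endpoint.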

This configuration closes three classes of attack at once. Workloads have no route to public DNS servers; the VPC's own resolver will still answer public lookups, so pairing it with Route 53 Resolver DNS Firewall rules is what takes exfiltration-via-DNS off the table. Outbound TCP/443 to anywhere except the allowed VPC endpoints is dropped at the security group, so a compromised dependency cannot phone home. And there is no network path to a third party, so the "accidental prompt logged to a third party" leak has nowhere to land.

Egress filtering through AWS Network Firewall is the belt-and-braces option for environments where the threat model assumes the workload itself may be compromised. For most regulated deployments, the security-group plus endpoint-policy combination is the production baseline.
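The "no doors to the outside" claim is checkable. As a rough sketch, assuming route tables in the shape returned by the EC2 `describe_route_tables` call, a check like the one below flags any route that targets an internet gateway, NAT gateway, or egress-only internet gateway. The function name and the sample IDs are hypothetical.

```python
INTERNET_TARGET_PREFIXES = ("igw-", "nat-", "eigw-")
TARGET_KEYS = ("GatewayId", "NatGatewayId", "EgressOnlyInternetGatewayId")


def internet_routes(route_tables):
    """Return (route_table_id, target_id) pairs that give a subnet a way out."""
    findings = []
    for table in route_tables:
        for route in table.get("Routes", []):
            for key in TARGET_KEYS:
                target = route.get(key) or ""
                if target.startswith(INTERNET_TARGET_PREFIXES):
                    findings.append((table["RouteTableId"], target))
    return findings


# Hypothetical sample: one clean private table, one that leaks through NAT.
SAMPLE_TABLES = [
    {"RouteTableId": "rtb-private",
     "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {"DestinationPrefixListId": "pl-example", "GatewayId": "vpce-0abc"}]},
    {"RouteTableId": "rtb-leaky",
     "Routes": [{"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123"}]},
]
FINDINGS = internet_routes(SAMPLE_TABLES)
```

An empty result for every route table associated with a workload subnet is the evidence an examiner can be shown; a gateway endpoint target (`vpce-`) is deliberately not treated as a finding.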

Layer 2 — Cryptography as the second perimeter

PrivateLink keeps the data inside AWS's network. KMS keeps it inside the customer's cryptographic boundary. Every artefact Bedrock writes or reads is encrypted with a customer-managed key held in the customer's account, with a policy that names exactly who may decrypt it. The CMK is the deed to the house — the network is the locked door, and the deed makes the door meaningful.

Six artefact classes need CMK. The model invocation log bucket where Bedrock writes the full prompt, response, and Guardrails trace per call — CMK at rest, Object Lock in Compliance mode for the regulatory retention period. The Knowledge Base source documents in S3 — CMK with a bucket policy that restricts access to specific roles. The Knowledge Base vector store, whether OpenSearch Serverless or pgvector on Aurora — CMK at rest. Any custom model artefacts imported through Custom Model Import — CMK at rest. The Secrets Manager entries holding workload credentials — CMK at rest. And the CloudWatch log groups that may contain prompt or response fragments — CMK at rest.

The key policy is the most important configuration in the entire deployment. The principals authorised to decrypt are explicitly named — a small set of workload roles, the audit and incident-response roles, the break-glass admin. Wildcards do not appear in production CMK policies. Condition keys restrict use to the specific VPC endpoints and source accounts the workload runs in.
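To make the "explicitly named principals, no wildcards" rule concrete, here is a hedged sketch of a helper that builds one key-policy statement and refuses wildcard principals. The helper name, the Sid, and the role ARN are hypothetical. It also assumes the named roles call KMS directly through the VPC endpoint; AWS services that use the key on the workload's behalf do not arrive via that endpoint and would need a different condition, such as `kms:ViaService`.

```python
def decrypt_statement(sid, role_arns, vpce_id):
    """Build a key-policy statement for explicitly named roles behind one endpoint."""
    for arn in role_arns:
        if "*" in arn or not arn.startswith("arn:aws:iam::"):
            raise ValueError(f"principal must be an explicit IAM ARN: {arn!r}")
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": sorted(role_arns)},
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        # In a key policy, "*" as the resource refers to the key itself.
        "Resource": "*",
        "Condition": {"StringEquals": {"aws:SourceVpce": vpce_id}},
    }


# Hypothetical role ARN; the endpoint ID matches the article's placeholder.
STATEMENT = decrypt_statement(
    "WorkloadDecrypt",
    ["arn:aws:iam::123456789012:role/bedrock-invoker"],
    "vpce-0abc123def456789a",
)
```

Rejecting wildcards at generation time means a reviewer only has to read the list of ARNs passed in, not the rendered policy.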

Key rotation is automatic on an annual cycle for most regulated workloads; some regimes require shorter cycles, and the CMK supports them natively. Cross-region replication of the key is an option when multi-region failover is required and the regulator permits the secondary region.

The data residency question — does our data ever leave Nigeria, the EU, the country in the regulator's directive — gets answered by two things together. Region selection puts the entire deployment inside the regulatory boundary. The CMK boundary makes the cryptographic argument: the key never leaves, so even if data did, it would be ciphertext nobody can read. For Nigerian banks, the deployment in eu-central-1 (Frankfurt) or af-south-1 (Cape Town) with a Nigeria-pinned KMS key satisfies the NDPA cross-border processing test when the cryptographic boundary argument is documented properly. For EU operators, the same pattern in eu-central-1 or eu-west-1 with an EU-pinned KMS key satisfies NIS2 and GDPR data-residency expectations.

Layer 3 — Identity that nobody can pocket

The IAM architecture has one rule: no long-lived static credentials anywhere in the Bedrock call path. No BEDROCK_API_KEY environment variable. No AWS access keys in Lambda configuration. No service-account JSON files committed to repositories. The unit of access is a short-lived federated session, scoped to a specific workload — the identity-first migration pattern applied to every regulated engagement.

Human identities federate from the customer's identity provider (Microsoft Entra, Okta, ADFS) into IAM Identity Center. Sessions are 15 minutes in production, 1 hour outside it. MFA is enforced at the IdP layer, with hardware security keys for privileged users and platform authenticators for everyone else. Workload identities on EKS use IRSA, with each microservice carrying its own role scoped to its single function: the Bedrock-invoking service gets bedrock:InvokeModel on specific model ARNs, the knowledge-base-querying service gets bedrock:Retrieve on a specific KB ID, and neither holds anything else. (Bedrock IAM actions all use the bedrock: prefix, including those served from the bedrock-agent-runtime endpoint.) Lambda workloads use the execution role pattern with the same scoping. CI/CD authenticates via OIDC federation, which GitHub Actions and GitLab CI both support natively; no long-lived AWS keys live in CI secret stores.
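For the CI/CD case, the trust relationship is what replaces the stored access key. The sketch below builds a role trust policy for GitHub Actions OIDC federation; the function name and repository are hypothetical, and pinning to a single branch is one illustrative choice of subject claim rather than the only reasonable one.

```python
GITHUB_OIDC_PROVIDER = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, repo, branch="main"):
    """Trust policy letting one repository branch assume a role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_PROVIDER}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{GITHUB_OIDC_PROVIDER}:aud": "sts.amazonaws.com",
                        # Restrict to one repository and branch.
                        f"{GITHUB_OIDC_PROVIDER}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# Hypothetical repository name, for illustration only.
TRUST_POLICY = github_oidc_trust_policy("123456789012", "example-org/bedrock-infra")
```

The pipeline receives a short-lived session from STS on each run, so there is nothing in the CI secret store to leak.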

The IAM policy pattern for a Bedrock-invoking workload role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-6-20251022",
        "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-haiku-4-5-20251001"
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "vpce-0abc123def456789a"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeAgent"
      ],
      "Resource": [
        "arn:aws:bedrock:eu-central-1:123456789012:agent-alias/AGENT_ID/prod"
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "vpce-0abc123def456789a"
        }
      }
    }
  ]
}

The condition on aws:SourceVpce is the line that closes the loop. Even a leaked credential cannot be used from outside the customer's VPC endpoint — the workload role's authority is bound to the network topology, not to the credential itself. It is the equivalent of a key that only works in one lock, in one house, on one street.
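The effect of that condition can be illustrated with a deliberately simplified evaluator. This is a toy model for intuition only, not how IAM evaluates policies: it handles exact-match actions, resources, and `StringEquals` conditions, and ignores wildcards, explicit denies, and every other policy type. All names in it are hypothetical.

```python
def allowed(statement, request):
    """Toy check: does one Allow statement cover this request?"""
    if statement["Effect"] != "Allow":
        return False
    if request["action"] not in statement["Action"]:
        return False
    if request["resource"] not in statement["Resource"]:
        return False
    expected = statement.get("Condition", {}).get("StringEquals", {})
    # Every condition key must be present in the request context and match.
    return all(request["context"].get(k) == v for k, v in expected.items())


STATEMENT = {
    "Effect": "Allow",
    "Action": ["bedrock:InvokeModel"],
    "Resource": ["arn:aws:bedrock:eu-central-1::foundation-model/example.model-a"],
    "Condition": {"StringEquals": {"aws:SourceVpce": "vpce-0abc123def456789a"}},
}
REQUEST = {
    "action": "bedrock:InvokeModel",
    "resource": "arn:aws:bedrock:eu-central-1::foundation-model/example.model-a",
    "context": {"aws:SourceVpce": "vpce-0abc123def456789a"},
}
# Same credential, replayed from a laptop on the public internet:
# the request context carries no VPC endpoint ID at all.
LEAKED = dict(REQUEST, context={})

INSIDE_OK = allowed(STATEMENT, REQUEST)
OUTSIDE_OK = allowed(STATEMENT, LEAKED)
```

In this model the in-VPC request is allowed and the replayed one is not, which is the property the paragraph above describes.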

Layer 4 — The audit trail nobody can erase

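The audit-evidence layer is what makes the air-gapped deployment defensible to an examiner. Standard CloudTrail captures the management plane: who created the agent, who updated the guardrail. Data-plane activity is a separate event category that is off by default, so for regulated Bedrock workloads data events must be enabled explicitly (AWS's CloudTrail documentation for Bedrock lists which runtime operations fall into which category, and it is worth checking per operation):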

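resource "aws_cloudtrail" "bedrock_data_events" {
  name           = "bedrock-data-events-prod"
  s3_bucket_name = aws_s3_bucket.audit_bucket.id
  kms_key_id     = aws_kms_key.audit_cmk.arn

  # One selector per resource type keeps each resources.type to a single value.
  advanced_event_selector {
    name = "Bedrock agent invocations"
    field_selector {
      field  = "eventCategory"
      equals = ["Data"]
    }
    field_selector {
      field  = "resources.type"
      equals = ["AWS::Bedrock::AgentAlias"]
    }
  }

  advanced_event_selector {
    name = "Bedrock model invocations"
    field_selector {
      field  = "eventCategory"
      equals = ["Data"]
    }
    field_selector {
      field  = "resources.type"
      equals = ["AWS::Bedrock::Model"]
    }
  }
}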

That produces the access audit — who invoked which model at what time, under which IAM role, from which VPC endpoint. The record does not include the prompt or response themselves. Those go to model invocation logging, configured once per region:

resource "aws_bedrock_model_invocation_logging_configuration" "main" {
  logging_config {
    s3_config {
      bucket_name = aws_s3_bucket.invocation_logs.id
      key_prefix  = "invocation-logs/"
    }
    text_data_delivery_enabled  = true
    image_data_delivery_enabled = true
    embedding_data_delivery_enabled = false  # large; enable only if needed
  }
}

The invocation log bucket is the most sensitive bucket in the entire deployment. Prompts often contain PII because users ask questions about themselves and their customers. Responses can leak regenerated training data and reconstructed PII. The Guardrails trace shows what the safety system caught and what it let through. KMS CMK encryption is mandatory. Object Lock in Compliance mode locks the retention period so even the root account cannot delete the records. The bucket policy restricts access to a small set of audit and incident-response roles.

Cross-account log forwarding is the configuration that closes the compromised-account scenario. CloudTrail data events and model invocation logs forward to a separate AWS account in the Security OU. The workload account has no delete permission on the audit bucket. Even if the workload account is compromised, the attacker cannot destroy the audit trail. This is the configuration that turns "we have logs" into "we have logs an examiner can rely on."

For regulated workloads, Object Lock retention matches the regulatory requirement: 7 years for NDPA processing records, 6 years for NIS2 incident evidence, the bank's CSAT-specific retention period for CBN-supervised data. Each cites its own regulator. None of them cares about the others.
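Retention periods like these translate directly into the bucket's default Object Lock rule. The sketch below builds the configuration dictionary in the shape accepted by the S3 `put_object_lock_configuration` call; the helper name and the dictionary of regimes are hypothetical, and the year values are simply the ones quoted above, not independent legal advice.

```python
# Retention periods as stated in this article.
RETENTION_YEARS = {
    "ndpa_processing_records": 7,
    "nis2_incident_evidence": 6,
}


def object_lock_configuration(years):
    """Default-retention rule in Compliance mode for an audit bucket."""
    if type(years) is not int or years < 1:
        raise ValueError("retention must be a whole number of years, at least 1")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
    }


NDPA_LOCK = object_lock_configuration(RETENTION_YEARS["ndpa_processing_records"])
```

Because Compliance mode cannot be shortened once applied, it is worth generating this value from a reviewed table rather than typing it per bucket.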

Layer 5 — When the cloud operator is in the threat model

For the highest-trust workloads — defence-adjacent, sovereign-wealth, ultra-high-value financial — the regulatory threat model sometimes assumes the cloud-provider operator itself is in scope. AWS Nitro Enclaves provide cryptographic attestation that workload code is running in a tamper-resistant isolation environment, with no operator-side debugging hooks, no console access, and the host's own root operator unable to inspect the enclave's memory.

For Bedrock specifically, Nitro Enclaves matter when the workload calling Bedrock needs to process the response in an attestable environment before any other code sees it — for example, when applying a customer-side decryption step to a response containing encrypted PII, or running a regulatory-required validation that must demonstrate it was the only code with access to the plaintext. The broader AWS security posture for AI workloads sets the surrounding controls.

Nitro Enclaves are not a default. They are the right choice for the narrow subset where the threat model assumes adversary capability beyond standard cloud security. Most cmdev engagements stop at Layers 1-4. Enclaves get added when the customer's threat model demands them.

What the architecture answers, on paper

The architecture above maps to specific obligations:

Obligation	Where the architecture delivers
NDPA 2023 — Section 39 (security of processing)	KMS CMK on every artefact + IAM least-privilege + VPC endpoints + model invocation logs as the processing record
NDPA — Section 40 (72-hour breach notification)	CloudTrail data events + invocation logs answer "what data was processed, by which model, when, under whose authority" within minutes — well within the 72-hour clock
NDPA — Cross-border processing	KMS key boundary + region selection inside the regulatory perimeter + documented cryptographic argument satisfies the cross-border test
NIS2 Article 21 — Incident handling	CloudTrail data events + invocation logs + X-Ray traces provide the incident-investigation evidence base, sealed against tampering
NIS2 Article 21 — Supply-chain security	Custom Model Import provenance, Knowledge Base source attribution, action-group Lambda dependency tracking
NIS2 Article 21 — Cryptography	KMS CMK with documented key policies, annual rotation, condition-bound use
CBN CSAT — Application security	Guardrails as input/output control; CloudTrail data events as access audit; the entire VPC-endpoint perimeter
CBN CSAT — Cryptography	KMS CMK on every encrypted resource; key policies that the bank's risk team can review
NCC Cyber Resilience Framework — Continuous monitoring	CloudWatch alarms on Guardrail interventions, model invocation anomalies, IAM role usage
GDPR Article 32 (security of processing)	Same controls; EU-region deployment with EU-pinned KMS keys
HIPAA Security Rule	Same controls + HIPAA-eligible region selection + signed Business Associate Agreement with AWS

Every one of those obligations is something an examiner asks about. The architecture produces audit-grade evidence per obligation in minutes, not weeks.
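Answering "what was processed, by which model, when, under whose authority" comes down to reading a handful of fields out of each CloudTrail record. The following sketch shows one way to do that; the function name and the sample record are hypothetical, and it assumes the model identifier appears under `requestParameters.modelId`, which should be confirmed against real records from the deployment.

```python
def summarise_invocation(record):
    """Reduce one CloudTrail record to the fields an examiner asks about."""
    identity = record.get("userIdentity") or {}
    params = record.get("requestParameters") or {}
    return {
        "when": record.get("eventTime"),
        "who": identity.get("arn"),
        "action": record.get("eventName"),
        "model": params.get("modelId"),  # assumed location of the model ID
        "vpc_endpoint": record.get("vpcEndpointId"),
    }


# Hypothetical record, trimmed to the fields used above.
SAMPLE_RECORD = {
    "eventTime": "2026-01-15T09:30:00Z",
    "eventName": "InvokeModel",
    "userIdentity": {"arn": "arn:aws:sts::123456789012:assumed-role/bedrock-invoker/session"},
    "requestParameters": {"modelId": "example.model-a"},
    "vpcEndpointId": "vpce-0abc123def456789a",
}
SUMMARY = summarise_invocation(SAMPLE_RECORD)
```

A record whose `vpc_endpoint` is missing or unexpected is itself a finding, since every legitimate call in this architecture arrives through a known endpoint.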

What bites at deployment time

The reference architecture is the easy part. The friction shows up at deployment time, when configuration meets reality. Five things cmdev engineers have hit and engineered past.

The first is VPC endpoint service quotas. A regional account has soft quotas on endpoints per VPC and network interfaces per endpoint. For a deployment running ten or more services that each consume Bedrock, KMS, S3, Secrets, CloudWatch, Logs, STS, and monitoring across multiple subnets, the defaults are exhausted before the deployment finishes. The fix is straightforward — open a quota-increase request — but the lead time runs 24 to 72 hours, and finding out at the launch dress rehearsal turns a routine ticket into a release-blocker. We pre-file the increases at engagement start, before the architecture is even built.

The second is that KMS key policy complexity grows non-linearly. A simple deployment authorises three principals on the workload CMK. A real production deployment authorises ten — the workload role, the audit role, the IR role, the break-glass admin, the backup role, the cross-region replication role, the CloudTrail role, the invocation log writer, the KB ingestion role, the secrets-rotation Lambda. Each needs its own statement. The policy quickly hits AWS's 32 KB key-policy ceiling. The architectural fix is per-purpose CMKs rather than one CMK for everything — one for invocation logs, one for KB sources, one for vector store, one for general application secrets. Each policy stays under 8 KB. Audit per purpose. Rotation per purpose. Cleaner under examination.
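Policy growth is easy to track before it becomes a blocker. As a small sketch, the helper below measures a policy document against the 32 KB ceiling and the 8 KB per-purpose budget mentioned above. The function names are hypothetical, and measuring the compact JSON encoding is an assumption about how the limit is counted; treat the result as an early-warning figure rather than an exact match for what KMS enforces.

```python
import json

KMS_KEY_POLICY_LIMIT_BYTES = 32 * 1024
PER_PURPOSE_BUDGET_BYTES = 8 * 1024


def policy_size_bytes(policy):
    """Size of the policy as compact UTF-8 JSON."""
    return len(json.dumps(policy, separators=(",", ":")).encode("utf-8"))


def within_budget(policy, budget=PER_PURPOSE_BUDGET_BYTES):
    """True while the policy still fits the chosen budget."""
    return policy_size_bytes(policy) <= budget
```

Running a check like this in CI against each rendered key policy surfaces the problem when the eighth principal is added, not when the deployment fails.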

The third is cold-start latency on the air-gapped path. The VPC endpoint adds 5-10 milliseconds on the wire. KMS decryption of the inbound credential context adds another 3-8 milliseconds. For latency-sensitive workloads such as real-time conversational agents and sub-second classification, those milliseconds accumulate into a perceptible "the air-gapped path is slower" experience. The fixes that work: connection pooling on the Bedrock client through boto3's botocore.config.Config with tcp_keepalive=True, reusing the Bedrock client across invocations rather than instantiating one per call, and prompt caching (per Part 7) to flatten the recurring input-token portion. For workloads where every millisecond matters, provisioned throughput converts on-demand variance into steady-state latency.
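The client-side fixes fit in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the pool size and retry settings are illustrative values rather than tuned recommendations, and boto3 is imported lazily so the settings can be inspected without the SDK installed.

```python
import functools


def client_config_kwargs():
    """Settings passed to botocore.config.Config for the Bedrock client."""
    return {
        "tcp_keepalive": True,          # keep idle connections to the endpoint open
        "max_pool_connections": 50,     # illustrative value, size to concurrency
        "retries": {"max_attempts": 3, "mode": "standard"},
    }


@functools.lru_cache(maxsize=None)
def bedrock_runtime_client(region="eu-central-1"):
    """Create the client once per region and reuse it across invocations."""
    import boto3
    from botocore.config import Config

    return boto3.client(
        "bedrock-runtime",
        region_name=region,
        config=Config(**client_config_kwargs()),
    )
```

In a Lambda function, calling `bedrock_runtime_client()` from the handler returns the cached client on warm invocations, so the TLS handshake to the VPC endpoint is paid once per execution environment rather than once per request.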

The fourth is the cross-account forwarding edge case. The pattern of "audit logs go to the Security OU account" is straightforward in concept. The subtlety bites in the IAM: the workload account's CloudTrail service-linked role needs s3:PutObject on the Security OU's audit bucket but not s3:DeleteObject. AWS's default IAM templates sometimes grant both. We have shipped deployments where a misconfiguration meant the workload account's compromised role could have deleted its own audit trail, the kind of finding a penetration test surfaces six months in and that triggers mid-engagement remediation. The defensive pattern is explicit Deny on s3:DeleteObject and s3:DeleteObjectVersion in the workload account's CloudTrail role, plus Object Lock in Compliance mode on the audit bucket so even the Security OU's own admin cannot delete records before retention expires.
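The explicit Deny is short enough to template. The sketch below builds an identity-policy statement of the kind described above; the function name, Sid, and bucket name are hypothetical, and it covers only the two delete actions named in the text.

```python
def deny_audit_deletes(bucket_name):
    """Identity-policy statement that blocks deleting audit objects or versions."""
    if not bucket_name or "*" in bucket_name:
        raise ValueError("name the audit bucket explicitly")
    return {
        "Sid": "DenyAuditLogDeletion",
        "Effect": "Deny",
        "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
        "Resource": f"arn:aws:s3:::{bucket_name}/*",
    }


# Hypothetical bucket name, for illustration only.
DENY_STATEMENT = deny_audit_deletes("example-security-ou-audit-logs")
```

Because an explicit Deny overrides any Allow, this statement holds even if a broad template later grants the role full S3 access to the bucket.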

The fifth is the most human of them: the pressure to compromise the air-gap for one dependency. Every air-gapped deployment we have shipped eventually fields the request: "can we just add an internet path for X? It is only used for Y." The X is usually a Python package installation, a vendor library that calls home for usage analytics, or an OS package update. The pressure compounds. The architectural answer is the internal artefact mirror: AWS CodeArtifact for Python and npm packages, ECR for container images, an internal yum or apt mirror for OS packages. Every dependency the workload needs is mirrored inside the perimeter. The air-gap holds because nothing the workload needs requires breaching it. This is one of the highest-leverage decisions in the deployment, and every regulated customer eventually needs it; getting it in early avoids the painful retrofit.
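For the Python case, pointing pip at the mirror is a one-line configuration. This sketch assembles a CodeArtifact PyPI index URL and the matching pip configuration text; the function names, domain, and repository are hypothetical, and the authorization token that CodeArtifact requires is deliberately left out, so this shows the shape of the setting rather than a working credentialed URL.

```python
def codeartifact_pypi_index(domain, owner_account, region, repository):
    """PyPI-compatible index URL for a CodeArtifact repository (no credentials)."""
    host = f"{domain}-{owner_account}.d.codeartifact.{region}.amazonaws.com"
    return f"https://{host}/pypi/{repository}/simple/"


def pip_conf(index_url):
    """Contents of a pip configuration file that uses only the internal mirror."""
    return f"[global]\nindex-url = {index_url}\n"


# Hypothetical domain and repository names.
INDEX_URL = codeartifact_pypi_index(
    "example-mirror", "123456789012", "eu-central-1", "pypi-approved"
)
PIP_CONF_TEXT = pip_conf(INDEX_URL)
```

Baking this into the base container image means a developer who runs `pip install` inside the perimeter gets the mirrored package by default, which removes most of the day-to-day pressure to open an internet path.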

Five things that hold up

Across the deployments cmdev has shipped under this pattern, five lessons compound.

Air-gapped is a configuration discipline, not a product feature. Bedrock can run air-gapped. So can OpenSearch Serverless. So can Lambda. The difference between a deployment an examiner accepts and one they do not is the discipline applied to network, KMS, IAM, and audit configuration. The architecture above is the minimum discipline; deployments that survive heavy scrutiny carry additional discipline layered on per regulatory regime.

The audit trail is the deployable artefact. The architecture is plumbing. What converts plumbing into a regulatory asset is the audit trail — CloudTrail data events, model invocation logs, X-Ray traces, all sealed against tampering and forwarded to a separate account. We design audit-trail-first now: which queries will the regulator run, what evidence answers each one, what does the trail look like under a worst-case incident.

The buying conversation is shorter when the architecture is published. Customers reach the first meeting having already read the architecture. The meeting then turns to their specific data residency, their specific examiner relationship, their specific operational constraints — not to whether air-gapped AI is possible at all. The published architecture is doing customer-acquisition work continuously.

The friction points compound when engineering teams have not seen them before. The VPC quota, the KMS policy size, the cold-start latency, the cross-account forwarding edge case, the dependency mirror — each is a 1-3 day issue in isolation. Stacked under a launch deadline, they become a release-blocking incident. The cmdev engagement model carries the standing list of these frictions in the Phase 0 diagnostic for exactly this reason.

The cost premium of air-gapped is smaller than the discount of bad assumptions. A common pre-engagement assumption is that air-gapped AI must cost 2-3 times managed-API AI because of infrastructure overhead. In practice, VPC endpoint charges plus KMS operations plus storage overhead come to a single-digit percentage of model invocation costs, which dominate the bill. The Bedrock cost-optimisation patterns apply identically. The economics are not the obstacle. The implementation discipline is.

If your CISO asked tonight whether your AI workload could survive a regulator's first technical pass, what would you point at?

FAQs

What actually makes a Bedrock deployment "air-gapped"?

The workload subnets have no internet gateway and no NAT path to the public internet. Every call to Bedrock, KMS, S3, Secrets Manager, and CloudWatch goes through VPC endpoints (interface endpoints over PrivateLink, plus a gateway endpoint for S3). The Lambda functions and EKS pods invoking Bedrock cannot reach the public internet even if they wanted to. That removes the route a compromised dependency would use to call home and, combined with DNS filtering on the VPC resolver, the exfiltration-via-DNS path.

How does the architecture answer cross-border data-residency questions?

Region selection plus the KMS boundary. The deployment runs entirely in regions inside the regulatory perimeter (eu-central-1 or af-south-1 for Nigerian banks; eu-central-1 or eu-west-1 for EU operators). The CMK never leaves, so even if data somehow did, it would be cryptographically meaningless. The documented cryptographic argument is what satisfies NDPA's cross-border processing test and NIS2/GDPR data-residency expectations.

Why per-purpose CMKs instead of one CMK for everything?

A realistic production deployment has ten or more IAM principals authorised on the workload CMK — workload role, audit role, IR role, break-glass admin, backup role, replication role, log writer, KB ingestion role, secrets rotation. The single-CMK policy quickly hits AWS's 32 KB key-policy limit. Splitting into per-purpose CMKs (one for invocation logs, one for KB sources, one for vector store, one for app secrets) keeps each policy under 8 KB, gives audit-trail per purpose, and lets rotation cadences vary by sensitivity.

What is cross-account log forwarding actually protecting against?

The compromised-account scenario. If CloudTrail data events and model invocation logs stay in the workload account, an attacker with sufficient authority in that account can destroy the audit trail. Forwarding to a separate Security OU account where the workload account has no delete permission — combined with Object Lock in Compliance mode — turns "we have logs" into "we have logs an examiner can rely on."

Do most regulated deployments need Nitro Enclaves?

No. Nitro Enclaves matter when the regulatory threat model assumes the cloud-provider operator itself is in scope — defence-adjacent, sovereign-wealth, ultra-high-value financial. For typical regulated AI workloads, Layers 1-4 (network, encryption, identity, audit) are the production baseline. Enclaves get added when the threat model demands it.

Engaging with cmdev

CreativeMinds Development (cmdev) is the engineering studio behind this architecture. We ship production-grade AI for regulated enterprises in Africa and the EU: banks under CBN CSAT, energy operators under NMDPRA and NIS2, fintechs under NDPA, healthcare networks under HIPAA-equivalent regional regimes. Our engagement model is a four-phase pattern: diagnostic, foundation build, co-managed operations, optional full managed services. The architecture in this article is the substrate; the engagement is what makes it production at your scale.

Cloud security services: /services/cloud-security
Companion architecture series: Amazon Bedrock for Production AI, AWS-for-banks, Hardening before AWS

Mayowa Adewole is CTO and Principal AI Engineer at CreativeMinds Development. He leads cmdev's AI engineering practice for regulated enterprises across Africa and the EU, with deployments in production for banking, energy, and critical-infrastructure customers.