An operator-grade pattern from the CreativeMinds Development (cmdev) AI engineering practice. Companion to Security Guardrails and Observability for Bedrock and the Air-Gapped LLM Deployments article.
The "leaky RAG" incident pattern
The bug that ends most enterprise AI pilots is the same bug, in different costumes. An engineer at a Tier 1 bank's pilot deployment asks the customer-support agent a routine question. The agent responds with information from a different customer's account. A user at a healthcare network asks the clinical-copilot agent for guidance on their patient and gets back text from another physician's notes. A junior analyst at a financial services firm asks the research-copilot a question and receives a fragment of a senior partner's confidential M&A workbook. None of these were prompt-injection attacks. None were exotic. Each was the same architectural flaw: the Knowledge Base retrieval did not respect the user's actual authorisation context.
This is the failure mode we call leaky RAG, and it is the single biggest reason enterprise AI projects pause indefinitely after a pilot. The CISO who reads the post-incident report concludes that the AI architecture is not safe enough to advance to broader rollout. The pause is not always called a failure — it is called "we need to re-architect for our access-control requirements" — but it is a failure.
The fix is not exotic either. The customer already has an identity provider — Active Directory, Microsoft Entra, Okta, Auth0 — that captures who is in the organisation, what role each user holds, and what data each role is authorised to see. The fix is to map that authorisation context, accurately and continuously, into the Knowledge Base retrieval layer so that the underlying vector store can only return documents the calling user is authorised to see. This piece documents how we ship that.
The architecture
Five layers, each enforcing the access boundary at a different point in the request path. Defence in depth is not slogan-ware here; each layer catches different classes of failure.
Layer 1 — Identity at the IdP
The customer's identity provider is the source of truth for who the user is and what roles they hold. The IdP captures this with whatever level of fidelity the customer has invested in — for some customers a single role per user; for others a rich tag schema with department, region, classification level, and project membership.
The federation contract: the IdP issues an authentication assertion containing the user's identifying claims. The two common shapes:
- SAML 2.0 — XML assertion with attribute statements. Most common in enterprise / banking environments where the IdP is Microsoft Entra ID / Active Directory Federation Services / PingFederate.
- OIDC — JWT with claims. Common in modern fintechs and software companies; Auth0 / Cognito / Okta OIDC.
The architectural rule: the AI workload trusts no identity claim that does not come from the IdP through the audited federation path. There is no "let me also add an is_admin: true header" shortcut. The IdP is the only place identity decisions are made.
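One way to make this rule concrete is to fail closed whenever a request tries to carry identity in client-controlled headers. The following is a minimal sketch, assuming the claims dictionary has already been produced by the federation path; the header names and the `resolve_identity` helper are hypothetical and not part of any AWS or IdP API.

```python
# Hypothetical header names a client might use to self-assert identity.
FORBIDDEN_IDENTITY_HEADERS = frozenset({"x-tenant-id", "x-user-groups", "x-is-admin"})


def resolve_identity(verified_claims: dict, request_headers: dict) -> dict:
    """Return identity taken only from IdP-verified claims.

    verified_claims is assumed to come from the audited federation path
    (a validated SAML assertion or OIDC token), never from the request itself.
    """
    spoofed = sorted(
        name for name in request_headers
        if name.lower() in FORBIDDEN_IDENTITY_HEADERS
    )
    if spoofed:
        # Fail closed: a client-supplied identity header is rejected outright.
        raise PermissionError(f"client-supplied identity headers rejected: {spoofed}")
    return {
        "sub": verified_claims["sub"],
        "tenant_id": verified_claims["tenant_id"],
        "groups": list(verified_claims.get("groups", [])),
    }
```

Rejecting the request, rather than silently dropping the header, also surfaces the attempt in logs.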
Layer 2 — IAM Identity Center as the AWS-side broker
IAM Identity Center federates the IdP claims into AWS sessions. For a Bedrock-backed workload, the architecturally significant pieces:
- The Permission Set that the federated user is granted is the union of policy statements that determine what AWS APIs the user can call. For Bedrock specifically, this scopes which model IDs the user can invoke, which Knowledge Bases they can query, and which agent aliases they can address.
- The session tags that flow from the IdP claims through SAML / OIDC become available as aws:PrincipalTag condition keys in IAM policies. This is the lever that propagates the user's identity context into AWS-side authorisation decisions.
A typical permission-set policy for an end user of a Bedrock-backed application:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:Retrieve",
        "bedrock:RetrieveAndGenerate"
      ],
      "Resource": "arn:aws:bedrock:eu-central-1:123456789012:knowledge-base/KB-prod-001",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/tenant_id": "${aws:PrincipalTag/tenant_id}"
        }
      }
    }
  ]
}
```
The tenant_id session tag is set from the SAML/OIDC claim. The condition compares that session tag with the tenant_id tag on the Knowledge Base resource, which assumes every Knowledge Base is tagged with its owning tenant. This is what stops a federated user with tenant_id=acme from invoking a KB that belongs to tenant_id=globex. It is enforced before the request reaches Bedrock, as a control at the IAM authorisation layer.
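The intent of the tenant condition can be illustrated with a toy check that compares the caller's tenant_id session tag with a tenant tag on the Knowledge Base. This is only a sketch of the logic: real evaluation happens inside IAM, and both the resource tag and the `tenant_condition_allows` helper are assumptions made for illustration.

```python
def tenant_condition_allows(principal_tags: dict, resource_tags: dict) -> bool:
    """Toy model of a StringEquals check between a principal tag and a resource tag.

    A missing tag on either side never matches, so the check fails closed.
    """
    caller_tenant = principal_tags.get("tenant_id")
    resource_tenant = resource_tags.get("tenant_id")
    return caller_tenant is not None and caller_tenant == resource_tenant


# An acme session may reach an acme-tagged KB, but not a globex-tagged one.
print(tenant_condition_allows({"tenant_id": "acme"}, {"tenant_id": "acme"}))
print(tenant_condition_allows({"tenant_id": "acme"}, {"tenant_id": "globex"}))
```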
Layer 3 — Metadata filtering at the Knowledge Base retrieval
The most important architectural decision is that the document corpus is ingested with metadata fields that describe who is authorised to see each document. Per-document metadata at ingestion time:
| Field | Type | Meaning |
|---|---|---|
| tenant_id | string | Which customer / tenant owns this document |
| classification | string | public · internal · confidential · restricted |
| owner_group | string[] | IdP group(s) that own this document |
| region | string | Data-residency region (eu · ng · uk) |
| document_class | string | Functional category for filtered retrieval |
| effective_date | timestamp | Document validity start |
| expiry_date | timestamp | Document validity end |
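For an S3 data source, Bedrock Knowledge Bases read per-document metadata from a sidecar file named after the source object with a `.metadata.json` suffix, holding a `metadataAttributes` object. The sketch below shows one way the fields above might be serialised. Storing the two dates as epoch seconds is an assumption made so that numeric comparison operators can be applied, and the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone


def _to_epoch_seconds(iso_date: str) -> int:
    """Convert an ISO 8601 date or datetime to epoch seconds, treating naive values as UTC."""
    parsed = datetime.fromisoformat(iso_date)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def build_metadata_sidecar(doc: dict) -> str:
    """Serialise the access-control metadata for one source document."""
    attributes = {
        "tenant_id": doc["tenant_id"],
        "classification": doc["classification"],
        "owner_group": list(doc["owner_group"]),
        "region": doc["region"],
        "document_class": doc["document_class"],
        # Assumption: dates are stored as numbers so range filters can be used.
        "effective_date": _to_epoch_seconds(doc["effective_date"]),
        "expiry_date": _to_epoch_seconds(doc["expiry_date"]),
    }
    return json.dumps({"metadataAttributes": attributes}, indent=2)


def sidecar_key(document_key: str) -> str:
    """Object key of the sidecar that accompanies a source document."""
    return f"{document_key}.metadata.json"
```

Writing the sidecar in the same pipeline step that uploads the document keeps the two from drifting apart at ingestion time.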
When the application layer calls Retrieve on the Knowledge Base, the metadata filter is constructed from the calling user's identity claims:
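```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")


def build_retrieval_filter(user_claims: dict) -> dict:
    """Map IdP claims into a Bedrock KB metadata filter expression."""
    # owner_group is a string list, so each of the user's groups is matched
    # with listContains rather than in.
    group_clauses = [
        {"listContains": {"key": "owner_group", "value": group}}
        for group in user_claims["groups"]
    ]
    if not group_clauses:
        raise PermissionError("user has no groups; refusing to build an open filter")
    # orAll needs at least two members, so a single group stands alone.
    group_filter = group_clauses[0] if len(group_clauses) == 1 else {"orAll": group_clauses}
    return {
        "andAll": [
            {"equals": {"key": "tenant_id", "value": user_claims["tenant_id"]}},
            group_filter,
            # Assumes a numeric classification_level is ingested alongside classification.
            {"lessThan": {"key": "classification_level",
                          "value": user_claims["clearance_level"] + 1}},
            {"equals": {"key": "region", "value": user_claims["region"]}},
        ]
    }


def retrieve_for_user(query: str, user_claims: dict) -> dict:
    """Run a retrieval scoped to what the calling user is authorised to see."""
    return bedrock_agent_runtime.retrieve(
        knowledgeBaseId="KB-prod-001",
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                "filter": build_retrieval_filter(user_claims),
            }
        },
    )
```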
The filter is applied inside the Knowledge Base retrieval. Bedrock returns only chunks from documents whose metadata satisfies the filter. The agent's reasoning model never sees the chunks the user is not authorised to see, which closes the leak at the retrieval layer rather than relying on the model to be diligent about authorisation in its output.
The filter expression supports andAll, orAll, equals, notEquals, in, notIn, lessThan, lessThanOrEquals, greaterThan, greaterThanOrEquals, startsWith, stringContains, and listContains. That is composable enough for the kinds of access-control rules a real enterprise needs.
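Because the filter is plain data, its logic can be exercised in unit tests without calling Bedrock. The sketch below is a local stand-in for a subset of the filter grammar, not Bedrock's implementation; in particular, treating a missing attribute as a non-match is an assumption chosen to fail closed.

```python
# Local stand-in for a subset of the filter grammar, for unit tests only.
_COMPARATORS = {
    "equals": lambda actual, expected: actual == expected,
    "notEquals": lambda actual, expected: actual != expected,
    "in": lambda actual, expected: actual in expected,
    "notIn": lambda actual, expected: actual not in expected,
    "lessThan": lambda actual, expected: actual < expected,
    "greaterThan": lambda actual, expected: actual > expected,
    "startsWith": lambda actual, expected: actual.startswith(expected),
    "listContains": lambda actual, expected: expected in actual,
}


def matches(filter_expr: dict, metadata: dict) -> bool:
    """Return True when one document's metadata satisfies the filter expression."""
    (operator, operand), = filter_expr.items()
    if operator == "andAll":
        return all(matches(child, metadata) for child in operand)
    if operator == "orAll":
        return any(matches(child, metadata) for child in operand)
    actual = metadata.get(operand["key"])
    if actual is None:
        # Assumption: a document missing the attribute never matches.
        return False
    return _COMPARATORS[operator](actual, operand["value"])
```

A test suite can then assert that a document owned by another tenant, or tagged above the caller's clearance, is rejected by the same filter the application sends to the Knowledge Base.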
Layer 4 — Per-tenant isolation for multi-tenant workloads
For SaaS or shared-tenancy deployments, the architectural pattern depends on whether the corpus is shared or strictly partitioned.
Strict partition — one Knowledge Base per tenant. Each tenant's corpus, vector store, and metadata are isolated at the KB level. Strongest isolation; highest infrastructure cost; preferred for high-stakes regulated workloads where the threat model assumes operator-side error.
Shared KB with metadata filter — one Knowledge Base containing all tenants' corpora, with tenant_id as a mandatory metadata filter on every retrieval. Lower cost; more operationally efficient; the security relies on the metadata-filter enforcement being correct everywhere.
Hybrid — high-classification documents go to per-tenant KBs; public / low-classification documents go to a shared KB. Most production deployments end up here once they evolve past the first scale.
The defensible default we ship is strict partition for confidential and restricted classifications; shared with metadata filter for internal and public. This puts the strongest controls on the data most likely to cause an incident if leaked.
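The default above can be sketched as a small routing function plus a guard that forces the tenant clause onto every shared-KB query. The Knowledge Base identifiers, the `tenant_kb_ids` mapping, and both helper names are hypothetical.

```python
STRICT_CLASSIFICATIONS = frozenset({"confidential", "restricted"})
SHARED_CLASSIFICATIONS = frozenset({"internal", "public"})


def select_knowledge_base(tenant_id: str, classification: str,
                          tenant_kb_ids: dict, shared_kb_id: str) -> str:
    """Pick the Knowledge Base that documents of this classification belong in."""
    if classification in STRICT_CLASSIFICATIONS:
        try:
            return tenant_kb_ids[tenant_id]
        except KeyError:
            # Fail closed rather than falling back to the shared KB.
            raise LookupError(f"no dedicated Knowledge Base for tenant {tenant_id!r}") from None
    if classification in SHARED_CLASSIFICATIONS:
        return shared_kb_id
    raise ValueError(f"unknown classification {classification!r}")


def with_tenant_guard(base_filter: dict, tenant_id: str) -> dict:
    """Wrap any filter so the tenant_id clause is always present on shared-KB queries."""
    tenant_clause = {"equals": {"key": "tenant_id", "value": tenant_id}}
    return {"andAll": [tenant_clause, base_filter]}
```

Routing through a single function means an unknown classification or an unprovisioned tenant raises an error instead of quietly landing in the shared corpus.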
Layer 5 — Audit trail per retrieval
Every retrieval emits an audit record. The schema:
```json
{
  "timestamp": "2026-06-09T14:32:17Z",
  "request_id": "req-9af3b2c1",
  "calling_user": {
    "sub": "user-acme-julia",
    "tenant_id": "acme",
    "groups": ["sales", "emea"],
    "session_arn": "arn:aws:sts::123:assumed-role/idc-user-role/julia"
  },
  "knowledge_base_id": "KB-prod-001",
  "query": "what is the standard discount policy",
  "retrieval_filter": {"andAll": [...]},
  "returned_document_ids": ["doc-policy-discount-v3", "doc-policy-discount-v2"],
  "returned_chunk_ids": ["chunk-001", "chunk-002", "chunk-003"],
  "answer_id": "ans-3f1c"
}
```
The record answers, for any retrieval after the fact: who asked, what did they ask, what filter was applied, which chunks were returned, what was the answer ID. For an incident response, the audit query is "show me every retrieval by user X in the past 24 hours" — and the answer is a complete enumeration in seconds.
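The incident-response query can be sketched against records that follow the schema above. This assumes the records have already been loaded into memory as dictionaries; in practice the same predicate would more likely be expressed in whatever query engine sits over the audit store.

```python
from datetime import datetime, timedelta


def _parse_timestamp(value: str) -> datetime:
    # Accept the trailing "Z" form on Python versions where fromisoformat does not.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def retrievals_by_user(records: list, sub: str, now: datetime,
                       window: timedelta = timedelta(hours=24)) -> list:
    """Return every audit record for one user inside the look-back window, oldest first."""
    cutoff = now - window
    hits = [
        record for record in records
        if record["calling_user"]["sub"] == sub
        and cutoff <= _parse_timestamp(record["timestamp"]) <= now
    ]
    return sorted(hits, key=lambda record: _parse_timestamp(record["timestamp"]))
```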
The audit log is KMS-encrypted in S3 with Object Lock in Compliance mode for the regulatory retention period. It is forwarded cross-account to the Security OU, per the air-gapped pattern, so the workload account cannot delete its own audit records.
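A sketch of how one audit record might be written with these properties: the function below builds the keyword arguments for S3's PutObject call, with SSE-KMS encryption and a Compliance-mode retention date. The bucket name, key layout, KMS key ID, and retention length are all assumptions, and the target bucket must have Object Lock enabled.

```python
import json
from datetime import datetime, timedelta, timezone


def build_audit_put_kwargs(record: dict, bucket: str, kms_key_id: str,
                           retention_days: int, now: datetime = None) -> dict:
    """Build PutObject arguments for one immutable, KMS-encrypted audit record."""
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        # Hypothetical key layout: date-partitioned, one object per request.
        "Key": f"retrieval-audit/{now:%Y/%m/%d}/{record['request_id']}.json",
        "Body": json.dumps(record, sort_keys=True).encode("utf-8"),
        "ContentType": "application/json",
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_id,
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retention_days),
    }
```

The result can be passed to `boto3.client("s3").put_object(**kwargs)`. Keeping the argument construction separate from the network call makes the retention and encryption settings easy to assert in tests.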
Mapping common IdP setups
Active Directory / Microsoft Entra ID
The most common enterprise pattern. The architectural flow:
- AD / Entra issues a SAML assertion on user authentication. The assertion contains nameid (the user identifier), AD group memberships, and any custom attributes the customer has set up.
- IAM Identity Center is configured with the customer's AD/Entra as the identity source. SCIM provisioning syncs users and groups from AD into Identity Center on a continuous schedule.
- Permission Sets in Identity Center reference AD groups for assignment: "users in the EMEA-Sales AD group get the BedrockSalesUser permission set."
- The application reads the federated session's group memberships from the session metadata and constructs the metadata filter from them.
The discipline: the application never re-asserts AD claims itself. The claim source is always the IAM Identity Center session metadata, which is itself derived from the SCIM-synced AD state. This means a user removed from AD loses access within one sync cycle: propagation latency is the SCIM sync interval, typically 5-15 minutes.
Okta and Auth0
Both support SAML 2.0 (for IAM Identity Center federation) and OIDC (for direct application integration). The choice depends on whether the application is web-facing (OIDC) or running as an AWS workload (SAML via Identity Center).
For OIDC-direct flows — typical for B2B SaaS applications where the application is the customer-facing surface — the OIDC ID token's claims are the authorisation source. The application validates the JWT signature, extracts the claims, and uses them to construct the metadata filter. AWS access for the application itself uses a separate workload identity (IAM Role on EKS / Lambda / EC2) that has KB invocation permissions; the calling user's identity is propagated as metadata-filter inputs, not as the AWS principal.
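The claim-extraction step in the OIDC-direct flow might look like the sketch below. It assumes the token's signature has already been verified by a JWT library against the IdP's published keys and only covers the checks that follow. The custom claim names (tenant_id, groups, region) and the `claims_from_id_token` helper are hypothetical, since the actual claim names depend on how the IdP is configured.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserClaims:
    sub: str
    tenant_id: str
    groups: tuple
    region: str


def claims_from_id_token(payload: dict, issuer: str, audience: str,
                         now: Optional[float] = None) -> UserClaims:
    """Validate standard claims on an already signature-verified ID token payload."""
    now = time.time() if now is None else now
    if payload.get("iss") != issuer:
        raise PermissionError("unexpected issuer")
    aud = payload.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise PermissionError("token was not issued for this application")
    if payload.get("exp", 0) <= now:
        raise PermissionError("token has expired")
    # A missing custom claim raises KeyError, so the request fails closed.
    return UserClaims(
        sub=payload["sub"],
        tenant_id=payload["tenant_id"],
        groups=tuple(payload["groups"]),
        region=payload["region"],
    )
```

The frozen dataclass keeps the claims immutable between validation and filter construction, so nothing downstream can widen the user's scope.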
IAM Identity Center as the identity store itself
For customers without an external IdP, IAM Identity Center can be the identity store directly. Users are managed in Identity Center; groups are managed in Identity Center; SAML is unnecessary because the federation is internal. Less common in enterprise but valid for smaller or earlier-stage customers.
Friction points — what bites in real deployments
Five frictions we have engineered past:
1. Claim drift between the IdP and the workload's filter logic
AD groups get renamed. A new attribute gets added. The IdP team changes the format of the region claim from eu to eu-central-1. The metadata filter logic in the application doesn't get updated. Suddenly users see no documents, or see documents they shouldn't.
The mitigation: claim contract tests in CI that run against a mock IdP and a known-good metadata-filter schema. Any change to either the claims or the filter logic that breaks the contract fails CI before it reaches production.
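A claim contract of this kind can be as simple as a table of expected claim names, types, and allowed values, checked against a sample from the mock IdP. The contract below is illustrative only: the claim names follow the filter inputs used earlier, and the allowed region values are taken from the metadata schema.

```python
# Illustrative contract; the real one is whatever the filter logic depends on.
CLAIM_CONTRACT = {
    "tenant_id": {"type": str},
    "groups": {"type": list},
    "clearance_level": {"type": int},
    "region": {"type": str, "allowed": {"eu", "ng", "uk"}},
}


def contract_violations(claims: dict) -> list:
    """Return human-readable contract violations; an empty list means the claims conform."""
    problems = []
    for name, rule in CLAIM_CONTRACT.items():
        if name not in claims:
            problems.append(f"missing claim: {name}")
            continue
        value = claims[name]
        if not isinstance(value, rule["type"]):
            problems.append(
                f"{name}: expected {rule['type'].__name__}, got {type(value).__name__}"
            )
        elif "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{name}: unexpected value {value!r}")
    return problems
```

A CI test would assert that `contract_violations(sample_claims)` is empty, so a region claim that drifts from eu to eu-central-1 fails the build instead of silently emptying search results.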
2. Stale group memberships post-offboarding
The user is removed from AD; the change takes 15 minutes to propagate to Identity Center; meanwhile the user's prior session is still valid for up to 15 minutes. There is a window where the offboarded user retains access.
The mitigation: short session durations (15 minutes for sensitive workloads; configurable in Identity Center). Combined with the AD-to-Identity-Center SCIM sync, the actual access-loss latency is bounded by the maximum of (session TTL, SCIM sync interval) — typically under 20 minutes total.
3. Document metadata drift
Documents get re-classified, ownership changes, regions get re-tagged — and the corpus's metadata in the Knowledge Base does not reflect the current state. The retrieval filter starts returning stale-authorisation-bucket results.
The mitigation: scheduled re-ingestion with metadata refresh. The KB ingestion pipeline re-reads document metadata from the source system on a regular cadence (daily for most workloads), updating the vector store metadata in place. Document deletions get propagated by an end-of-life policy that hard-deletes after a configurable grace period.
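The comparison step of such a refresh can be sketched as a diff between the source system's current metadata and what the Knowledge Base holds. Both inputs are assumed to be dictionaries keyed by document ID; how they are obtained from the source system and the vector store is outside this sketch.

```python
# Fields whose drift changes who is authorised to see a document.
ACCESS_FIELDS = ("tenant_id", "classification", "owner_group", "region")


def plan_metadata_refresh(source: dict, indexed: dict) -> dict:
    """Compare source-of-truth metadata with indexed metadata, both keyed by document ID."""
    stale = [
        doc_id for doc_id in source.keys() & indexed.keys()
        if any(source[doc_id].get(field) != indexed[doc_id].get(field)
               for field in ACCESS_FIELDS)
    ]
    not_yet_indexed = source.keys() - indexed.keys()
    removed_at_source = indexed.keys() - source.keys()
    return {
        "reingest": sorted(set(stale) | not_yet_indexed),
        "delete_after_grace": sorted(removed_at_source),
    }
```

Restricting the comparison to access-relevant fields keeps the re-ingestion batch small while still catching every change that affects authorisation.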
4. The "I need this to debug" backdoor
An engineer needs to debug a user's issue and wants to "just see what the user sees." The temptation is to bypass the metadata filter or to assume the user's role. Each shortcut creates a control gap.
The mitigation: break-glass impersonation through an audited workflow. The engineer requests impersonation; a designated approver (typically a security lead) approves; the engineer's session gets a special claim that allows reading another user's documents but with full audit logging of every read. The impersonation event is a discrete audit record, not an invisible bypass.
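The approval step can be sketched as a function that issues a time-boxed grant and, in the same call, the audit event that records it. The 30-minute default, the field names, and the helper names are hypothetical; the point is that a grant cannot exist without its audit record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ImpersonationGrant:
    engineer: str
    target_user: str
    approver: str
    reason: str
    expires_at: datetime


def approve_impersonation(engineer: str, target_user: str, approver: str, reason: str,
                          now: datetime, ttl: timedelta = timedelta(minutes=30)):
    """Issue a time-boxed grant together with the audit event that records it."""
    if approver == engineer:
        raise PermissionError("impersonation cannot be self-approved")
    if not reason.strip():
        raise ValueError("a reason is required for break-glass access")
    grant = ImpersonationGrant(engineer, target_user, approver, reason, now + ttl)
    audit_event = {
        "event_type": "impersonation_granted",
        "timestamp": now.isoformat(),
        "engineer": engineer,
        "target_user": target_user,
        "approver": approver,
        "reason": reason,
        "expires_at": grant.expires_at.isoformat(),
    }
    return grant, audit_event


def is_active(grant: ImpersonationGrant, now: datetime) -> bool:
    """A grant is usable only before its expiry."""
    return now < grant.expires_at
```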
5. The "AI summarised what it shouldn't have" output leak
The retrieval filter correctly excludes a document, but a related document the user is authorised to see contains a reference to the excluded document. The model's synthesised answer mentions the excluded document by name. Has a leak occurred?
The mitigation: output-side filtering for cross-reference leaks. A post-synthesis check (LLM-as-judge or rule-based) compares the answer's entities against the retrieval-filter-excluded entities and flags potential leaks. The aggressive variant is to block the answer; the lenient variant is to redact the named references. Decision depends on the workload's tolerance for false positives.
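The rule-based variant of this check can be sketched as a case-insensitive match of the answer against the titles of documents the filter excluded. It assumes such a list of excluded titles is available to the application, which is itself a design decision; an LLM-as-judge variant would replace the string match with a model call.

```python
import re


def find_reference_leaks(answer: str, excluded_titles: list) -> list:
    """Return the excluded document titles that the answer mentions by name."""
    return [
        title for title in excluded_titles
        if re.search(re.escape(title), answer, flags=re.IGNORECASE)
    ]


def redact_references(answer: str, leaked_titles: list) -> str:
    """Lenient variant: replace each leaked title instead of blocking the answer."""
    for title in leaked_titles:
        answer = re.sub(re.escape(title), "[redacted reference]", answer,
                        flags=re.IGNORECASE)
    return answer
```

The aggressive variant blocks the answer whenever `find_reference_leaks` returns a non-empty list; the lenient variant passes the result to `redact_references`.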
What this taught us about enterprise scaling
Five things hold up across the RBAC-integrated deployments we ship:
1. Security is not a separate workstream from the core architecture. Bolting access control onto an existing RAG pipeline is more expensive and less defensible than building it in from day one. The architecture above is what lets most enterprise customers advance past pilot, and its absence is why most pilots without it stall.
2. The IdP is the source of truth. Every shortcut that lets the application re-assert identity claims independently of the IdP is a control gap. The discipline of "the IdP says it or it didn't happen" is the architectural invariant.
3. Metadata at ingestion is the only place to enforce access control reliably. Filtering at retrieval works because the chunks the user is not authorised to see never enter the model's context window. Filtering at the model's output is brittle (the model is a probabilistic system, not a security control) and expensive (output filtering scales with response size).
4. The audit trail is the customer's compliance team's primary artefact. A CISO will read the architecture document; their compliance team will spend more time with the audit trail. The retrieval audit record schema is worth getting right early because changing it later breaks every downstream consumer.
5. The fastest path to broader rollout is showing the CISO a working RBAC architecture. Most enterprise AI projects pause because the security model is unclear or untested. The team that comes to the architecture review with a working metadata-filter pipeline, a documented IdP integration, and an audit-trail sample advances. The team that comes with hand-waving doesn't.
Engaging with cmdev
CreativeMinds Development (cmdev) ships the RBAC-integrated Knowledge Base pattern as part of every regulated-enterprise AI engagement. We work with banking under CBN CSAT, energy operators under NMDPRA and NIS2, fintechs under NDPA, and healthcare networks under HIPAA-equivalent regional regimes. The identity-integration discipline is universal across them; the per-regime nuances (specific PII categories, retention requirements, cross-border filters) get tuned per engagement.
- Companion architecture series: Amazon Bedrock for Production AI, Air-Gapped LLM Deployments, Custom Evaluation Frameworks, Day 2 — Mitigating Non-Deterministic Failures, Cold-Start Latency and Cost for Multimodal RAG, Compliance Automator case study
Mayowa A. is CTO of CreativeMinds Development. He leads cmdev's AI engineering practice for regulated enterprises across Africa and the EU.
