Designing Strict RBAC for Enterprise Knowledge Bases on Amazon Bedrock

An operator-grade pattern from the CreativeMinds Development (cmdev) AI engineering practice. Companion to Security Guardrails and Observability for Bedrock and the Air-Gapped LLM Deployments article.

A junior analyst at a financial services firm types a routine question into the research-copilot agent her firm rolled out three weeks ago: "What is the current internal view on Hartwood Capital's pricing trajectory?" The agent answers in three confident paragraphs. The first paragraph is correct. The second paragraph quotes, verbatim, a sentence from a senior partner's confidential M&A workbook on a Hartwood acquisition. The analyst is not on that deal team and has no business knowing about it. The screenshot reaches the General Counsel by lunchtime. The pilot is paused by the following Monday.

This is not a prompt-injection attack. It is not exotic. It is the same bug, in different costumes, that has ended most enterprise AI pilots we have walked into in the last two years. An engineer on a Tier 1 bank's pilot asks the customer-support agent a routine question and gets back information from a different customer's account. A user at a healthcare network asks the clinical-copilot agent for guidance on their own patient and gets back text from another physician's notes. The architectural flaw is always the same: the Knowledge Base retrieval did not respect the user's actual authorisation context. The model retrieved a true statement from a real document and reasoned over it fluently. The reader was simply not allowed to see it.

This is the failure mode we call leaky RAG. It is the single biggest reason enterprise AI projects pause indefinitely after a pilot. The CISO who reads the post-incident report concludes that the AI architecture is not safe enough to advance to broader rollout. The pause is not always called a failure — it is called "we need to re-architect for our access-control requirements" — but it is a failure.

The fix is not exotic either. The customer already has an identity provider — Active Directory, Microsoft Entra, Okta, Auth0 — that captures who is in the organisation, what role each user holds, and what data each role is authorised to see. The fix is to map that authorisation context, accurately and continuously, into the Knowledge Base retrieval layer so that the underlying vector store can only return documents the calling user is authorised to see. This piece documents how we ship that.

Key takeaways

"Leaky RAG" is the single most common reason enterprise AI pilots pause indefinitely — the Knowledge Base retrieval did not respect the calling user's authorisation context, and the CISO concludes the architecture is unsafe for broader rollout.
The fix maps the customer's existing IdP (Active Directory, Entra, Okta, Auth0) claims into Bedrock Knowledge Base metadata filters — applied inside the retrieval, so the model never sees chunks the user is not authorised to see.
Five defence-in-depth layers: IdP as identity source of truth, IAM Identity Center as AWS-side broker with session tags, metadata filtering at retrieval, per-tenant isolation (strict partition for confidential, shared with filter for internal), and a per-retrieval audit record.
The defensible default: strict KB partition for confidential and restricted classifications; shared KB with metadata filter for internal and public — strongest controls on the data most likely to cause an incident if leaked.
Filtering at retrieval is reliable because the chunks never enter the model's context. Filtering at the model's output is brittle (probabilistic system, not a security control) and expensive (scales with response size).

Five doors, not one wall

Figure 1. IdP claims to metadata filter: chunks the user is not authorised to see never enter the model's context. The flow shown is the RBAC integration for Bedrock Knowledge Bases: the user authenticates through Entra ID, Okta, Auth0, or AD; the IdP issues a SAML or OIDC token with identity, roles, and group claims; IAM Identity Center federates the session; the application extracts the claims and passes them as a metadata filter to the Knowledge Base Retrieve API; Bedrock returns only chunks where tenant_id, classification, owner_group, region, and document_class satisfy the filter; the chunks pass through Cohere Rerank and then Claude for synthesis with citations; every retrieval is logged with the calling user, filter expression, document IDs, and final citations to KMS-encrypted S3 with Object Lock.

Five layers, each enforcing the access boundary at a different point in the request path. Defence in depth is not slogan-ware here. Each layer catches different classes of failure. Think of it as the security regime at an embassy, not the lock on a flat. The guard at the gate checks the passport. The receptionist checks the appointment. The corridor leading to the secure floor is monitored. The room itself requires a keycard. Any one of those, alone, would be inadequate. Together, they hold.

The identity provider speaks first, and only the identity provider speaks

The customer's identity provider is the source of truth for who the user is and what roles they hold. The IdP captures this with whatever level of fidelity the customer has invested in — for some customers a single role per user; for others a rich tag schema with department, region, classification level, and project membership.

The federation contract: the IdP issues an authentication assertion containing the user's identifying claims. The two common shapes are SAML 2.0 — XML assertion with attribute statements, most common in enterprise and banking environments where the IdP is Microsoft Entra ID, Active Directory Federation Services, or PingFederate — and OIDC, a JWT with claims, common in modern fintechs and software companies running Auth0, Cognito, or Okta OIDC.

The architectural rule: the AI workload trusts no identity claim that does not come from the IdP through the audited federation path. There is no "let me also add an is_admin: true header" shortcut. The IdP is the only place identity decisions are made. The doorman is the only one who decides who walks in; the kitchen does not get to issue dinner reservations.

The broker at the AWS gate

IAM Identity Center federates the IdP claims into AWS sessions. For a Bedrock-backed workload, two pieces matter most.

The Permission Set that the federated user is granted is the union of policy statements that determine what AWS APIs the user can call. For Bedrock specifically, this scopes which model IDs the user can invoke, which Knowledge Bases they can query, and which agent aliases they can address.

The session tags that flow from the IdP claims through SAML or OIDC become available as aws:PrincipalTag condition keys in IAM policies. This is the lever that propagates the user's identity context into AWS-side authorisation decisions.

A typical permission-set policy for an end user of a Bedrock-backed application:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:Retrieve",
        "bedrock:RetrieveAndGenerate"
      ],
      "Resource": "arn:aws:bedrock:eu-central-1:123456789012:knowledge-base/KB-prod-001",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/tenant_id": "${aws:PrincipalTag/tenant_id}"
        }
      }
    }
  ]
}

The tenant_id session tag is set from the SAML or OIDC claim. The condition compares that tag with the tenant_id resource tag on the Knowledge Base, which assumes every Knowledge Base is tagged with its owning tenant. Comparing the principal tag with itself would always evaluate to true and enforce nothing. The condition is what stops a federated user with tenant_id=acme from invoking the KB on behalf of tenant_id=globex. This is enforced before the request reaches Bedrock: it is a control at the IAM authorisation layer.

The filter that runs inside the retrieval, not after

The most important architectural decision is that the document corpus is ingested with metadata fields that describe who is authorised to see each document. Per-document metadata at ingestion time:

Field	Type	Meaning
`tenant_id`	string	Which customer / tenant owns this document
`classification`	string	public · internal · confidential · restricted (also stored as a numeric `classification_level` so range filters can compare it)
`owner_group`	string[]	IdP group(s) that own this document
`region`	string	Data-residency region (eu · ng · uk)
`document_class`	string	Functional category for filtered retrieval
`effective_date`	timestamp	Document validity start
`expiry_date`	timestamp	Document validity end
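For S3 data sources, Bedrock Knowledge Bases read per-document metadata from a sidecar file named after the document with a .metadata.json suffix, holding a top-level metadataAttributes object. The sketch below shows one way the fields in the table could be written out. The helper name, the numeric classification ranking, and the choice of epoch seconds for the two dates are assumptions made for illustration, not part of the pattern as described.

```python
import json
from datetime import datetime, timezone

# Assumed numeric ranking so range operators can compare classifications.
CLASSIFICATION_LEVELS = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


def build_metadata_sidecar(tenant_id, classification, owner_groups, region,
                           document_class, effective, expiry):
    """Return the JSON text for a `<document>.metadata.json` sidecar file."""
    if classification not in CLASSIFICATION_LEVELS:
        raise ValueError(f"unknown classification: {classification!r}")
    if not owner_groups:
        raise ValueError("a document needs at least one owner group")
    attributes = {
        "tenant_id": tenant_id,
        "classification": classification,
        "classification_level": CLASSIFICATION_LEVELS[classification],
        "owner_group": sorted(owner_groups),
        "region": region,
        "document_class": document_class,
        # Epoch seconds keep the dates usable with numeric range filters.
        "effective_date": int(effective.timestamp()),
        "expiry_date": int(expiry.timestamp()),
    }
    return json.dumps({"metadataAttributes": attributes}, indent=2)


sidecar = build_metadata_sidecar(
    tenant_id="acme", classification="confidential",
    owner_groups=["sales", "emea"], region="eu", document_class="policy",
    effective=datetime(2026, 1, 1, tzinfo=timezone.utc),
    expiry=datetime(2027, 1, 1, tzinfo=timezone.utc),
)
print(sidecar)
```

Rejecting unknown classifications and empty owner lists at ingestion keeps a mis-tagged document from entering the corpus with metadata the retrieval filter cannot reason about.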

When the application layer calls Retrieve on the Knowledge Base, the metadata filter is constructed from the calling user's identity claims:

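import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")


def build_retrieval_filter(user_claims: dict) -> dict:
    """Map IdP claims into a Bedrock KB metadata filter expression."""
    if not user_claims["groups"]:
        raise ValueError("no group claims; refusing to build a filter")
    # owner_group is a string list, so match it with one listContains
    # clause per user group ("in" compares a scalar attribute to a list).
    group_clauses = [
        {"listContains": {"key": "owner_group", "value": group}}
        for group in user_claims["groups"]
    ]
    # orAll needs at least two clauses; a single clause stands alone.
    group_filter = (
        group_clauses[0] if len(group_clauses) == 1 else {"orAll": group_clauses}
    )
    return {
        "andAll": [
            {"equals": {"key": "tenant_id", "value": user_claims["tenant_id"]}},
            group_filter,
            # classification_level is the numeric rank of `classification`,
            # assumed to be written as a number at ingestion.
            {"lessThanOrEquals": {"key": "classification_level",
                                  "value": user_claims["clearance_level"]}},
            {"equals": {"key": "region", "value": user_claims["region"]}},
        ]
    }


# `query` and `user_claims` come from the authenticated request.
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KB-prod-001",
    retrievalQuery={"text": query},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": build_retrieval_filter(user_claims),
        }
    },
)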

The filter is applied inside the Knowledge Base retrieval. Bedrock returns only chunks from documents whose metadata satisfies the filter. The agent's reasoning model never sees the chunks the user is not authorised to see, which closes the leak at the retrieval layer rather than relying on the model to be diligent about authorisation in its output. The librarian does not hand you the wrong book and ask you to politely give it back. The librarian does not let you into the wrong stack in the first place.

The filter expression supports andAll, orAll, equals, notEquals, in, notIn, lessThan, lessThanOrEquals, greaterThan, greaterThanOrEquals, startsWith, stringContains, and listContains. That is composable enough for the kinds of access-control rules a real enterprise needs.
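Because the filter is the security boundary, it helps to be able to test filter expressions without calling Bedrock. The sketch below is a small local evaluator that mirrors the documented meaning of the common operators. It is a test helper written for illustration, not Bedrock's implementation, and treating a missing attribute as a non-match for every operator is a fail-closed assumption of this sketch.

```python
import operator as op

# Leaf operators, following the documented meaning of each filter name.
LEAF_OPERATORS = {
    "equals": op.eq,
    "notEquals": op.ne,
    "lessThan": op.lt,
    "lessThanOrEquals": op.le,
    "greaterThan": op.gt,
    "greaterThanOrEquals": op.ge,
    "in": lambda actual, allowed: actual in allowed,
    "notIn": lambda actual, denied: actual not in denied,
    "startsWith": lambda actual, prefix: isinstance(actual, str) and actual.startswith(prefix),
    "listContains": lambda actual, member: isinstance(actual, list) and member in actual,
}


def matches(filter_expr: dict, metadata: dict) -> bool:
    """Evaluate one filter expression against one document's metadata."""
    (name, operand), = filter_expr.items()
    if name == "andAll":
        return all(matches(sub, metadata) for sub in operand)
    if name == "orAll":
        return any(matches(sub, metadata) for sub in operand)
    if name not in LEAF_OPERATORS:
        raise ValueError(f"unsupported operator: {name}")
    if operand["key"] not in metadata:
        return False  # fail closed: a missing attribute never matches
    return LEAF_OPERATORS[name](metadata[operand["key"]], operand["value"])


doc = {"tenant_id": "acme", "owner_group": ["sales", "emea"],
       "classification_level": 2, "region": "eu"}
user_filter = {"andAll": [
    {"equals": {"key": "tenant_id", "value": "acme"}},
    {"listContains": {"key": "owner_group", "value": "sales"}},
    {"lessThan": {"key": "classification_level", "value": 3}},
]}
print(matches(user_filter, doc))
```

Running representative documents through an evaluator like this in unit tests catches filter-construction mistakes, such as a dropped tenant clause, before they reach a live Knowledge Base.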

One library or many — and when to keep the doors separate

For SaaS or shared-tenancy deployments, the architectural pattern depends on whether the corpus is shared or strictly partitioned.

Strict partition gives each tenant its own Knowledge Base. Each tenant's corpus, vector store, and metadata are isolated at the KB level. Strongest isolation; highest infrastructure cost; preferred for high-stakes regulated workloads where the threat model assumes operator-side error.

Shared KB with metadata filter puts all tenants' corpora in one Knowledge Base, with tenant_id as a mandatory metadata filter on every retrieval. Lower cost; more operationally efficient; the security relies on the metadata-filter enforcement being correct everywhere.

The hybrid pattern puts high-classification documents in per-tenant KBs and public or low-classification documents in a shared KB. Most production deployments end up here once they evolve past the first scale.

The defensible default we ship is strict partition for confidential and restricted classifications; shared with metadata filter for internal and public. This puts the strongest controls on the data most likely to cause an incident if leaked.
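A minimal sketch of how the hybrid default might be routed in application code follows. The Knowledge Base identifiers, the tenant map, and the numeric clearance threshold are all hypothetical; in a real deployment they would come from configuration rather than constants.

```python
# Hypothetical KB identifiers; real IDs come from deployment configuration.
SHARED_KB_ID = "KB-shared-001"
TENANT_KB_IDS = {"acme": "KB-acme-conf-001", "globex": "KB-globex-conf-001"}

# Assumed numeric ranks: public=0, internal=1, confidential=2, restricted=3.
CONFIDENTIAL_LEVEL = 2


def knowledge_bases_for(user_claims: dict) -> list:
    """Return the KB IDs a request may query under the hybrid default."""
    # The shared KB holds internal and public documents. Every query against
    # it still carries the mandatory tenant_id metadata filter.
    targets = [SHARED_KB_ID]
    if user_claims["clearance_level"] >= CONFIDENTIAL_LEVEL:
        tenant_kb = TENANT_KB_IDS.get(user_claims["tenant_id"])
        if tenant_kb is None:
            # Fail closed rather than fall back to another tenant's KB.
            raise LookupError(f"no partitioned KB for tenant {user_claims['tenant_id']!r}")
        targets.append(tenant_kb)
    return targets


print(knowledge_bases_for({"tenant_id": "acme", "clearance_level": 2}))
print(knowledge_bases_for({"tenant_id": "acme", "clearance_level": 1}))
```

The point of routing by tenant first is that a bug in the metadata filter on the partitioned KB can expose, at worst, that tenant's own confidential documents.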

The trail you can reconstruct in seconds

Every retrieval emits an audit record. The schema:

{
  "timestamp": "2026-06-09T14:32:17Z",
  "request_id": "req-9af3b2c1",
  "calling_user": {
    "sub": "user-acme-julia",
    "tenant_id": "acme",
    "groups": ["sales", "emea"],
    "session_arn": "arn:aws:sts::123456789012:assumed-role/idc-user-role/julia"
  },
  "knowledge_base_id": "KB-prod-001",
  "query": "what is the standard discount policy",
  "retrieval_filter": {"andAll": [...]},
  "returned_document_ids": ["doc-policy-discount-v3", "doc-policy-discount-v2"],
  "returned_chunk_ids": ["chunk-001", "chunk-002", "chunk-003"],
  "answer_id": "ans-3f1c"
}

The record answers, for any retrieval after the fact: who asked, what did they ask, what filter was applied, which chunks were returned, what was the answer ID. For an incident response, the audit query is "show me every retrieval by user X in the past 24 hours" — and the answer is a complete enumeration in seconds.

The audit log is KMS-encrypted in S3 with Object Lock in Compliance mode for the regulatory retention period, and it is forwarded cross-account to the Security OU per the air-gapped pattern. The workload account cannot delete its own audit records. The trail is written in a notebook that lives in a vault the workload itself cannot open.
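The record can be assembled directly from the Retrieve response. The sketch below assumes each result carries its source under location.s3Location.uri and a chunk identifier under the metadata key x-amz-bedrock-kb-chunk-id; both should be checked against the responses your Knowledge Base actually returns. Using the source URI as the document ID is a simplification made for this example.

```python
from datetime import datetime, timezone


def build_audit_record(request_id, user_claims, kb_id, query,
                       retrieval_filter, results, answer_id, now=None):
    """Assemble one audit record for a single Knowledge Base retrieval."""
    now = now or datetime.now(timezone.utc)
    document_ids, chunk_ids = [], []
    for result in results:
        uri = result.get("location", {}).get("s3Location", {}).get("uri")
        if uri and uri not in document_ids:
            document_ids.append(uri)  # keep first-seen order, no duplicates
        # Assumed metadata key for the chunk identifier.
        chunk_id = result.get("metadata", {}).get("x-amz-bedrock-kb-chunk-id")
        if chunk_id:
            chunk_ids.append(chunk_id)
    return {
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_id": request_id,
        "calling_user": {
            "sub": user_claims["sub"],
            "tenant_id": user_claims["tenant_id"],
            "groups": list(user_claims["groups"]),
        },
        "knowledge_base_id": kb_id,
        "query": query,
        "retrieval_filter": retrieval_filter,
        "returned_document_ids": document_ids,
        "returned_chunk_ids": chunk_ids,
        "answer_id": answer_id,
    }


sample_results = [
    {"location": {"s3Location": {"uri": "s3://corpus/policy-v3.pdf"}},
     "metadata": {"x-amz-bedrock-kb-chunk-id": "chunk-001"}},
    {"location": {"s3Location": {"uri": "s3://corpus/policy-v3.pdf"}},
     "metadata": {"x-amz-bedrock-kb-chunk-id": "chunk-002"}},
]
record = build_audit_record(
    "req-9af3b2c1", {"sub": "user-acme-julia", "tenant_id": "acme", "groups": ["sales", "emea"]},
    "KB-prod-001", "what is the standard discount policy",
    {"equals": {"key": "tenant_id", "value": "acme"}}, sample_results, "ans-3f1c",
    now=datetime(2026, 6, 9, 14, 32, 17, tzinfo=timezone.utc),
)
print(record["returned_document_ids"], record["returned_chunk_ids"])
```

Recording the exact filter expression alongside the returned IDs is what lets an investigator confirm later that the filter in force at the time matched the user's claims.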

Three identity stores, three slight variations on the same discipline

Active Directory and Microsoft Entra ID

The most common enterprise pattern. AD or Entra issues a SAML assertion on user authentication. The assertion contains nameid (the user identifier), AD group memberships, and any custom attributes the customer has set up. IAM Identity Center is configured with the customer's AD or Entra as the identity source. SCIM provisioning syncs users and groups from AD into Identity Center on a continuous schedule.

Permission Sets in Identity Center reference AD groups for assignment — users in the EMEA-Sales AD group get the BedrockSalesUser permission set. The application reads the federated session's group memberships from the session metadata and constructs the metadata filter from them.

The discipline: the application never re-asserts AD claims itself. The claim source is always the IAM Identity Center session metadata, which is itself derived from the SCIM-synced AD state. A user removed from AD on a Monday morning loses access by Monday afternoon — propagation latency is the SCIM sync interval, typically 5 to 15 minutes.

Okta and Auth0

Both support SAML 2.0 for IAM Identity Center federation and OIDC for direct application integration. The choice depends on whether the application is web-facing (OIDC) or running as an AWS workload (SAML via Identity Center).

For OIDC-direct flows — typical for B2B SaaS applications where the application is the customer-facing surface — the OIDC ID token's claims are the authorisation source. The application validates the JWT signature, extracts the claims, and uses them to construct the metadata filter. AWS access for the application itself uses a separate workload identity — an IAM Role on EKS, Lambda, or EC2 — that has KB invocation permissions; the calling user's identity is propagated as metadata-filter inputs, not as the AWS principal.
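The claim-extraction step in the OIDC-direct flow can be sketched as follows. The function assumes the JWT signature, issuer, audience, and expiry have already been verified by a JWT library upstream, and the claim names are hypothetical: many IdPs namespace custom claims, so the real names depend on the tenant's configuration. The sketch fails closed on anything missing or malformed.

```python
REQUIRED_CLAIMS = ("sub", "tenant_id", "groups", "region", "clearance_level")


class ClaimError(Exception):
    """Raised when verified token claims cannot be mapped to filter inputs."""


def to_user_claims(verified_token_claims: dict) -> dict:
    """Normalise already-verified ID-token claims into metadata-filter inputs."""
    missing = [name for name in REQUIRED_CLAIMS if name not in verified_token_claims]
    if missing:
        raise ClaimError(f"missing claims: {missing}")
    groups = verified_token_claims["groups"]
    if isinstance(groups, str):
        groups = [groups]  # some IdPs emit a single group as a bare string
    if not groups:
        raise ClaimError("user has no group memberships")
    level = verified_token_claims["clearance_level"]
    if isinstance(level, bool) or not isinstance(level, int):
        raise ClaimError("clearance_level must be an integer")
    return {
        "sub": verified_token_claims["sub"],
        "tenant_id": verified_token_claims["tenant_id"],
        "groups": sorted(groups),
        "region": verified_token_claims["region"],
        "clearance_level": level,
    }


claims = to_user_claims({
    "sub": "user-acme-julia", "tenant_id": "acme", "groups": "sales",
    "region": "eu", "clearance_level": 1, "email": "ignored@example.com",
})
print(claims)
```

Only the whitelisted claims survive the mapping, so nothing from request headers or the request body can reach the filter builder.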

IAM Identity Center as the identity store itself

For customers without an external IdP, IAM Identity Center can be the identity store directly. Users are managed in Identity Center; groups are managed in Identity Center; SAML is unnecessary because the federation is internal. Less common in enterprise but valid for smaller or earlier-stage customers.

The five things that bite in production

Claim drift between the IdP and the workload's filter logic

AD groups get renamed. A new attribute gets added. The IdP team changes the format of the region claim from eu to eu-central-1. The metadata filter logic in the application does not get updated. Suddenly users see no documents, or see documents they should not. The change happened on the IdP side. The workload never heard.

The mitigation is claim contract tests in CI that run against a mock IdP and a known-good metadata-filter schema. Any change to either the claims or the filter logic that breaks the contract fails CI before it reaches production.
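One possible shape for such a contract test is sketched below. The set of known metadata keys, the allowed region values, and the stand-in filter builder are assumptions for the example; in practice the real filter builder and the mock IdP's sample claims would be imported into the test.

```python
# Assumed contract: metadata keys the corpus is ingested with, and the
# region values both the IdP and the corpus agree on.
KNOWN_METADATA_KEYS = {"tenant_id", "owner_group", "classification_level", "region"}
ALLOWED_REGIONS = {"eu", "ng", "uk"}


def filter_keys(expr: dict) -> set:
    """Collect every metadata key referenced anywhere in a filter expression."""
    keys = set()
    for name, operand in expr.items():
        if name in ("andAll", "orAll"):
            for sub in operand:
                keys |= filter_keys(sub)
        else:
            keys.add(operand["key"])
    return keys


def contract_violations(claims: dict, build_filter) -> list:
    """Return human-readable contract violations; an empty list means pass."""
    problems = []
    if claims.get("region") not in ALLOWED_REGIONS:
        problems.append(f"region claim {claims.get('region')!r} is not in the contract")
    unknown = filter_keys(build_filter(claims)) - KNOWN_METADATA_KEYS
    if unknown:
        problems.append(f"filter references unknown metadata keys: {sorted(unknown)}")
    return problems


def sample_build_filter(claims: dict) -> dict:
    """Stand-in for the application's real filter builder."""
    return {"andAll": [
        {"equals": {"key": "tenant_id", "value": claims["tenant_id"]}},
        {"equals": {"key": "region", "value": claims["region"]}},
    ]}


good = {"tenant_id": "acme", "region": "eu"}
drifted = {"tenant_id": "acme", "region": "eu-central-1"}
print(contract_violations(good, sample_build_filter))
print(contract_violations(drifted, sample_build_filter))
```

Wired into CI as an assertion that the violation list is empty, the region format change described above would fail the build instead of silently emptying every user's results.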

Stale group memberships post-offboarding

The user is removed from AD. The change takes 15 minutes to propagate to Identity Center. Meanwhile the user's prior session is still valid for up to 15 minutes. There is a window where the offboarded user retains access — the same way a dismissed employee's badge still opens the door for the rest of the shift.

The mitigation is short session durations: 15 minutes for sensitive workloads, configurable in Identity Center. A session issued just before the SCIM sync lands stays valid for its full lifetime, so the worst-case access-loss latency is the SCIM sync interval plus the session TTL rather than the larger of the two. With a 15-minute TTL and a 5 to 15 minute sync, that is typically under 20 minutes and at most 30.

Document metadata drift

Documents get re-classified, ownership changes, regions get re-tagged, and the corpus's metadata in the Knowledge Base no longer reflects the current state. The retrieval filter is then evaluated against stale authorisation metadata, so it admits or excludes documents based on who was allowed to see them in the past.

The mitigation is scheduled re-ingestion with metadata refresh. The KB ingestion pipeline re-reads document metadata from the source system on a regular cadence — daily for most workloads — updating the vector store metadata in place. Document deletions get propagated by an end-of-life policy that hard-deletes after a configurable grace period.
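Between scheduled refreshes, a drift check can report which documents need re-ingestion. The sketch below compares authorisation fields from the source system with what is currently indexed. Both inputs are plain dictionaries keyed by document ID, a simplification for illustration; how the indexed metadata is exported from the vector store is left open.

```python
# Fields whose drift changes who is allowed to see a document.
AUTHZ_FIELDS = ("tenant_id", "classification", "owner_group", "region")


def find_metadata_drift(source: dict, indexed: dict):
    """Compare source-of-truth metadata with indexed metadata.

    Returns (stale, orphaned): stale maps doc_id to {field: (indexed, current)}
    for documents to re-ingest; orphaned lists indexed documents that no
    longer exist in the source system and are candidates for deletion.
    """
    stale, orphaned = {}, []
    for doc_id, indexed_meta in indexed.items():
        current = source.get(doc_id)
        if current is None:
            orphaned.append(doc_id)
            continue
        changed = {
            field: (indexed_meta.get(field), current.get(field))
            for field in AUTHZ_FIELDS
            if indexed_meta.get(field) != current.get(field)
        }
        if changed:
            stale[doc_id] = changed
    return stale, sorted(orphaned)


source_system = {
    "doc-1": {"tenant_id": "acme", "classification": "confidential",
              "owner_group": ["legal"], "region": "eu"},
}
kb_index = {
    "doc-1": {"tenant_id": "acme", "classification": "internal",
              "owner_group": ["legal"], "region": "eu"},
    "doc-2": {"tenant_id": "acme", "classification": "public",
              "owner_group": ["sales"], "region": "eu"},
}
stale, orphaned = find_metadata_drift(source_system, kb_index)
print(stale, orphaned)
```

A document whose classification has been raised, like doc-1 here, is the urgent case: until it is re-ingested, the filter still treats it at the lower classification.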

The "I need this to debug" backdoor

An engineer needs to debug a user's issue and wants to "just see what the user sees." The temptation is to bypass the metadata filter or to assume the user's role. Each shortcut creates a control gap. The engineer means well. The audit log does not care.

The mitigation is break-glass impersonation through an audited workflow. The engineer requests impersonation; a designated approver — typically a security lead — approves; the engineer's session gets a special claim that allows reading another user's documents but with full audit logging of every read. The impersonation event is a discrete audit record, not an invisible bypass.
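The approval check and the discrete audit event could look roughly like this. The approval record's fields, the impersonated_by claim, and the event name are hypothetical; the sketch only illustrates that impersonation is refused without a valid, unexpired, third-party approval and that it always produces its own audit record.

```python
from datetime import datetime, timedelta, timezone


def start_impersonation(engineer_sub: str, target_claims: dict, approval: dict, now: datetime):
    """Return (effective_claims, audit_event) or raise PermissionError."""
    if approval["approver_sub"] == engineer_sub:
        raise PermissionError("self-approval is not allowed")
    if (approval["engineer_sub"] != engineer_sub
            or approval["target_sub"] != target_claims["sub"]):
        raise PermissionError("approval does not cover this engineer and target")
    if now >= approval["expires_at"]:
        raise PermissionError("approval has expired")
    # The engineer sees exactly what the target sees; the extra claim marks
    # every downstream retrieval record as impersonated.
    effective_claims = dict(target_claims, impersonated_by=engineer_sub)
    audit_event = {
        "event": "impersonation_started",
        "timestamp": now.isoformat(),
        "engineer": engineer_sub,
        "target": target_claims["sub"],
        "approver": approval["approver_sub"],
        "expires_at": approval["expires_at"].isoformat(),
    }
    return effective_claims, audit_event


now = datetime(2026, 6, 9, 14, 0, tzinfo=timezone.utc)
approval = {"engineer_sub": "eng-tunde", "target_sub": "user-acme-julia",
            "approver_sub": "sec-lead-ada", "expires_at": now + timedelta(hours=1)}
target = {"sub": "user-acme-julia", "tenant_id": "acme", "groups": ["sales"]}
claims, event = start_impersonation("eng-tunde", target, approval, now)
print(claims["impersonated_by"], event["event"])
```

Because the effective claims are the target's own, the same metadata filter applies during the debugging session, so the engineer cannot see more than the user could.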

The output leak that bypasses the retrieval filter

The retrieval filter correctly excludes a document, but a related document the user is authorised to see contains a reference to the excluded document. The model's synthesised answer mentions the excluded document by name. Has a leak occurred? The chunk was never in context. The reference was.

The mitigation is output-side filtering for cross-reference leaks. A post-synthesis check — LLM-as-judge or rule-based — compares the answer's entities against the retrieval-filter-excluded entities and flags potential leaks. The aggressive variant is to block the answer. The lenient variant is to redact the named references. The decision depends on the workload's tolerance for false positives.
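A rule-based version of that check is sketched below. It assumes the application can obtain a list of document titles or entity names the calling user is not cleared for, for example from a catalogue of restricted titles; how that list is produced is outside this sketch. Matching is a plain case-insensitive substring search, which is deliberately simple and will produce false positives on common phrases.

```python
import re


def check_cross_references(answer: str, excluded_names: list, mode: str = "redact"):
    """Flag mentions of excluded names in a synthesised answer.

    Returns (answer_or_None, flagged). In "redact" mode the names are
    replaced; in "block" mode the answer is withheld (None).
    """
    if mode not in ("redact", "block"):
        raise ValueError(f"unknown mode: {mode!r}")
    flagged = [
        name for name in excluded_names
        if re.search(re.escape(name), answer, flags=re.IGNORECASE)
    ]
    if not flagged:
        return answer, []
    if mode == "block":
        return None, flagged
    redacted = answer
    for name in flagged:
        redacted = re.sub(re.escape(name), "[redacted]", redacted, flags=re.IGNORECASE)
    return redacted, flagged


answer = "Pricing follows the standard policy; see the Project Falcon workbook for exceptions."
print(check_cross_references(answer, ["Project Falcon"], mode="redact"))
print(check_cross_references(answer, ["Project Falcon"], mode="block"))
```

This check is a backstop for names that appear in authorised chunks. It does not replace the retrieval filter, which remains the control that keeps restricted content out of the context window.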

What the architecture is actually protecting

Five things hold up across the RBAC-integrated deployments we ship.

Security is not a separate workstream from the core architecture. Bolting access control onto an existing RAG pipeline is more expensive and less defensible than building it in from day one. The architecture above is what lets most enterprise customers advance past pilot; pilots without it mostly stall there.

The IdP is the source of truth. Every shortcut that lets the application re-assert identity claims independently of the IdP is a control gap. The discipline of "the IdP says it or it did not happen" is the architectural invariant.

Metadata at ingestion is the only place to enforce access control reliably. Filtering at retrieval works because the chunks the user is not authorised to see never enter the model's context window. Filtering at the model's output is brittle — the model is a probabilistic system, not a security control — and expensive, because output filtering scales with response size.

The audit trail is the customer's compliance team's primary artefact. A CISO will read the architecture document. Their compliance team will spend more time with the audit trail. The retrieval audit record schema is worth getting right early because changing it later breaks every downstream consumer.

The fastest path to broader rollout is showing the CISO a working RBAC architecture. The team that comes to the architecture review with a working metadata-filter pipeline, a documented IdP integration, and an audit-trail sample advances. The team that comes with hand-waving does not, and the analyst whose copilot quoted the senior partner's M&A workbook never gets the answer she needed in the first place.

FAQs

Why apply the access filter at retrieval rather than at the model's output?

The model is a probabilistic system, not a security control. Filtering at the output relies on the model to be diligent about authorisation in its synthesised answer — brittle in principle, expensive in practice (scales with response size). Filtering at retrieval means the unauthorised chunks never enter the context window in the first place. The leak is closed at the data layer.

How fast does access loss propagate when a user is removed from AD?

Bounded by the SCIM sync interval plus the session TTL, because a session issued just before the sync lands stays valid for its full lifetime. With 15-minute Identity Center sessions and a 5 to 15 minute SCIM sync, total access-loss latency is typically under 20 minutes and at most 30. Shorter session durations cut the upper bound at the cost of more frequent re-authentication.

When should we use a shared Knowledge Base versus per-tenant KBs?

Per-tenant KBs (strict partition) for confidential and restricted documents: strongest isolation, highest infrastructure cost, preferred where the threat model assumes operator-side error. Shared KB with `tenant_id` metadata filter for internal and public documents: lower cost, but security relies on filter enforcement being correct everywhere. Most production deployments end up running this hybrid of the two.

Can an authorised document's reference to a restricted document leak the restricted content?

Yes, this is the cross-reference leak failure mode. The retrieval filter correctly excludes the restricted document, but a related authorised document mentions it by name. The mitigation is output-side filtering — an LLM-as-judge or rule-based check that compares the answer's entities against retrieval-filter-excluded entities and either blocks or redacts the references. The right strictness depends on the workload's tolerance for false positives.

What does the audit trail need to capture to satisfy a compliance team?

Per retrieval: timestamp, request ID, calling user identity with tenant and group claims, KB ID, query text, the metadata filter applied, the document IDs returned, the chunk IDs returned, and the answer ID. KMS-encrypted in S3, Object Lock Compliance mode for regulatory retention, cross-account forwarded to a Security OU. The workload account cannot delete its own audit records.

Engaging with cmdev

CreativeMinds Development (cmdev) ships the RBAC-integrated Knowledge Base pattern as part of every regulated-enterprise AI engagement. We work with banking under CBN CSAT, energy operators under NMDPRA and NIS2, fintechs under NDPA, and healthcare networks under HIPAA-equivalent regional regimes. The identity-integration discipline is universal across them; the per-regime nuances (specific PII categories, retention requirements, cross-border filters) get tuned per engagement.

Cloud security services: /services/cloud-security
Companion architecture series: Amazon Bedrock for Production AI, Air-Gapped LLM Deployments, Custom Evaluation Frameworks, Day 2 — Mitigating Non-Deterministic Failures, Cold-Start Latency and Cost for Multimodal RAG, Compliance Automator case study

Mayowa A. is CTO of CreativeMinds Development. He leads cmdev's AI engineering practice for regulated enterprises across Africa and the EU.