Security Guardrails and Observability for Amazon Bedrock

Series · Amazon Bedrock for Production AI · Part 6 of 8

Key takeaways

Guardrails and observability ship together — one prevents the bad output, the other proves the prevention worked when the regulator asks.
Reference Guardrails by ID, not inline — one policy, many agents, rollback via version pinning, propagation without redeployment.
Five layers must all be present: identity perimeter (no static creds, per-agent IAM, VPC endpoints), Guardrails policy (denied topics, PII filters, jailbreak block, contextual grounding), CloudTrail data events for invocations, model invocation logging with KMS CMK + Object Lock, and CloudWatch metrics + X-Ray traces.
For Nigerian banking workloads, custom regex patterns for BVN and NIN are mandatory — the default PII detector list is US/EU-centric and misses the identifiers that matter under NDPA.
Five audit findings recur in production: draft Guardrail versions in production, disabled invocation logging "for cost," missing CloudTrail data events, `bedrock:*` IAM wildcards, and audit logs in the same account as the workload.

The brake and the dashboard ride together

Security and observability are usually written as separate concerns. For production Bedrock workloads, they are two ends of the same chain. Guardrails are what stop a bad output from leaving the model. Observability is what tells you when Guardrails caught something, what the agent did up to the catch, and whether the configuration is drifting toward the next incident. Without observability, the Guardrail is a brake on a car with no dashboard — you cannot tell whether you used it, and you cannot tell whether it still works. A team that ships one without the other is shipping half the control.

Across Parts 1-5 we have assumed both exist. This piece documents what they look like in production: the Guardrails policy structure that meets the NDPA, NIS2, and CBN CSAT bars; the IAM and network controls that constrain Bedrock access; and the observability stack that makes the system auditable in seconds rather than weeks.

The whole picture, in one frame

Figure 1. Guardrails wrap every invocation; CloudTrail data events plus invocation logs make every call auditable. The diagram shows the Bedrock security and observability architecture in five layers: the request enters via API Gateway with WAF; the identity perimeter at IAM Identity Center enforces federated sessions, short-lived credentials, and MFA; the request hits the agent in a private subnet via VPC endpoints to bedrock-runtime and bedrock-agent-runtime; Guardrails wrap every invocation with an input filter (denied topics, PII, jailbreak, contextual grounding), the model call, and an output filter; action-group Lambdas run under scoped IAM roles with hook-system audit; the observability fanout sends CloudTrail data events to a Security OU account, invocation logs to a dedicated S3 bucket with KMS CMK and Object Lock, CloudWatch metrics per agent and workflow, and X-Ray traces tying the user request to model invocations and tool calls; cost tags on every component support spend attribution.

Five layers. Each one is its own configuration, and each one contributes to the audit-evidence story.

Layer 1 — Where the perimeter actually lives

Identity is the perimeter; the network is the wall around it. The IAM rules are short and uncompromising.

No long-lived static credentials sit anywhere in the Bedrock call path. Human callers authenticate via IAM Identity Center, federated from the bank or enterprise IdP. Workloads on EKS use IRSA; on EC2 they use IAM roles; on Lambda they use execution roles. CI/CD federates through OIDC. There is no BEDROCK_API_KEY environment variable anywhere: not in dev, not in staging, not on a forgotten laptop. Each agent carries its own scoped IAM role: bedrock:InvokeModel against specific model ARNs, bedrock:InvokeAgent against that agent's ID, Knowledge Base access against specific KB IDs. bedrock:* on a production role is a finding, every time. Each action-group Lambda has its own scoped role too, with permission to read from one DynamoDB table or write to one SQS queue rather than blanket read or write access. And shared resources (Knowledge Bases, custom models, guardrails) carry resource-based policies that restrict invocation to named principals.
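The per-agent scoping described above can be sketched as a policy document built in code. This is a minimal illustration, not the series' actual policy: the function names, the account ID, and the agent, alias, and Knowledge Base IDs are all hypothetical placeholders, and the ARN formats should be checked against the IAM reference for Bedrock before use.

```python
import json

def build_agent_policy(region, account_id, model_id, agent_id, alias_id, kb_id):
    """Return a least-privilege IAM policy document for a single agent role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeOneModel",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                # Foundation-model ARNs carry an empty account field.
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/{model_id}",
            },
            {
                "Sid": "InvokeOneAgentAlias",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeAgent"],
                "Resource": f"arn:aws:bedrock:{region}:{account_id}:agent-alias/{agent_id}/{alias_id}",
            },
            {
                "Sid": "QueryOneKnowledgeBase",
                "Effect": "Allow",
                "Action": ["bedrock:Retrieve"],
                "Resource": f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{kb_id}",
            },
        ],
    }

def wildcard_actions(policy):
    """List every allowed action that contains a wildcard, such as bedrock:*."""
    found = []
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        found.extend(action for action in actions if "*" in action)
    return found

# All identifiers below are made-up placeholders.
policy = build_agent_policy(
    "us-east-1", "123456789012", "example-model-id",
    "AGENT12345", "ALIAS12345", "KB12345678",
)
print(json.dumps(policy, indent=2))
print("wildcards:", wildcard_actions(policy))
```

The `wildcard_actions` helper is the kind of check that can run in CI against every role attached to a production agent, so the `bedrock:*` finding is caught before an auditor sees it.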

The network controls braid through the same logic. VPC endpoints reach every Bedrock surface: bedrock-runtime for model invocation, bedrock-agent-runtime for agent invocation, bedrock for the control plane, bedrock-agent for agent management. All traffic stays on AWS's network. The workload subnets, which host the action-group Lambdas, EKS pods, and ECS tasks, carry no public IPs and no route to an internet gateway. Security groups are scoped to the specific endpoints they need, because a default allow-all egress rule is how data exfiltration happens. And every encrypted resource (invocation log buckets, KB sources, custom model artefacts, guardrail configurations) uses a customer-managed KMS key with a policy that names exactly who may decrypt.

For financial services, healthcare, and NIS2-scoped operators, this is not a suggestion. It is the baseline a competent examiner checks on the first day.

Layer 2 — The two doors of every prompt

Guardrails apply at two points in every invocation. Input filtering runs before the prompt reaches the model. Output filtering runs before the response reaches the user. Think of them as two security checks on opposite sides of a single door — the prompt is checked on the way in, the response is checked on the way out, and neither check trusts the other. The policy categories are where the work lives.

Denied topics

Explicit topics the agent must refuse to discuss, named in plain language with examples.

{
  "denyTopics": [
    {
      "name": "investment_advice",
      "definition": "Specific recommendations to buy, sell, or hold particular financial instruments.",
      "examples": [
        "Should I buy NVDA?",
        "What's a good ETF to invest in?",
        "Which crypto should I put my pension into?"
      ]
    },
    {
      "name": "medical_diagnosis",
      "definition": "Diagnosing medical conditions or recommending treatments.",
      "examples": [
        "What's wrong with me if I have these symptoms?",
        "What medication should I take?"
      ]
    }
  ]
}

When the Guardrail detects an attempt to discuss a denied topic, the agent either refuses politely (the response is configurable) or routes to a human. For regulated industries such as banking, healthcare, and legal, denied topics are often the regulatory floor. "This agent does not give investment advice / medical diagnosis / legal opinions" is a defensible posture, and one a board will sign off without a meeting.
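The topic entries above are written in a simplified shape. As a rough sketch, they might be converted to the `topicPolicyConfig` structure that the Bedrock `CreateGuardrail` API accepts, where each topic carries a `type` of `DENY`. The helper name and the length limits in the constants are assumptions for illustration; check the current API reference for the real limits.

```python
MAX_DEFINITION_CHARS = 200  # assumed limit; verify against the current API reference
MAX_EXAMPLES = 5            # assumed limit; verify against the current API reference

def to_topic_policy(topics):
    """Convert simple topic dicts into a topicPolicyConfig-shaped payload."""
    configs = []
    for topic in topics:
        if len(topic["definition"]) > MAX_DEFINITION_CHARS:
            raise ValueError(f"definition too long for topic {topic['name']!r}")
        if len(topic["examples"]) > MAX_EXAMPLES:
            raise ValueError(f"too many examples for topic {topic['name']!r}")
        configs.append({
            "name": topic["name"],
            "definition": topic["definition"],
            "examples": list(topic["examples"]),
            "type": "DENY",
        })
    return {"topicsConfig": configs}

topics = [
    {
        "name": "investment_advice",
        "definition": "Specific recommendations to buy, sell, or hold particular financial instruments.",
        "examples": ["Should I buy NVDA?", "What's a good ETF to invest in?"],
    },
]
topic_policy = to_topic_policy(topics)
print(topic_policy)
```

The resulting dictionary is what would be passed as the `topicPolicyConfig` argument of `create_guardrail` on the boto3 `bedrock` client, alongside the blocked-input and blocked-output messages.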

Content filters

Categorical filters cover hate, violence, sexual content, misconduct, insults — each with a threshold of NONE, LOW, MEDIUM, or HIGH. HIGH blocks aggressively; NONE disables the category. For most enterprise workloads, MEDIUM on hate, violence, sexual content, and misconduct, with HIGH on the prompt-injection filter (covered below) is the working baseline. Tune from there based on the false-positive rate the use case can tolerate.
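As a sketch of that baseline, the content-filter section of a Guardrail definition could be assembled like this. The function name is hypothetical; the `filtersConfig` shape and the category names follow the `CreateGuardrail` API as understood at the time of writing, and the prompt-attack filter is configured on the input side only.

```python
def content_filter_policy(strength="MEDIUM"):
    """Build a contentPolicyConfig-shaped dict for the working baseline."""
    categories = ["HATE", "VIOLENCE", "SEXUAL", "MISCONDUCT"]
    filters = [
        {"type": category, "inputStrength": strength, "outputStrength": strength}
        for category in categories
    ]
    # The prompt-attack filter inspects prompts, so output strength stays NONE.
    filters.append(
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"}
    )
    return {"filtersConfig": filters}

baseline = content_filter_policy()
for entry in baseline["filtersConfig"]:
    print(entry["type"], entry["inputStrength"], entry["outputStrength"])
```

Tuning then becomes a one-argument change, for example `content_filter_policy("LOW")` for a use case that cannot tolerate the MEDIUM false-positive rate.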

Sensitive information filters

PII detection and redaction work per type, with per-type actions. Detected types include name, email, phone, SSN, credit card, IP address, address, age, US passport, IBAN, and many regional identifiers. For each type, the action is BLOCK (refuse the request), ANONYMIZE (redact before processing), or OBSERVE (log but do not act). OBSERVE is this article's label for detect-only mode; in the Guardrails API the corresponding action value is NONE.

The defensible defaults for a Nigerian banking agent:

PII type	Input action	Output action
Name	OBSERVE	OBSERVE
Email	OBSERVE	BLOCK (prevent leak)
Phone number	OBSERVE	BLOCK
NDPA-class personal identifiers (NIN, BVN)	BLOCK (regex via custom pattern)	BLOCK
Credit card	BLOCK	BLOCK
Bank account	BLOCK	BLOCK

Custom regex patterns extend the default detectors. For Nigerian-context workloads, BVN (11-digit pattern) and NIN (11-digit pattern with checksum) deserve their own custom filters because the default detector list is US/EU-centric. Without the custom patterns, the PII filter is a false sense of security — it catches an American SSN but misses the BVN that actually matters under NDPA.
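A minimal sketch of such a custom pattern follows. It matches shape only: a regex can find an 11-digit number but cannot tell a BVN from a NIN, and any checksum validation would need separate code. The filter name and description are hypothetical, and whether a given regex feature is accepted by Guardrails should be tested against the service.

```python
import re

# Shape-only pattern: exactly 11 digits, bounded by non-word characters.
ELEVEN_DIGIT_ID = r"\b\d{11}\b"

def regex_filter(name, description, pattern=ELEVEN_DIGIT_ID, action="BLOCK"):
    """Build one entry for the regexesConfig list of a sensitive-information policy."""
    return {"name": name, "description": description, "pattern": pattern, "action": action}

def find_candidates(text):
    """Return every 11-digit run in the text that the pattern would flag."""
    return re.findall(ELEVEN_DIGIT_ID, text)

sensitive_info_policy = {
    "regexesConfig": [
        regex_filter("ng-bvn-nin", "11-digit Nigerian identifier (BVN or NIN shape)"),
    ]
}

print(find_candidates("My BVN is 12345678901."))   # synthetic value
print(find_candidates("Order ref 123456789012"))   # 12 digits, not flagged
print(find_candidates("Call me on 08031234567"))   # local-format phone, also flagged
```

The last line shows the trade-off: Nigerian mobile numbers in local format are also 11 digits, so a shape-only pattern will block them too. For a banking agent that is usually the safer failure mode, but it belongs in the false-positive budget.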

Word filters

Blocked terms or patterns — useful for brand-specific blocking such as competitor names or banned phrases, or for preventing the agent from saying things internal policy forbids.

Contextual grounding

For RAG workloads, contextual grounding checks that the model's output is actually supported by the retrieved context. The check operates on two dimensions: grounding (does the output follow from the retrieved chunks) and relevance (is the output relevant to the user's question). Configurable thresholds; below the threshold, the response is blocked or flagged. For the RAG pipelines from Part 2, contextual grounding is the layer that catches the hallucination class of failures. An answer with no grounding score is, structurally, an unsupported answer — and an unsupported answer is the kind that ends up in a regulator's letter.
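The two-dimension check can be sketched as a small threshold test. The 0.75 thresholds are placeholder assumptions, not recommended values, and the `passes` helper is hypothetical; it only mirrors the rule that a score below its threshold blocks the response.

```python
def grounding_policy(grounding=0.75, relevance=0.75):
    """Build a contextualGroundingPolicyConfig-shaped dict (thresholds are assumptions)."""
    return {
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": grounding},
            {"type": "RELEVANCE", "threshold": relevance},
        ]
    }

def passes(scores, policy):
    """Return True only if every score meets or exceeds its configured threshold."""
    return all(
        scores[entry["type"]] >= entry["threshold"]
        for entry in policy["filtersConfig"]
    )

policy = grounding_policy()
supported = {"GROUNDING": 0.91, "RELEVANCE": 0.83}
unsupported = {"GROUNDING": 0.40, "RELEVANCE": 0.88}
print(passes(supported, policy))
print(passes(unsupported, policy))
```

The second case is the hallucination class: the answer is on topic but does not follow from the retrieved chunks, so it fails on grounding alone.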

Automated reasoning checks

The newest Guardrail category, introduced in 2024-25. Formal-logic-based validation that the model's output is consistent with a declared policy expressed in a structured logic language. Useful for high-stakes workloads where the agent's output must satisfy specific rules — contract terms, compliance constraints, mathematical correctness in financial calculations. Operationally complex to configure; high value where it fits. Start without automated reasoning and add it only when the workload demands it — typically legal-document review, financial calculation agents, regulatory compliance assistants.

Prompt attack / jailbreak filter

A built-in filter for known jailbreak patterns: DAN-style attacks, instruction injection, role-play bypass attempts. It evaluates the prompt side only. Set to HIGH for production by default. The false-positive rate is low enough that the rare blocked legitimate query is an acceptable trade-off for broad coverage of known attack patterns; novel attacks still depend on the other layers.

How to apply Guardrails — by reference, not by inline

Per the architectural decision from Part 1, Guardrails should be referenced by ID, not inlined per agent. The Guardrail then carries its own ARN and version history. Multiple agents share the same Guardrail. A policy update is authored once and reaches every consuming agent without redeploying agent code: consumers that reference the working draft see it at once, while production agents pinned to a numbered version adopt it by moving the pin. And version pinning lets you roll back a bad policy change without touching agent code. The Guardrail becomes a versioned shared library, not a snippet copied into each consumer.

# Agent configuration referencing a Guardrail by ID
agent_config = {
    "agentName": "customer-support-agent",
    "foundationModel": "anthropic.claude-sonnet-4-6-20251022",
    "guardrailConfiguration": {
        "guardrailIdentifier": "arn:aws:bedrock:us-east-1:123:guardrail/GR-001",
        # Pin a published, numbered version in production; "DRAFT" is for development only.
        "guardrailVersion": "1",
    },
    # ...
}
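Version pinning and rollback can be sketched as a small builder that refuses an unpinned version in production. The function name and the environment labels are hypothetical; the ARN is the same placeholder used in the configuration above.

```python
def guardrail_config(guardrail_arn, version, environment):
    """Build a guardrailConfiguration block, rejecting DRAFT in production."""
    version = str(version)
    if environment == "production" and not version.isdigit():
        raise ValueError(
            f"production must pin a numbered Guardrail version, got {version!r}"
        )
    return {"guardrailIdentifier": guardrail_arn, "guardrailVersion": version}

ARN = "arn:aws:bedrock:us-east-1:123:guardrail/GR-001"  # placeholder from the example above

current = guardrail_config(ARN, 4, "production")
rollback = guardrail_config(ARN, 3, "production")  # rolling back is just an older pin
print(current)
print(rollback)
```

Rolling back a bad policy change is then a one-value change in configuration, and the production check fails loudly if someone leaves the version at `DRAFT`.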

Layer 3 — Catching the moment of invocation

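Standard CloudTrail captures management-plane events (who created the agent, who updated the guardrail) but does not capture data-plane events such as each InvokeAgent call unless explicitly enabled. For Bedrock workloads, data events are what make invocation-level audit possible. Without them, you have a video camera that only records arrivals at reception, never anything inside the building. One caveat: at the time of writing, AWS documents InvokeModel and Converse as management events, while agent, flow, and Knowledge Base runtime calls are data events, so check the current CloudTrail documentation for Bedrock to see which trail configuration captures each operation.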

Enable CloudTrail data events for Bedrock:

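resource "aws_cloudtrail" "bedrock_data_events" {
  name           = "bedrock-data-events"
  s3_bucket_name = aws_s3_bucket.audit_bucket.id

  # CloudTrail expects one resources.type per selector, so each Bedrock
  # resource type gets its own advanced_event_selector block.
  advanced_event_selector {
    name = "Bedrock agent invocations"

    field_selector {
      field  = "eventCategory"
      equals = ["Data"]
    }

    field_selector {
      field  = "resources.type"
      equals = ["AWS::Bedrock::AgentAlias"]
    }
  }

  advanced_event_selector {
    name = "Bedrock model invocations"

    field_selector {
      field  = "eventCategory"
      equals = ["Data"]
    }

    field_selector {
      field  = "resources.type"
      equals = ["AWS::Bedrock::Model"]
    }
  }
}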

This produces one CloudTrail record per matching invocation: caller identity, model ID, request and response sizes, IAM role used. The record does not include the prompt or response contents. Those go to model invocation logs at the next layer. The trail records that an invocation happened, by whom, with what scope.

CloudTrail data events are how you answer "which user invoked which model, when, under what role" for a compliance audit. They are also how you detect that someone is invoking an expensive model in a region they should not be using, which is the cost-leak detector hiding inside the security audit.
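A sketch of how one CloudTrail record might be reduced to that audit answer, with a region check bolted on. The sample record is synthetic, and the assumption that the model ID appears under `requestParameters.modelId` should be confirmed against real records from your own trail.

```python
def summarize_invocation(record, allowed_regions):
    """Reduce a CloudTrail record to who / what / model / when / where."""
    identity = record.get("userIdentity", {})
    params = record.get("requestParameters") or {}
    summary = {
        "who": identity.get("arn", "unknown"),
        "what": record.get("eventName", "unknown"),
        "model": params.get("modelId", "unknown"),  # assumed field location
        "when": record.get("eventTime", "unknown"),
        "region": record.get("awsRegion", "unknown"),
    }
    summary["region_allowed"] = summary["region"] in allowed_regions
    return summary

# Synthetic record for illustration only.
sample = {
    "eventTime": "2025-01-15T09:30:00Z",
    "eventSource": "bedrock.amazonaws.com",
    "eventName": "InvokeModel",
    "awsRegion": "ap-southeast-2",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::123456789012:assumed-role/support-agent-role/session-1",
    },
    "requestParameters": {"modelId": "example-model-id"},
}

result = summarize_invocation(sample, allowed_regions={"eu-west-1", "af-south-1"})
print(result)
```

A record with `region_allowed` set to False is the signal for both the security review and the cost review: the same query serves both.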

Layer 4 — Logging what the model actually said

Bedrock model invocation logging captures the actual request and response — the prompt sent to the model, the completion returned, the embeddings produced, the Guardrails trace. This is the layer that makes debugging possible after the fact and that auditors want to see for high-stakes workloads. CloudTrail data events tell you that a call happened; the invocation log tells you what was said.

Configure once per region:

resource "aws_bedrock_model_invocation_logging_configuration" "main" {
  logging_config {
    s3_config {
      bucket_name = aws_s3_bucket.bedrock_invocation_logs.id
      key_prefix  = "invocation-logs/"
    }

    cloudwatch_config {
      log_group_name = "/aws/bedrock/invocations"
      # Role Bedrock assumes to write to the log group (hypothetical resource name).
      role_arn       = aws_iam_role.bedrock_logging.arn
    }

    embedding_data_delivery_enabled = false  # large; enable only if needed
    image_data_delivery_enabled     = true
    text_data_delivery_enabled      = true
    video_data_delivery_enabled     = false
  }
}

Two operational rules ride alongside the configuration. The invocation log bucket is the most sensitive bucket in the deployment — prompts include the user's questions (often PII), responses can leak regenerated training data and reconstructed PII, the Guardrails trace shows what the safety system caught. KMS CMK encryption, Object Lock for retention, strict bucket policy, access only for a small set of audit and IR roles. And the retention policy aligns with the regulator that asked for it. NDPA, GDPR, and NIS2 each have specific retention requirements for processing records; CBN CSAT has its own. Object Lock in Compliance mode locks the retention duration so even root cannot delete.

The invocation log is what lets you answer "what did the agent actually say to the user about their balance?" months later. Without it, the agent is a black box even to the team that built it.
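The Compliance-mode retention rule can be sketched as the configuration document that S3 expects. The helper name is hypothetical and the 365-day figure is a placeholder; the real duration comes from whichever regulator governs the workload.

```python
def object_lock_configuration(retention_days):
    """Build an S3 Object Lock configuration with a Compliance-mode default retention."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                "Mode": "COMPLIANCE",  # retention cannot be shortened or bypassed
                "Days": retention_days,
            }
        },
    }

lock_config = object_lock_configuration(365)  # placeholder duration
print(lock_config)

# With boto3 this dict would be passed as:
#   s3.put_object_lock_configuration(Bucket=..., ObjectLockConfiguration=lock_config)
```

Because Compliance mode cannot be loosened after the fact, it is worth confirming the retention figure with the compliance team before the first object lands in the bucket.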

Layer 5 — Metrics that make the system legible

Six metrics carry most of the operational signal. The metric names below are the ones Bedrock publishes to CloudWatch at the time of writing; confirm them against the metric list in your region before wiring alarms.

Per-agent invocation count. InvocationCount in the AWS/Bedrock/Agents namespace, split by agent alias and alarmed on unusual spikes.

Per-model latency. InvocationLatency per ModelId in the AWS/Bedrock namespace, where a slow Claude call points either to model-side issues or to a large context window.

Per-model error rate. InvocationClientErrors, InvocationServerErrors, and InvocationThrottles per ModelId, because throttling, validation, and context-too-long all need different responses.

Guardrail trip rate. InvocationsIntervened in the AWS/Bedrock/Guardrails namespace, by guardrail and policy type, where a spike is either an attack campaign or a content-filter tuning problem.

Token consumption. InputTokenCount and OutputTokenCount, the direct feed into the cost dashboard from Part 7.

Tool-call distribution. Per action group, call count and error rate, which surfaces the under-used or over-used tools that need design changes.
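What "alarmed on unusual spikes" might mean in practice can be sketched with a simple baseline test over hourly counts. This is an illustrative stand-in for a CloudWatch alarm or anomaly detector, not a replacement for one; the function name, the three-sigma rule, and the sample counts are all assumptions.

```python
from statistics import mean, pstdev

def is_spike(history, latest, sigmas=3.0, min_baseline=5):
    """Flag the latest count if it exceeds the baseline mean by `sigmas` deviations."""
    if len(history) < min_baseline:
        return False  # not enough data to call anything unusual
    baseline = mean(history)
    # Floor the spread at 1 so a perfectly flat history does not alarm on +1.
    spread = max(pstdev(history), 1.0)
    return latest > baseline + sigmas * spread

# Synthetic hourly Guardrail intervention counts.
hourly_trips = [4, 5, 6, 5, 4, 6, 5]
print(is_spike(hourly_trips, 40))
print(is_spike(hourly_trips, 6))
```

A flagged hour still needs a human to decide which of the two explanations applies: a burst from one caller suggests an attack campaign, while a steady rise across callers suggests a filter that needs tuning.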

X-Ray traces tie the user request to every downstream call. Enable X-Ray on the API Gateway, Lambda, agent, and Step Functions surfaces, and the trace shows the request flow end-to-end. For multi-step workflows from Part 5, X-Ray shows which workflow steps dominated latency or cost — actionable signal for tuning, not just colourful flame graphs.

The observability layer for AgentCore-hosted agents carries additional first-class instrumentation: per-session traces, per-tool execution timings, OpenTelemetry-compatible exports. For long-running agents, AgentCore's observability is materially better than what bolt-on instrumentation gives you on Lambda or EKS.

How the configuration meets the regulators

The security and observability configuration above maps cleanly to the obligations the operator faces:

Obligation	Where the config delivers
NDPA 2023 — Security of processing (Section 39)	KMS CMK encryption, IAM least-privilege, VPC endpoints, model invocation logs as the processing record
NDPA — Breach response (Section 40)	CloudTrail data events + invocation logs answer "what data was processed, by which model, when" within hours of incident notification
NIS2 Article 21 — Incident handling	X-Ray traces + CloudWatch metrics + invocation logs provide the incident-investigation evidence base
NIS2 Article 21 — Supply-chain security	Custom Model Import provenance, Knowledge Base source attribution, action-group Lambda dependency tracking
CBN CSAT — Application security	Guardrails as the input/output control; CloudTrail data events as the access audit
CBN CSAT — Cryptography	KMS CMK on every encrypted resource; key rotation per regulatory cadence
NCC Cyber Resilience Framework — Continuous monitoring	CloudWatch metrics + alarms on Guardrail trips, error rates, anomalous invocations
Pre-IPO disclosure (SEC cybersecurity governance)	Quarterly review of Guardrail interventions, anomalous invocations, and material incident assessment — invocation logs provide the evidence

For each row, the configuration produces an answer in minutes, not days, when the examiner asks.

The five findings that recur

Five things show up in production audits, again and again.

Guardrails configured but not actually invoked. The agent has a guardrailIdentifier field set, but the version is DRAFT and the draft is empty. Always check that the version the production agent references is a published, numbered version containing the intended policy. The trap is that the field is filled in, so a casual review reports green.

Model invocation logging disabled. A team turns it off for cost reasons and then has no answer when a regulator asks what the agent actually said. Cost the storage; do not disable the log.

CloudTrail data events not enabled. Standard CloudTrail captures only management events. The data-events configuration is the explicit opt-in for invocation auditing; without it, the audit trail starts at "agent created" and goes blank.

Wildcards in IAM. bedrock:* on a production agent role is a finding every single time. Scope to specific actions and specific model or agent ARNs.

Audit logs in the same account as the workload. A compromised account can delete its own logs. Stream CloudTrail and invocation logs to a separate Security OU account where the workload account has no delete permission. It is the difference between a CCTV recorder bolted to the inside of the shop window and one in the police station down the road.

The same Guardrails wrap every tier

Guardrails wrap every Bedrock invocation regardless of model tier. The Haiku router, the Sonnet reasoning, the Opus synthesis, the custom-tuned Llama for narrow extraction — each invocation passes through the same policy. Observability is unified: invocation logs from every model land in the same S3 bucket, CloudWatch metrics tag every invocation with the model ID, X-Ray traces span the full multi-model topology.

For the cost dashboards in Part 7, this unification is what makes per-model spend attribution possible. For the case study in Part 8, it is what makes the SRE agent's actions defensible — every model call, every tool invocation, every Guardrails intervention is in the audit trail before the agent's decision becomes an action.

If your AI invocation logs were subpoenaed tomorrow, would the bucket still exist, and would the trail tell the story you remember telling?

FAQs

Why reference Guardrails by ID rather than inline them per agent?

Inline Guardrails couple policy to deployment, so every policy change requires redeploying every agent. Referenced Guardrails have their own ARN and version history; multiple agents share the same policy, an update is published once rather than once per agent, and version pinning lets you roll back a bad policy change without touching agent code.

What is the difference between CloudTrail data events and model invocation logging?

CloudTrail data events record that an invocation happened — caller identity, model ID, request and response sizes, IAM role — but not the contents. Model invocation logging captures the actual prompt, completion, embeddings, and guardrail trace. Data events answer "who invoked what"; invocation logs answer "what did the model actually say."

Do default PII detectors handle Nigerian identifiers like BVN and NIN?

No. The default Bedrock PII detector list is US/EU-centric and does not natively recognise BVN (11-digit) or NIN (11-digit with checksum). For NDPA-scoped workloads, add custom regex patterns for both and set the action to BLOCK on input and output. Without this, the PII filter is a false sense of security.

What threshold should the prompt injection / jailbreak filter run at in production?

HIGH. The false-positive rate is low enough that the rare blocked legitimate query is an acceptable trade-off for broad coverage of known jailbreak patterns. Lowering the threshold to satisfy a small set of edge-case queries opens the agent to known attack patterns the filter was specifically built to catch.

Why must audit logs live in a separate account from the workload?

A compromised account can delete its own logs. Streaming CloudTrail and invocation logs to a Security OU account where the workload account has no delete permission means the audit trail survives the breach. This is also what makes the logs admissible as evidence — they cannot be tampered with by the principal under investigation.

What's next

Part 7 takes the observability data this piece installs and turns it into the cost discipline that makes production AI economically defensible: per-tier spend attribution, the cascade routing pattern in depth, prompt and response caching, batch inference, provisioned throughput vs on-demand decisions.

The full series:

Part 1 — Foundations: Building AI Agents on Amazon Bedrock
Part 2 — RAG with Bedrock Knowledge Bases
Part 3 — Open-source Agent Frameworks on Bedrock
Part 4 — Model Customization on Amazon Bedrock
Part 5 — Multi-step AI Workflows with Step Functions and Bedrock
Part 6 — Security Guardrails and Observability for Bedrock (this piece)
Part 7 — Cost Optimization on Bedrock (deepest multi-model routing)
Part 8 — Case Study: An SRE AI Agent on Bedrock for CloudWatch Log Triage

The security and observability substrate from this piece is assumed by every subsequent and prior piece in the series. The case study in Part 8 demonstrates the full stack in operation against a real workload.