From Bedrock Posture to Agent Posture: Securing the Layer Between the Model and Your Systems

Key takeaways

Bedrock-layer security reviews cover the model. They do not cover what the agent does when it picks up the model output and calls your CRM, email service, and customer database.
Agentic systems introduce five new attack surfaces: tool poisoning, permission creep across tools, cross-tool data leaks, audit-trail gaps, and indirect injection through retrieved content.
The agent-layer posture rests on five controls: per-tool capability declarations, tenant-scoped credentials inside the MCP server, sandboxed execution, cross-tool audit correlation, and action approval gates for high-impact actions.
Compliance regimes (NDPA, GDPR, CBN CSAT, HIPAA, PCI DSS) do not distinguish between "the model processed this" and "the agent processed this." The cross-tool trace ID is what makes audit answerable.
The next year of AI security incidents will not be model training-data leaks. They will be agents taking unexpected actions across tool chains that no audit trail can reconstruct cleanly.

The signed review that no longer covers the system

A security team gathers on a Tuesday afternoon to sign off on a Bedrock workload. The IAM policies point at exactly the model ARNs the application uses. VPC endpoints carry inference traffic off the public internet entirely. Customer-managed KMS keys encrypt every bucket in the pipeline. CloudTrail catches every invocation. Guardrails screen for PII and refuse the topics the policy team listed. Production-ready, the document says. Everyone signs.

Two weeks later, a developer demos the next iteration. The same model the team blessed now reaches an MCP server that reads the CRM. Then another that sends Gmail. Then a third that writes back to the customer database. A salesperson types "follow up with overdue accounts", and the agent reads, drafts, and sends. It works beautifully. It also lives entirely outside the boundary the security review covered.

This is the gap the model-layer piece opens but does not close. The IAM, VPC, KMS, logging, and Guardrails stack is the foundation a production AI workload needs. It stops being the whole story the moment the model stops returning text and starts triggering actions on real systems. The thing you are defending shifts from what the model can see to what the agent can do. New attack surface. New audit trail. New compliance questions, asked in the same language as the old ones but answered with completely different evidence.

When the model grows hands

Figure 1. Model posture says what the model can see; agent posture says what the agent can do. Side-by-side view: the model-posture controls (IAM scoping, VPC endpoint, KMS, logging, Guardrails) on the left extend into the agent-posture controls on the right (tool capabilities, tool-held credentials, cross-tool isolation, cross-tool audit, action approval gates).

A chatbot fails by saying the wrong thing. An agent fails by doing the wrong thing. The difference reads like the gap between a heckler and a hand on the steering wheel. The OpenClaw architecture piece walks through how MCP, RAG, and orchestration combine to let an agent reach email, CRM, document stores, and databases. Each tool added to the manifest is a new lever the agent can pull, and the model-layer review does not see most of those levers at all.

Five new attack surfaces show up in agentic systems. None of these are theoretical — every one has surfaced during production reviews of pipelines we did not build.

The first is tool poisoning. An MCP server returns adversarial data and the agent acts on it. The server might be one your team operates, compromised through a supply-chain vulnerability hiding in its dependencies. It might be a third-party integration you trusted on a vendor's reputation. It might be a perfectly honest server returning a document whose body contains injected instructions. The Bedrock layer cannot tell the difference between the email body the agent retrieved and instructions the agent should follow — to the context window they are both text. The defence has to live where the text comes from, not where the model reads it.

Then there is permission creep across tools. An agent with email access alone is a bounded thing. The same agent with email and CRM access becomes a different threat model entirely. Think of two ingredients that are safe in isolation but explosive together. An injection that bounces harmlessly off either tool alone becomes a live exploit when the agent chains them: read a customer record, paste the record into an email, send it to an attacker's inbox. Every individual step is a legitimate use of a tool the agent is allowed to call. The exploit lives in the composition, not the components.
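One way to make "the exploit lives in the composition" checkable is to judge each tool call against the chain that preceded it rather than in isolation. The sketch below is a minimal illustration, not a complete policy engine. The tool names (crm.read_customer, email.send) and the two sets are assumptions invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical tool names, chosen only for illustration.
SENSITIVE_SOURCES = {"crm.read_customer"}
EXTERNAL_SINKS = {"email.send"}


@dataclass
class Session:
    """Tracks which tools one agent session has already called."""
    calls: list[str] = field(default_factory=list)

    def check(self, tool: str) -> bool:
        """Return True if the call may run, judged against the whole chain."""
        read_sensitive = any(c in SENSITIVE_SOURCES for c in self.calls)
        if tool in EXTERNAL_SINKS and read_sensitive:
            return False  # each step is legal alone; the chain is not
        return True

    def record(self, tool: str) -> None:
        self.calls.append(tool)


def attempt(session: Session, tool: str) -> str:
    """Run the chain check, then record the call if it is allowed."""
    if not session.check(tool):
        return "blocked"
    session.record(tool)
    return "allowed"
```

In practice a blocked call would more likely be routed to an approval step than dropped outright. The point of the sketch is only that the decision needs the session history as an input.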

Cross-tool data leaks follow the same logic. Sensitive data read from one tool flows into another, gets written into a document, gets sent in an email, gets logged to a system that was never designed to hold it. The agent has no concept of data classification. The MCP servers do not know what the others have touched. Unless an explicit boundary enforces the rule, data moves through the toolchain the way water moves through a house with no doors.
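An explicit boundary of this kind can be sketched as classification labels that travel with tool results, plus a ceiling per destination tool. Everything here is an assumption made for illustration: the three label names, the sink names, and the ceilings assigned to them. A real deployment would take its labels from its own data classification policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Label(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass(frozen=True)
class Tagged:
    """A tool result plus the classification of the data it carries."""
    value: str
    label: Label


# Hypothetical ceilings: the highest label each sink tool may receive.
SINK_CEILING = {
    "email.send": Label.INTERNAL,
    "docs.write": Label.RESTRICTED,
}


def combine(*parts: Tagged) -> Tagged:
    """Joined data inherits the highest label among its inputs."""
    return Tagged(" ".join(p.value for p in parts), max(p.label for p in parts))


def may_flow(data: Tagged, sink: str) -> bool:
    """Unknown sinks get the lowest ceiling, so they fail closed."""
    return data.label <= SINK_CEILING.get(sink, Label.PUBLIC)
```

The design choice worth noting is in combine: a draft that mixes a public greeting with a restricted record is restricted, so the label cannot be washed out by pasting the data into something harmless.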

Audit trail gaps are subtler. CloudTrail shows the Bedrock invocation. The model invocation log shows the prompt and the response. Neither shows which downstream system the agent reached, with what parameters, returning what. The audit team asks "who accessed this customer record at 14:32?" and the technically correct answer is "the Lambda function that runs the agent." Correct. Useless. And not what the regulator wanted to hear.

The fifth surface is indirect injection through retrieved content. A user uploads a PDF. The agent reads it. Embedded somewhere in the document is text telling the agent to send the user's contact list to an external address. The Bedrock Guardrail does not see this, because Guardrails screen the output and the input prompt, not the meaning the model takes away from an attachment. The companion piece on prompt-injection defences walks through this category in detail. The point in this context is that the injection only matters because the agent has tools to act on it.

Five controls that close the gap

Defending an agent is not defending a model with more vigilance. It is a different architectural posture, organised around five controls.

Start with per-tool capability declarations. Every MCP server registered with an agent declares what it can do and which data it can touch. Not in a wiki — as enforced metadata the orchestration layer reads at request time. The Gmail server announces: reads inbox, drafts and sends mail, attaches files only from approved sources. The CRM server declares: reads customer records scoped by tenant, writes activity logs, cannot delete. The agent's tool manifest is the union of these declarations, and the orchestrator refuses any call falling outside its declared envelope. The model can request anything it likes. The orchestrator decides which requests run.
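A minimal sketch of declarations as enforced metadata might look like the following. The field names, the action strings, and the OutsideEnvelope exception are assumptions for illustration; MCP itself does not prescribe this shape, and a production manifest would also carry data scopes such as the tenant restriction on the CRM server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """What one MCP server declares it can do. Field names are illustrative."""
    server: str
    actions: frozenset[str]


# Declarations loosely mirroring the Gmail and CRM examples above.
MANIFEST = {
    c.server: c
    for c in (
        Capability("gmail", frozenset({"read_inbox", "draft", "send"})),
        Capability("crm", frozenset({"read_customer", "write_activity"})),
    )
}


class OutsideEnvelope(Exception):
    """Raised when a requested call is not covered by any declaration."""


def authorize(server: str, action: str) -> None:
    """Raise unless the requested call sits inside a declared capability."""
    cap = MANIFEST.get(server)
    if cap is None or action not in cap.actions:
        raise OutsideEnvelope(f"{server}.{action} is not declared")
```

The orchestrator would call authorize before dispatching any tool call the model emits. Here a request for crm.delete_customer raises, because deletion was never declared, regardless of how the model phrased the request.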

Next, push tenant-scoped credentials into the MCP server itself. The agent does not hold credentials. The server does. When the agent calls send_email it passes a tenant context, not an API key, and the server reaches for the credentials it holds for that tenant. Two reasons matter here. The agent's prompt and tool-call surface flow through systems that are auditable but were never designed as secret stores — keeping credentials out of that surface eliminates a leak path. And it makes tenant isolation enforceable at the server, where the rule can be policed: tenant B's credentials cannot be used in a request bound to tenant A, no matter what the agent tried.
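The pattern can be sketched as a server class that owns the credential lookup and checks the tenant binding itself. This is an illustration under stated assumptions: the in-memory dict stands in for a secrets manager, session_tenant is a hypothetical parameter representing the tenant the orchestrator authenticated for this request, and _deliver is a placeholder for the real provider call.

```python
class TenantMismatch(Exception):
    """Raised when a call names a tenant other than the session's own."""


class EmailServer:
    """Stand-in for an MCP server that owns its own per-tenant credentials."""

    def __init__(self, credentials: dict[str, str]):
        # Assumption for the sketch: a real server would query a secrets
        # manager here rather than hold keys in an in-memory dict.
        self._credentials = credentials

    def send_email(self, tenant_id: str, session_tenant: str, to: str, body: str) -> dict:
        # session_tenant comes from the authenticated session, not the model.
        if tenant_id != session_tenant:
            raise TenantMismatch(f"{tenant_id!r} not bound to this session")
        api_key = self._credentials[tenant_id]  # never returned to the agent
        return self._deliver(api_key, to, body)

    def _deliver(self, api_key: str, to: str, body: str) -> dict:
        # Placeholder for the provider call; returns a receipt with no secrets.
        return {"to": to, "status": "queued"}
```

The agent only ever sees the receipt. Even if an injected instruction persuades it to pass another tenant's ID, the mismatch is caught inside the server, before any credential is read.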

Then comes sandboxed execution. A tool call cannot affect anything outside its declared capability. The file-system MCP server runs in a sandbox limited to the directories the tenant approved. The database server runs as a user with row-level security applied. The shell server — if you have one, and you probably should not — runs in a container with no network and no filesystem persistence. Build for the day the agent attempts something it should not. The blast radius is whatever the sandbox allows, not whatever the agent's good behaviour allows.
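For the file-system case, the innermost layer of confinement can be sketched as a path check that resolves every request against the tenant's approved root. This is an application-level check only, and it assumes it sits inside an OS-level or container sandbox rather than replacing one. The function and exception names are hypothetical.

```python
from pathlib import Path


class SandboxViolation(Exception):
    """Raised when a requested path resolves outside the approved root."""


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve a requested path and refuse anything that escapes root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below is made on the real target, not the spelling.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to: Python 3.9+
        raise SandboxViolation(f"{requested!r} escapes {root}")
    return candidate
```

Both relative escapes such as ../tenant-b/secrets.txt and absolute paths such as /etc/passwd are refused, because joining an absolute path onto the root discards the root and the containment check then fails.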

Cross-tool audit correlation is the fourth control. Every model invocation gets a trace ID. That ID propagates to every tool call the agent makes in response, every downstream API call those tools trigger, every data record they touch. "Show me everything that happened under trace abc-123" becomes a single query, and the answer is the full chain in order, with timing, results, and the data each step accessed. CloudTrail does not give you this. You build it: a tracing header on the agent runtime, propagation in every MCP server, structured log lines keyed by trace ID across every system the agent reaches.
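The three pieces (a trace ID set at the runtime, a header carried on outbound calls, structured log lines keyed by the ID) can be sketched in a single process as follows. The header name X-Trace-Id, the in-memory LOG list, and the function names are assumptions for illustration. Teams already using distributed tracing may prefer the W3C Trace Context traceparent header instead.

```python
import contextvars
import json
import uuid

# Holds the trace ID for the current agent request.
_trace_id = contextvars.ContextVar("trace_id", default=None)

LOG: list[str] = []  # stand-in for a real log sink


def start_trace() -> str:
    """Called once per model invocation by the agent runtime."""
    tid = str(uuid.uuid4())
    _trace_id.set(tid)
    return tid


def outbound_headers() -> dict:
    """Headers to attach to every MCP call. The header name is an assumption."""
    return {"X-Trace-Id": _trace_id.get()}


def log_event(system: str, action: str, **fields) -> None:
    """Emit one structured line keyed by the current trace ID."""
    LOG.append(json.dumps(
        {"trace_id": _trace_id.get(), "system": system, "action": action, **fields}
    ))


def events_for(trace_id: str) -> list[dict]:
    """The 'everything under this trace' query, run over the stand-in log."""
    rows = (json.loads(line) for line in LOG)
    return [r for r in rows if r["trace_id"] == trace_id]
```

In a real deployment each MCP server would read the incoming header, set its own context from it, and write to a shared log store, so that events_for becomes a query against that store rather than a list comprehension.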

The fifth control is action approval gates. High-impact actions (external email, customer-record changes, document deletions, anything that moves money) do not execute when the model emits the tool call. They queue with full context attached and route to an approval surface. The approver might be a human, or a policy engine evaluating the action against rules the client controls, or both depending on the impact tier. The dedicated piece on approval gates covers the pattern in detail. The short version is that approval is not a UX nicety; it is the firewall between "the agent decided to do something" and "the something happened." For any action with real-world consequences, that firewall is the line between a feature and a liability.
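A stripped-down version of the gate is a dispatcher that runs low-impact calls immediately and parks high-impact ones until something other than the model decides. The sketch below assumes a hypothetical HIGH_IMPACT set and an execute callable standing in for the real tool dispatcher. It leaves out persistence, impact tiers, and expiry, all of which a production gate would need.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical set of actions treated as high impact.
HIGH_IMPACT = {"email.send_external", "crm.update_customer", "payments.transfer"}


@dataclass
class PendingAction:
    tool: str
    args: dict
    context: str          # why the agent wants to do this
    status: str = "pending"


@dataclass
class ApprovalGate:
    execute: Callable[[str, dict], object]   # the real tool dispatcher
    queue: list[PendingAction] = field(default_factory=list)

    def submit(self, tool: str, args: dict, context: str):
        """Run low-impact calls at once; park high-impact calls for review."""
        if tool not in HIGH_IMPACT:
            return self.execute(tool, args)
        action = PendingAction(tool, args, context)
        self.queue.append(action)
        return action

    def decide(self, action: PendingAction, approved: bool):
        """Called by a human approver or a policy engine, never by the model."""
        if action.status != "pending":
            raise ValueError("action already decided")
        action.status = "approved" if approved else "rejected"
        return self.execute(action.tool, action.args) if approved else None
```

The model's tool call reaches submit and nothing else. Only decide can turn a pending action into an executed one, and it refuses to decide the same action twice.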

The compliance map, redrawn

The regimes the model-layer piece mapped do not draw a line between "the model processed this" and "the agent processed this." Both are processing. Both fall under the same data-handling obligations. The compliance team will ask both sets of questions.

The NDPA in Nigeria expects lawful basis, purpose limitation, and the ability to honour subject access requests. An agent that reads, modifies, and writes customer data across systems is processing on the same terms the model is — and the audit trail must show what the agent then did with the output, not just that the model was invoked. The cross-tool trace ID is what makes that question answerable.

GDPR in the EU restricts decisions based solely on automated processing that produce legal or similarly significant effects (Article 22), and where such decisions are permitted it requires safeguards that include the right to obtain human intervention. An agent making consequential decisions about individuals without an approval gate can fall inside the scope of that article. The gate is not just a security control. It is a compliance control too.

CBN CSAT in Nigerian banking is explicit about third-party processing risk. An MCP server is third-party processing — even when your team wrote the server — because it is a separate system handling regulated data on the bank's behalf. The control set that applies to a payments vendor applies here. Most CSAT-aligned reviews of agent systems we see have their gaps in this category.

HIPAA in US healthcare treats every system touching PHI as in scope. An agent that pulls a patient record into a context window and writes a summary to a document has touched PHI in two systems and through one intermediary. Each one needs the BAA, the audit log, and the access-control mapping. The agent is not exempt because it is "just an AI."

PCI DSS itself still contains no AI-specific requirements, but the cardholder-data scoping rules are unambiguous. Any system that stores, processes, or transmits CHD is in scope. An agent's context window is a system. Keep CHD out of it.

The mapping exercise is the same one we run at the model layer. The difference is that the agent layer adds controls the model-layer audit never needed. Most teams have not done that work, because the patterns are newer and the auditor checklists have not caught up. The compliance question is arriving anyway.

The year that decides this

The model-layer posture is now well understood. The reference architectures exist. The IAM policies are nearly standard. The auditors recognise the controls when they see them, even when teams have not deployed them yet.

The agent-layer posture is not. The patterns are emerging from the teams shipping agentic systems in production and learning what breaks. The five controls above are the ones we deploy, but they are not the only valid set. What is consistent everywhere is the size of the gap, and the certainty that the gap is where the first generation of public AI-agent incidents will come from. The headline a year from now will not be a model leaking its training data. It will be an agent that did something the team did not expect, across a chain of tools no audit trail reconstructed cleanly.

If you are running an AI workload on Bedrock today, the model-layer review is the start, not the end. The next review covers the tools the agent calls, the credentials those tools hold, the boundaries between them, the trace IDs that connect them, and the gate that sits in front of every consequential action. That review decides whether the agent is a production capability or a quiet liability waiting for the first time a user uploads the wrong PDF.

If the model was the puzzle, the agent is the city you built around it — what happens next depends on how you wired the streets.

FAQs

Does Bedrock Guardrails cover prompt injection inside retrieved documents?

No. Guardrails screen prompts and outputs, not the meaning the model derives from input documents. A PDF that contains injected instructions can drive the agent to act outside policy, and Guardrails will not see it because the text was legitimately retrieved. Defending against indirect injection is an agent-layer control, not a model-layer one.

Where should agent credentials live — in the agent or in the MCP server?

In the MCP server, scoped per tenant. The agent passes a tenant context; the server uses the credentials it holds for that tenant. This removes credentials from the agent's prompt and tool-call surface (which flows through auditable but non-secret-store systems) and makes tenant isolation enforceable at the server, regardless of what the agent attempted.

What does an action approval gate actually look like in production?

High-impact tool calls — sending external email, modifying customer records, moving money — do not execute when the model emits them. They queue with full context attached and route to either a human approver or a policy engine that evaluates the action against client-controlled rules. The gate is the architectural firewall between "the agent decided to do something" and "the something happened."

How do we connect a Bedrock invocation to the downstream tools the agent then called?

You build it: a trace ID generated at the agent runtime, propagated as a header into every MCP server, and emitted on structured log lines from every downstream system the agent reaches. CloudTrail alone does not give you this. The audit query "show me everything that happened under trace abc-123" is what makes the regulator's question answerable.

Does an MCP server count as third-party processing under CBN CSAT, even if we built it ourselves?

Yes. CSAT addresses third-party processing risk by behaviour, not by ownership. An MCP server is a separate system handling regulated data on the bank's behalf, so the control set that applies to a payments vendor applies to your MCP layer. Most CSAT-aligned reviews of agent systems we see have their gaps in this category.