When the LLM Is the Easy Part: AI for Legacy Banking Stacks

A senior architect at a regional Nigerian bank slid a thick ring binder across the conference table and tapped its cover. "This," he said, "is the documentation for our savings module. The engineer who wrote it died in 2012. The mainframe has been in continuous production since 1989. The binder is what we have." He was not joking. He was warning us.

Substitute the institution and the date as you like. The conversation is the same in Stockholm, in Dallas, in Manila. The bank has a core. The core is old. The core works. The core cannot break, because if the core breaks, customers cannot withdraw money, and if customers cannot withdraw money, the regulator opens an investigation that ends careers. The core therefore carries a stability guarantee the AI strategy has to defer to completely, the way a satellite has to defer to the planet it orbits.

That deference is the constraint that gets written out of most enterprise AI content. The constraint is the actual job.

Key takeaways

The marketing diagram for enterprise AI has three boxes; the real architecture in a bank has twenty to thirty, with eleven layers above the model and most of the work below it.
The retrieval layer is where every architecture review actually stops moving: DB2 on z/OS with EBCDIC and 1991-vintage column constraints, SharePoint with orphaned permissions, an Oracle Forms app with no maintainer since 2018, and the Excel workbook the compliance team treats as the authoritative record.
Always build around the compliance team's workflow — never change it. The Excel with merged cells and a "Final FINAL v3" tab is canonical because the compliance team treats it as canonical; the ingestion has to handle it exactly as produced.
The pattern that survives audit: canonical event format, per-source adapters by engineers who understand each source's quirks, central enrichment with ACL propagation, and a vector-store sink that respects the ACL.
The model is a procurement decision — one of two thousand decisions in a project where the other 1,999 are the architecture, integration, and operational discipline. Treating the model as the project is why most enterprise AI deployments fail.

The Three-Box Diagram and the Iceberg Beneath It

The marketing diagram for enterprise AI has three boxes. User. Agent. Knowledge Base. Sometimes a fourth box appears for tools. Whoever drew it has been to a conference recently.

The real diagram has twenty to thirty boxes, and the iceberg is mostly underwater. Most of the boxes sit between the user and the model. Almost none of them are the AI.

Sketched at the level of layers rather than systems, the working version begins at the customer touchpoint: usually a web or mobile app the bank already has, with established analytics, a security review cadence, and accessibility audits. From there it passes through an identity translation layer that converts the customer's session into an internal principal, so that the AI system acts with a subset of that principal's permissions and never more. Then a policy enforcement point (the bank's existing rule engine, often homegrown, often older than half the engineers reading this) decides which queries are allowed for which principal. After that comes an intent classification layer, typically a small Haiku-tier model that decides whether the query is one the AI is allowed to answer at all or one that has to be handed off to a human.

Only after all of that does the retrieval layer come into view. The vector store, with strict RBAC, built from sources that are themselves three layers deep. The model invocation finally arrives — Sonnet for the reasoning, perhaps Opus for the edge cases — and after the model speaks, the response runs through a guardrails layer that filters PII, refuses denied topics, and re-grounds answers in their citations.

After the answer is delivered, the work continues. An audit emission point ships a structured event to an object-locked bucket and into the bank's existing audit ingestion. An escalation surface handles the cases where the agent lacks confidence or permission. A reporting feed loops outcomes into the bank's customer-service metrics and the compliance team's regulatory reporting. A regression-eval ingestion captures production traffic into the evaluation suite under retention controls.

Eleven layers above the model. The real work begins below.
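To make the ordering of the first few layers concrete, here is a minimal Python sketch of a request passing through identity translation, policy enforcement, and intent classification before anything reaches retrieval. Every name in it (`translate_identity`, `allowed_topics`, `classify_intent`, the scope strings, the scope ceiling) is hypothetical, and the keyword classifier stands in for the small routing model so the sketch runs without a model call.

```python
from dataclasses import dataclass, field
from enum import Enum


class Route(Enum):
    ANSWER = "answer"
    HANDOFF = "handoff"


@dataclass(frozen=True)
class Principal:
    # Hypothetical internal principal derived from the customer's session.
    principal_id: str
    scopes: frozenset


@dataclass
class Outcome:
    route: Route
    reason: str
    audit: list = field(default_factory=list)


# Hypothetical ceiling: the assistant never holds more than these scopes.
AI_SCOPE_CEILING = frozenset({"read:balance", "read:fees", "read:policy"})


def translate_identity(session: dict) -> Principal:
    # Identity translation: intersect the session's scopes with the AI ceiling,
    # so the assistant acts with a subset of the customer's permissions.
    return Principal(session["customer_id"], frozenset(session["scopes"]) & AI_SCOPE_CEILING)


def allowed_topics(principal: Principal) -> frozenset:
    # Stand-in for the bank's existing rule engine.
    return principal.scopes


def classify_intent(query: str) -> str:
    # Stand-in for a small routing model; keyword rules keep the sketch runnable.
    q = query.lower()
    if "dispute" in q or "fraud" in q:
        return "needs_human"
    if "fee" in q:
        return "read:fees"
    if "balance" in q:
        return "read:balance"
    return "read:policy"


def handle(session: dict, query: str) -> Outcome:
    principal = translate_identity(session)
    topics = allowed_topics(principal)  # policy runs before intent, as in the text
    intent = classify_intent(query)
    audit = [("identity", principal.principal_id), ("intent", intent)]
    if intent == "needs_human":
        return Outcome(Route.HANDOFF, "intent requires a human", audit)
    if intent not in topics:
        return Outcome(Route.HANDOFF, f"policy denies {intent}", audit)
    audit.append(("policy", "allowed"))
    # Retrieval, model invocation, guardrails, and audit emission would follow here.
    return Outcome(Route.ANSWER, "proceed to retrieval", audit)
```

In this sketch a policy denial is not an error: every path the AI cannot answer ends in a handoff that carries its own audit entries, which is what the escalation surface and audit emission layers then consume.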

What Lives Underneath the Retrieval Layer

The retrieval layer is where every architecture review we walk into actually stops moving. The vector store needs an embedding source. The embedding source has to be built from the bank's institutional knowledge — account policies, AML guidelines, fee schedules, regulatory filings, internal SOPs, the customer-service knowledge base, last quarter's product-update memos. Those documents live in nine different systems in seventeen different formats, each with its own personality.

The transactional record sits inside DB2 on z/OS. The schema is a thirty-year-old artefact of somebody's RPG report layout from 1991. Column names are six characters long because the tooling that produced them allowed no more. The same field stores three different semantic meanings depending on the value of an adjacent flag column. The encoding is EBCDIC throughout. Date fields are eight-character YYYYMMDD strings, except in the savings module, where they are five-character YYDDD Julian dates because the savings team had a specific reason in 1994 that nobody can now remember, though the binder explains it across seventeen pages.

Getting a clean text feed out of this requires a change-data-capture stream (IBM CDC or a third-party tool wired to the DB2 logs) that translates to UTF-8 on the way out, denormalises the joined tables, and applies the semantic disambiguation the binder describes field by field. The CDC pipeline is the most sensitive integration in the entire AI project. If it falls behind the source database, the AI starts answering questions with stale data. If it falls too far behind, regulators notice before the team does.
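A minimal sketch of the decoding rules above, assuming the host uses code page 037 (US/Canada EBCDIC, shipped with Python as `cp037`; many banks use a different CCSID such as 500 or 1140) and assuming a hypothetical flag mapping for the overloaded column. The real mapping is whatever the binder says.

```python
from datetime import date, datetime

# Assumption: the host code page is CCSID 37. Swap in the bank's actual code page.
HOST_CODEPAGE = "cp037"


def decode_field(raw: bytes) -> str:
    # EBCDIC bytes to a Python str; fixed-width host fields are usually space-padded.
    return raw.decode(HOST_CODEPAGE).rstrip()


def parse_host_date(value: str, module: str) -> date:
    # Most modules store YYYYMMDD; the savings module stores YYDDD Julian dates.
    if module == "savings":
        # %y is the two-digit year and %j the day of the year (001-366).
        return datetime.strptime(value, "%y%j").date()
    return datetime.strptime(value, "%Y%m%d").date()


# Hypothetical: one overloaded column whose meaning depends on an adjacent flag.
FLAG_MEANINGS = {"A": "available_balance", "L": "ledger_balance", "H": "hold_amount"}


def disambiguate(amount_field: str, flag: str) -> tuple:
    # Refuse to guess: an unknown flag means the mapping table is incomplete.
    try:
        return FLAG_MEANINGS[flag], amount_field
    except KeyError:
        raise ValueError(f"unknown flag {flag!r}; check the binder") from None
```

The two-digit-year pivot is worth checking against the oldest savings records: Python maps `%y` values 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, which may not match the bank's own convention.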

The bank's policies, standard operating procedures, and internal memos live in SharePoint, where they have accumulated three years of orphaned permissions, two parallel taxonomies — one from the 2023 reorg, one from the era before it — and a folder optimistically titled "DO NOT EDIT - 2019 LEGAL ARCHIVE" that nobody is sure should be indexed at all. The ingestion has to honour the existing SharePoint permissions, which means the vector store needs metadata reflecting every document's effective ACL at the time of indexing, and the ACL has to be refreshed on every change and propagated through the embedding pipeline. The architecture for this style of RBAC mapping is laid out in Designing Strict RBAC for Enterprise Knowledge Bases.
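The query-time half of that ACL requirement can be sketched as a filter over retrieved chunks. This assumes, hypothetically, that each indexed chunk carries the document's effective principals and an ACL version number captured at indexing time, and that a separate store tracks the current version per document. Failing closed on a version mismatch is a design choice for the sketch, not a description of any particular vector store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    doc_id: str
    text: str
    allowed_principals: frozenset  # effective ACL captured at indexing time
    acl_version: int               # incremented whenever the source ACL changes


def permitted(chunks, caller_groups: frozenset, current_acl_versions: dict) -> list:
    """Keep only chunks the caller may see; drop chunks whose ACL snapshot is stale."""
    visible = []
    for chunk in chunks:
        if current_acl_versions.get(chunk.doc_id) != chunk.acl_version:
            # The source ACL changed after indexing: fail closed until re-propagation runs.
            continue
        if chunk.allowed_principals & caller_groups:
            visible.append(chunk)
    return visible
```

Failing closed means a permission change in SharePoint makes a document temporarily invisible rather than temporarily over-shared, which is the direction an auditor will prefer.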

The savings-product configuration lives inside an Oracle Forms application that has had no maintainer since 2018. The only documented access path is the Forms UI itself. There is no public API. The team faces two options. Option one is to screen-scrape the Forms UI from an automated session, parse the layout, and extract the configuration: brittle, slow, and the security team will block it before you finish the slide deck. Option two is to ask the database team to write a daily extract to flat files on an SFTP location the AI ingestion can pull from. It is slower to implement and requires a meeting with people who do not enjoy meetings, but it is the version that survives an audit. Always option two.
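On the ingestion side, part of the value of option two is that the extract can be checked before anything is indexed. A minimal sketch, assuming a hypothetical file layout agreed with the database team: pipe-delimited, a header row, and a trailer record `TRL|<row count>` so that a truncated transfer is detectable. The layout and field names are illustrative.

```python
import csv
import io


def read_extract(text: str) -> list:
    """Parse a hypothetical daily extract: header, data rows, then 'TRL|<row_count>'."""
    lines = text.strip("\n").split("\n")
    if not lines or not lines[-1].startswith("TRL|"):
        raise ValueError("missing trailer record; the extract may be truncated")
    expected = int(lines[-1].split("|", 1)[1])
    # Everything before the trailer is a normal header-plus-rows delimited file.
    reader = csv.DictReader(io.StringIO("\n".join(lines[:-1])), delimiter="|")
    rows = list(reader)
    if len(rows) != expected:
        raise ValueError(f"trailer declares {expected} rows but the file has {len(rows)}")
    return rows
```

A rejected file stops the run instead of partially updating the index, so yesterday's configuration stays in place until the database team resends.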

There is, finally, always at least one critical reference in an Excel file. In Nigerian banks it is usually the fee schedule for the latest product. In European retail banks it is usually the interest-rate sheet. In North American credit unions it is usually the membership-eligibility matrix. The Excel is canonical because the compliance team treats it as canonical, the way a parish register is canonical because the parish treats it as canonical. The Excel has merged cells, three sheets, a tab called "Final FINAL v3," and a formula referencing a defined name that breaks the moment somebody opens the file in LibreOffice. The ingestion has to handle this exactly as the compliance team produces it, on the monthly cadence they update it, with the manual upload step they expect to perform. We do not change the compliance team's workflow. We build around it.
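Merged cells are the most common trap in that workbook. In openpyxl, for example, only the top-left cell of a merged range holds the value and the other covered cells read as empty, so a naive cell-by-cell read silently drops the value from every cell but one. A minimal sketch of the fix on an already-loaded grid, assuming 0-based, inclusive range bounds (openpyxl's own ranges, available via `ws.merged_cells.ranges`, are 1-based and would need converting):

```python
def fill_merged(grid: list, merged_ranges: list) -> list:
    """Return a copy of grid with each merged range's top-left value in every cell it covers.

    merged_ranges holds (min_row, min_col, max_row, max_col) tuples, 0-based and inclusive.
    """
    filled = [list(row) for row in grid]  # copy, so the raw read is left untouched
    for min_row, min_col, max_row, max_col in merged_ranges:
        value = filled[min_row][min_col]
        for r in range(min_row, max_row + 1):
            for c in range(min_col, max_col + 1):
                filled[r][c] = value
    return filled
```

Filling a copy keeps the raw read available for the audit trail, which matters if a reviewer later asks what the file actually contained.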

The Translation Layer Is Most of the Project

Sit with those four sources for a moment. They share no schema. They share no encoding. They share no update cadence. They share no security model. Connecting them to a single AI system is most of the engineering work in the entire project, and the pattern that survives audit has a specific shape.

A canonical event format every ingestion source eventually emits to. A structured envelope with a source identifier, a timestamp, a principal or system of record, a payload, and a hash for idempotency. Per-source adapters: small, focused programs that handle one source each, written by engineers who deeply understand that one source's quirks. A central enrichment step that applies semantic disambiguation and propagates ACL metadata from the source to the vector store. A sink that respects the ACL and indexes the enriched payload, re-embedding whenever a change in source configuration invalidates the prior index.
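A minimal Python sketch of that envelope and of a sink that uses the hash for idempotency, assuming SHA-256 over a sorted-key JSON serialisation as the hash input. The field names follow the paragraph above; the class names and serialisation choice are illustrative.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CanonicalEvent:
    source: str       # source identifier, e.g. "db2-savings"
    timestamp: str    # ISO 8601 timestamp as reported by the source
    principal: str    # principal or system of record
    payload: dict
    idempotency_key: str = field(init=False)

    def __post_init__(self):
        # Sorted keys and fixed separators make the hash independent of dict ordering,
        # so a replayed event produces the same key.
        body = json.dumps(
            {"source": self.source, "timestamp": self.timestamp,
             "principal": self.principal, "payload": self.payload},
            sort_keys=True, separators=(",", ":"), ensure_ascii=False,
        )
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        object.__setattr__(self, "idempotency_key", digest)


class Sink:
    """Toy sink that indexes each distinct event once and ignores replays."""

    def __init__(self):
        self.seen = set()
        self.indexed = []

    def accept(self, event: CanonicalEvent) -> bool:
        if event.idempotency_key in self.seen:
            return False
        self.seen.add(event.idempotency_key)
        self.indexed.append(event)
        return True
```

Idempotency is what lets an adapter be restarted from an earlier log position after a failure without producing duplicate chunks in the vector store.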

The model never sees the raw DB2 column. It sees the canonical event. The audit trail records every transformation the event passed through, so that when a regulator's question lands six months later, the answer is traceable back to the source row in the source system on the source date. This is the part of enterprise AI that no vendor puts on a slide. It is also the part that, if missing, guarantees the project will fail the security review and slip another quarter.
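One way to make "traceable back to the source row" checkable rather than aspirational is to record, for every transformation, a digest of its input and its output, then verify that the chain links up from the source row to the indexed record. A minimal sketch, assuming hypothetical step names and SHA-256 over sorted-key JSON as the digest:

```python
import hashlib
import json


def digest(obj) -> str:
    # Stable digest of a JSON-serialisable record.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def apply_step(record: dict, lineage: list, name: str, fn) -> dict:
    """Run one transformation and append an audit entry linking its input to its output."""
    out = fn(record)
    lineage.append({"step": name, "input": digest(record), "output": digest(out)})
    return out


def verify_chain(lineage: list, source_digest: str, final_digest: str) -> bool:
    # Each step's input must equal the previous step's output, starting at the source row.
    expected = source_digest
    for entry in lineage:
        if entry["input"] != expected:
            return False
        expected = entry["output"]
    return expected == final_digest
```

In practice the lineage entries would be written to the same object-locked audit store as the other events, so the chain itself cannot be quietly rewritten.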

The Model Is a Procurement Decision

The model is — and this is the unsettling part for anyone whose career identity sits inside model engineering — a procurement decision. One decision among two thousand in the project.

In our practice the default is Claude on Bedrock. Sonnet does the reasoning. Haiku does the routing. Opus appears only at the edges. The full cascade is laid out in Multi-Model AI on Amazon Bedrock. The choice is defensible, reproducible, and supported by an evaluation harness that catches regression on every model-version change. But it is one decision among two thousand. The other one thousand nine hundred and ninety-nine are the architecture above, the integration surfaces from Glue Code Is the Job, and the operational discipline from Mitigating Non-Deterministic AI Failures in Production Systems.
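The evaluation harness is what turns a model-version change into a routine release rather than a gamble. At its core sits a comparison like the one below: score a fixed evaluation suite on the current and the candidate model version, and block promotion if any case regresses. A minimal sketch, assuming per-case scores have already been computed by the harness; the function names, case ids, and tolerance parameter are hypothetical.

```python
def regressions(baseline: dict, candidate: dict, tolerance: float = 0.0) -> list:
    """Return case ids where the candidate is missing or scores below baseline minus tolerance."""
    failed = []
    for case_id, base_score in baseline.items():
        cand_score = candidate.get(case_id)
        if cand_score is None or cand_score < base_score - tolerance:
            failed.append(case_id)
    return sorted(failed)


def may_promote(baseline: dict, candidate: dict, tolerance: float = 0.0) -> bool:
    # A single regressed or missing case blocks promotion.
    return not regressions(baseline, candidate, tolerance)
```

Gating per case rather than on an average keeps one badly broken answer (say, on an AML question) from hiding behind improvements elsewhere.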

The reason most enterprise AI deployments fail is not that the model was wrong. It is that the architecture treated the model as the project, and the project as the model. The strategy presentation with three boxes and one of them labelled "GPT-4o" or "Claude Sonnet" is unready. The strategy presentation is ready when the diagram has twenty boxes, the model is one of them, and somebody in the room can describe what happens to each of the other nineteen when DB2 falls behind by six minutes during a month-end batch run.
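What "DB2 falls behind by six minutes" should mean for the answer path can be written down as an explicit policy. A minimal sketch, assuming the CDC pipeline exposes the last commit time seen in the source log and the last commit time applied downstream; the thresholds and outcome labels are hypothetical and would come from the bank's own risk appetite.

```python
from datetime import datetime, timedelta


def freshness_policy(source_commit: datetime, applied_commit: datetime,
                     warn_after: timedelta, stop_after: timedelta) -> str:
    """Map CDC replication lag onto an answer-path decision."""
    lag = source_commit - applied_commit
    if lag <= warn_after:
        return "answer"
    if lag <= stop_after:
        # Still usable, but the answer must state the as-of time of the data.
        return "answer_with_as_of_notice"
    # Too stale for balance-style questions: route to the escalation surface.
    return "handoff"
```

The specific numbers matter less than having chosen them, in writing, before the first month-end batch run.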

The engineers who can hold that whole picture in their head — who can write a CDC adapter for the savings module on Tuesday, debug a SharePoint ACL propagation bug on Wednesday, and convince the platform team on Thursday that the AI layer will not touch their batch window — are the rarest profile in the market.

That is also why Tier 1 banks now pay senior-staff compensation for this work, and why the procurement officer who walked in expecting to buy a model is walking out having bought an engineering team.

FAQs

Why is the retrieval layer where architecture reviews stop moving?

Because the vector store needs an embedding source, and that source is built from institutional knowledge sitting in nine different systems and seventeen different formats — DB2 with EBCDIC, SharePoint with orphaned permissions, Oracle Forms with no API, Excel with merged cells. Each source has its own schema, encoding, update cadence, and security model. The integration is most of the engineering work in the project, and there is no library that does it for you because the shape of each bank's mess is unique.

Why not change the compliance team's Excel workflow?

Because the Excel is canonical — the compliance team treats it as the authoritative record, and the audit trail depends on that treatment. Changing it requires regulatory approval, parallel reconciliation, and political capital the AI project does not have. The version that survives is the one that ingests the Excel exactly as the compliance team produces it, on the monthly cadence they update it, with the manual upload step they expect. You build around the workflow, not the other way around.

Why is screen-scraping the Oracle Forms app the wrong answer?

It is brittle and slow, and the security team will block it. The version that survives an audit is to have the database team write a daily extract to flat files on an SFTP location the AI ingestion can pull from. It is slower to implement, requires a meeting with the database team, and depends on getting the right person on a call, but it is the version with documented data lineage, controlled access, and an audit trail that holds up.

What does "the model is a procurement decision" mean for our architecture review?

It means the technical conversation in the architecture review should not be about Sonnet vs Haiku vs Opus, or about the prompt template. Those are decisions, but they are not the project. The conversation should be about the eleven layers above the model and the integration layer below — what happens when DB2 falls behind by six minutes during a month-end batch run, how the SharePoint ACL propagates, how the CDC pipeline restarts after a regional failover. If those questions are not in the room, the architecture is unready.

Why are banks willing to pay senior-staff compensation for this work?

Because the engineers who can hold the whole picture in their head are the rarest profile in the market. The same engineer needs to write a CDC adapter for the savings module on Tuesday, debug SharePoint ACL propagation on Wednesday, and convince the platform team on Thursday that the AI layer will not touch the batch window. That combination — legacy systems literacy, integration discipline, model engineering, and the political skill to navigate the platform team — does not come from a single career track.

Companion content

Why 95% of Enterprise AI Pilots Fail at the Deployment Phase — the strategic version of the diagnosis
Glue Code Is the Job — the engineer's view of the integration work
Designing Strict RBAC for Enterprise Knowledge Bases — IAM-mapped vector store filtering
The Blueprint for Air-Gapped LLM Deployments on AWS Bedrock — the security architecture
Multi-Model AI on Amazon Bedrock — the model cascade pattern
Hardening Applications Before AWS Migration — the specific Nigerian-banking compliance context

How to engage

We do this work — the legacy integration, the CDC pipelines, the SharePoint ACL propagation, the audit-grade translation layer, the AI architecture that sits on top of all of it without breaking the batch window. Talk to us at creativeminds.dev/contact.