Glue Code Is the Job: Notes from Enterprise LLM Integration

It was the second Friday of an enterprise AI rollout when the engineer noticed the SOAP service in the diagram. She was three weeks into the project. The model worked. The retrieval worked. The agent answered questions with grounded citations and refused politely when it should not answer. Then the auth lead opened the architecture review with a slide nobody had seen before: a faded box labelled role-mapping-svc, with a connector running back to an Oracle Forms screen-scrape that the previous platform team had built in 2009 and that still held the only authoritative mapping between employee number and the permissions any downstream system trusted.

There is a sentence the engineering recruitment market has been quietly avoiding. The sentence is: the model is the easy part.

It is unfashionable because the entire AI hiring discourse — Forward Deployed Engineer roles at Anthropic and OpenAI, Applied AI Engineer roles at Scale, Solutions Engineer roles at Databricks, every Staff-grade title with "AI" in the name — pretends the job is the technical thing on the marketing page. The model. The prompt. The agent loop.

It is not. The job is the wiring. The integration into the SSO. The audit trail formatted for the regulator's auditing tool. The secrets rotation that does not break the eval suite at 03:00. The change-data-capture stream that backfills the vector store every twelve hours without taking down the source database. The broken SOAP service the auth team forgot existed but the agent needs to call.

This work has a name. It is called glue code — a term that has carried mild contempt for thirty years in engineering circles, and that has quietly become the single most valuable skill on the market in 2026.

Key takeaways

A modern enterprise AI system has between fifteen and forty distinct integration surfaces; the model lives in one of them, and the others, as many as thirty-nine, are the difference between a working system and a ZIP file on someone's laptop.
Identity is the surface every other integration inherits from — the agent's auth has to derive from the human's auth, with the original principal preserved in the audit trail and the agent's permissions as a strict subset.
The audit trail is not a Datadog dashboard but a structured event stream to an object-locked bucket, with retention enforced by the lock and encryption under KMS keys the engineering team cannot access, round-tripping back into the eval pipeline and feeding the compliance team's existing tooling.
Cost telemetry is the seatbelt — invisible until it is the only thing that matters; the first unexpected model bill puts the project on probation, the second kills it.
The engineer who can survive this work — senior enough to know SOAP, SAML, X.509, and Oracle Forms are all still in production, and willing to write the ten-line glue script that lasts three years — is the rarest profile on the 2026 market.

The embarrassing scripts at the edges

For most of software history, glue code was the small, embarrassing scripts at the edges of a system — the bits you wrote because the database driver did not quite return what the API contract said it would. The work was real, but the cultural framing was that real engineers built libraries and frameworks. Glue code was what you did at the end of a sprint when there was nothing else to do.

The economics of LLM deployment inverted this completely. The frame that fits better now is plumbing in a city — invisible when it works, expensive when it does not, and the difference between a building that is habitable and one that is just a permit on someone's desk.

A modern enterprise AI system has, at a rough count, between fifteen and forty distinct integration surfaces. Each one is a glue point. Each one carries its own auth flow, its own failure mode, its own audit obligation, its own legacy quirks. The model lives in one of those surfaces. The others, as many as thirty-nine, are the difference between a working production system and a ZIP file on someone's laptop.

What follows is a non-exhaustive tour, in the order each surface tends to show up in the project.

The surface every other surface inherits from

The agent has to act as someone. That someone is in the enterprise's IAM directory, which in most regulated institutions is one of three things. IAM Identity Center on top of Active Directory. Okta or Azure AD with custom SAML to legacy systems. Or a homegrown LDAP that predates anyone still on the team.

The agent's auth token has to be derived from the human's auth token, on every call, with the original principal preserved in the audit trail. That means SAML assertions in flight, JWT scoping, role-chain assumption, and the agent's permissions modelled as a strict subset of the human's: never a superset, ever, under any circumstance. Think of it as a clerk acting on behalf of a customer at a bank. The clerk can do what the customer can do, no more, and the customer's signature is the one that ends up on the slip.
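The subset rule can be sketched in a few lines. This is a minimal illustration, not a real IAM integration: the scope strings, the employee ID, and the names DelegatedCredential, delegate, and authorize are all hypothetical. In production the same idea would live in token exchange or role assumption. The granted scopes are the intersection of what the agent requests and what the human holds, so the agent can never end up with more than the human.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegatedCredential:
    """What the agent carries on a call made on a human's behalf."""
    principal: str      # the human, preserved for the audit trail
    actor: str          # the agent acting for that human
    scopes: frozenset   # what this call is allowed to do


def delegate(principal: str, human_scopes: set, actor: str,
             requested_scopes: set) -> DelegatedCredential:
    """Derive the agent's credential from the human's.

    The granted scopes are the intersection of what the agent asks for
    and what the human holds, so the result never exceeds the human's rights.
    """
    granted = frozenset(requested_scopes) & frozenset(human_scopes)
    return DelegatedCredential(principal=principal, actor=actor, scopes=granted)


def authorize(cred: DelegatedCredential, action: str) -> bool:
    """Allow an action only if it is inside the delegated scopes."""
    return action in cred.scopes


# Hypothetical example: the agent asks for a write scope the human lacks.
cred = delegate(
    principal="emp-10423",
    human_scopes={"kb:read", "tickets:read"},
    actor="support-agent",
    requested_scopes={"kb:read", "tickets:write"},
)
print(sorted(cred.scopes))               # ['kb:read']
print(authorize(cred, "tickets:write"))  # False
```

The credential carries both the principal and the actor, so an audit event written downstream can record who the call was for and what made it.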

We laid out the architecture in Designing Strict RBAC for Enterprise Knowledge Bases, but the deeper truth is that every other integration in the system inherits from this one. Get it wrong and nothing downstream is salvageable.

Where the corpus actually lives

The corpus is rarely a clean directory of PDFs. In the real institutions where AI is meant to land, it is a SharePoint instance with three years of orphaned permissions. An IBM DB2 database where the schema is a thirty-year-old artefact of someone's RPG report layout. An Oracle Forms application with screen-scrape as the only documented access pattern. A monthly Excel drop into S3 that the compliance team treats as the canonical record. A SOAP service the auth team forgot existed but which holds the only authoritative role mapping.

Each of these needs a structured ingest pipeline — incremental, idempotent, restartable, and quiet enough to not page the on-call. The pipeline is glue code. There is no library that does it for you because the shape of each enterprise's mess is unique. It is the difference between writing a translation between two languages and writing a translation between this specific dialect of one language and a private vocabulary nobody outside the institution has ever heard spoken.
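The three properties named above (incremental, idempotent, restartable) can be shown in one small loop. This is a sketch under assumptions: the "seq" field stands in for whatever change offset the source system exposes, a plain dict stands in for the vector store, and the checkpoint is a local file. None of these names come from a real connector.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def content_id(body: dict) -> str:
    """Stable key for a record body: same content in, same key out."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def ingest(records: list, store: dict, checkpoint: Path) -> int:
    """Ingest records newer than the saved checkpoint; return how many were written.

    records: dicts with an increasing "seq" (hypothetical change offset)
             and a JSON-serializable "body".
    store:   dict standing in for the vector store, keyed by content hash.
    """
    last_seq = int(checkpoint.read_text()) if checkpoint.exists() else -1
    written = 0
    for record in sorted(records, key=lambda r: r["seq"]):
        if record["seq"] <= last_seq:
            continue                  # incremental: skip what is already done
        key = content_id(record["body"])
        if key not in store:          # idempotent: a re-delivered body is a no-op
            store[key] = record["body"]
            written += 1
        # restartable: persist progress after each record, so a crash
        # resumes from here instead of from the beginning
        checkpoint.write_text(str(record["seq"]))
    return written


checkpoint = Path(tempfile.mkdtemp()) / "ingest.checkpoint"
store = {}
batch = [
    {"seq": 0, "body": {"doc": "policy-1", "text": "v1"}},
    {"seq": 1, "body": {"doc": "policy-2", "text": "v1"}},
]
print(ingest(batch, store, checkpoint))  # 2
print(ingest(batch, store, checkpoint))  # 0
```

Running the same batch twice writes nothing the second time, which is what lets the job be retried blindly after a failure.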

The forensic record, not the pretty graph

Every model invocation must produce a forensic record that an auditor — internal or regulator — can read in six months and reconstruct exactly what happened. Prompt, response, model version, retrieval evidence, user principal, time, and any tool calls the agent executed.

This is not a Datadog dashboard. A dashboard is a window into the present. A forensic record is the evidence file the detective reads when the building burned down two quarters ago. It is a structured event stream, written to an object-locked bucket, where the object lock enforces retention and the data is encrypted under KMS keys nobody on the engineering team has access to. The events have to round-trip back into the eval pipeline so regressions get caught against real production traffic. They have to feed the compliance team's existing reporting tooling without requiring that team to learn a new vendor.
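As a rough sketch of what one such event could look like, the record below carries the fields listed above and serializes to a single JSON line. The digest chaining is an added assumption, not something the architecture described here requires: it is one common way to make an edited or missing line detectable. The names InvocationRecord, to_event_line, and verify_chain are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class InvocationRecord:
    principal: str
    model_version: str
    prompt: str
    response: str
    retrieval_evidence: tuple   # e.g. IDs of the documents the answer cited
    tool_calls: tuple
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def _canonical(body: dict) -> str:
    """Deterministic serialization, so the same body always hashes the same."""
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


def to_event_line(record: InvocationRecord, prev_digest: str) -> tuple:
    """Serialize one record as a JSON line chained to the previous digest.

    Returns (line, digest); pass the digest into the next call.
    """
    body = asdict(record)
    body["prev_digest"] = prev_digest
    digest = hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest()
    return json.dumps({"event": body, "digest": digest}, sort_keys=True), digest


def verify_chain(lines: list, genesis: str = "0" * 64) -> bool:
    """Check that every line hashes correctly and links to its predecessor."""
    prev = genesis
    for line in lines:
        entry = json.loads(line)
        body = entry["event"]
        if body["prev_digest"] != prev:
            return False
        if hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest() != entry["digest"]:
            return False
        prev = entry["digest"]
    return True
```

Because each line is self-describing JSON, the same stream can be replayed into an eval harness or loaded into a reporting tool without a custom parser.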

The infrastructure for this is described in The Blueprint for Air-Gapped LLM Deployments on AWS Bedrock — CloudTrail data events on every Bedrock invocation, model invocation logs to a dedicated audit bucket. But the architecture is the easy part. The glue is fitting it into the institution's existing audit workflows without requiring six committees' approval.

The four-clock secrets problem

The agent has secrets — API keys to internal services, KMS key IDs, SSM parameter paths, OAuth client credentials. These rotate. Sometimes automatically. Sometimes on a security incident. Sometimes because some unrelated team rotated a shared key and nobody told us.

The rotation flow has to be picked up by the agent on the next invocation without restart. It must not invalidate any in-flight conversation state. It must not break the eval suite that was running at the time of the rotation. And it must be logged, so when something does go wrong, the timing of the rotation correlates with the failure signal.
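One way to meet those requirements, sketched under assumptions: the secret is read through a small cache that re-fetches after a time-to-live, logs when the version changes, and refreshes once if a call fails with an auth error. The fetch callable is a hypothetical stand-in for whatever backend the institution mandates, and RotatingSecret and call_with_secret are illustrative names.

```python
import logging
import time
from typing import Callable, Tuple

logger = logging.getLogger("secrets")


class RotatingSecret:
    """Caches a secret and re-reads it once the cache is older than ttl seconds."""

    def __init__(self, fetch: Callable[[], Tuple[str, str]], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # returns (version, value) from the backend
        self._ttl = ttl
        self._clock = clock
        self._version = None
        self._value = None
        self._fetched_at = None

    def get(self) -> str:
        """Return the cached value, refreshing first if the cache has expired."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self.refresh()
        return self._value

    def refresh(self) -> None:
        """Re-read the secret and log a rotation if the version changed."""
        version, value = self._fetch()
        if self._version is not None and version != self._version:
            # Logged so a later failure can be correlated with the rotation.
            logger.info("secret rotated: %s -> %s", self._version, version)
        self._version, self._value = version, value
        self._fetched_at = self._clock()


def call_with_secret(secret: RotatingSecret, request: Callable[[str], object],
                     is_auth_error: Callable[[Exception], bool]):
    """Run request(secret value); on an auth failure, refresh once and retry."""
    try:
        return request(secret.get())
    except Exception as exc:
        if not is_auth_error(exc):
            raise
        secret.refresh()
        return request(secret.get())
```

Nothing here restarts the process or touches conversation state; only the cached credential changes, which is the property the paragraph above asks for.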

There is a Kubernetes-native answer for this, an AWS-native answer for this, and a homegrown-Vault answer for this. There is also the institution's existing answer for this, which is usually a fourth thing and which is non-negotiable. The glue is making the agent operate inside that fourth thing without rewriting the institution's secrets posture. It is the engineering equivalent of changing the lock on a door while the staff are still walking through it.

When the agent has to ask for help

When the agent hits an edge case — low confidence on a retrieval, an action that requires elevated permission, a prompt that triggers a guardrail — it has to get out of the way. The escalation goes to a human. Which human? Through which queue? With which context payload? Reaching them how — Slack, Teams, an internal ticketing system, an email?

This is glue code. It is also the single most important determinant of whether the deployment survives the first quarter of production use. We covered the pattern in agent action approval gates, but the harder problem is the wiring into the institution's existing ticket flow, which is some version of ServiceNow or Jira with a custom workflow nobody documented in 2017. The handoff works like an air-traffic controller passing a plane to the next sector: the exchange has to be terse and structured, and the receiving controller has to know exactly where the plane is and where it is going before the radio call ends.
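The questions of which human, which queue, and which context can be answered in a single structured payload. The sketch below is illustrative only: the queue names, the Reason values, and the ticket shape are hypothetical, and a real integration would map the resulting dict onto the fields of the institution's own ticketing workflow.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class Reason(str, Enum):
    LOW_CONFIDENCE = "low_confidence"
    NEEDS_ELEVATED_PERMISSION = "needs_elevated_permission"
    GUARDRAIL_TRIGGERED = "guardrail_triggered"


# Hypothetical routing table: which queue owns which kind of escalation.
ROUTES = {
    Reason.LOW_CONFIDENCE: "support-tier2",
    Reason.NEEDS_ELEVATED_PERMISSION: "access-approvals",
    Reason.GUARDRAIL_TRIGGERED: "trust-and-safety",
}


@dataclass(frozen=True)
class Escalation:
    reason: Reason
    principal: str          # the human the agent was acting for
    conversation_id: str
    summary: str            # one line a reviewer can act on
    attempted_action: str
    evidence: tuple         # e.g. IDs of the documents the agent relied on


def build_ticket(esc: Escalation) -> dict:
    """Turn an escalation into a queue name, a title, and a JSON context body."""
    payload = asdict(esc)
    payload["reason"] = esc.reason.value   # plain string for downstream tools
    return {
        "queue": ROUTES[esc.reason],
        "title": f"[{esc.reason.value}] {esc.summary}",
        "body": json.dumps(payload, sort_keys=True),
    }
```

Keeping the routing in one table means a change of owning team is a one-line edit rather than a hunt through agent code.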

The seatbelt that is invisible until it matters

Token spend, model-tier distribution, cache hit rates, prompt-length percentiles, per-tenant spend. All of it has to feed back to the finance team in the format they recognise — usually monthly, usually tied to a specific cost centre, usually with a budget alert wired to the same Slack channel that pages the on-call.
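A minimal sketch of that roll-up, assuming each call is logged with a cost centre, a model tier, and token counts. The prices and the 80 percent alert threshold are placeholders, not real provider rates, and the function names are hypothetical.

```python
from collections import defaultdict

# Placeholder prices in USD per 1,000 tokens; real rates come from the provider.
PRICE_PER_1K = {
    "small": {"input": 0.001, "output": 0.002},
    "large": {"input": 0.010, "output": 0.030},
}


def call_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one model call at the placeholder rates above."""
    price = PRICE_PER_1K[tier]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def spend_by_cost_centre(calls: list) -> dict:
    """Sum call costs per cost centre, the unit finance already reports on."""
    totals = defaultdict(float)
    for call in calls:
        totals[call["cost_centre"]] += call_cost(
            call["tier"], call["input_tokens"], call["output_tokens"])
    return dict(totals)


def over_budget(totals: dict, budgets: dict, threshold: float = 0.8) -> list:
    """Cost centres whose spend has reached threshold * budget, sorted by name."""
    return sorted(
        centre for centre, spent in totals.items()
        if centre in budgets and spent >= threshold * budgets[centre]
    )
```

The list returned by over_budget is what would be posted to the alert channel; alerting at a fraction of the budget gives finance warning before the bill arrives rather than after.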

The first time the finance team sees a model bill they did not expect, the project is on probation. The second time, the project is dead. Cost telemetry is the seatbelt — invisible right up until it is the only thing that matters.

The architecture for the model-tier cascade and prompt caching is in Cost Optimization on Amazon Bedrock, but the visibility layer — the part the CFO sees — is glue code, written specifically for the institution's existing finance reporting cadence.

The profile this work selects for

Read those six integration surfaces again. None of them are model engineering. None of them are AI research. None of them appear on the slides at a major conference. All of them are the actual job.

The engineer who can survive this work shares a specific profile. Senior enough to have lived through legacy integration before — knows that SOAP and SAML and X.509 and an Oracle Forms screen-scrape are all real and all still in production at most institutions worth deploying to. Has built and shipped end-to-end systems, not just prototypes — has seen what happens when a vector store ingest fails at 02:00 on a public holiday. Can hold a context-shifting conversation with a regulator's auditor in the morning, a CISO in the afternoon, and a junior backend engineer in the evening, without losing the institutional politics of any of the three. Reads logs the way some people read literary criticism — alert to the unstated, watching for what should be there and is not. Does not need the dignity of complex problems. Will write the ten-line glue script that integrates the legacy ticket system, ship it, and move on. The script will still be running in three years.

This profile has a market name now — Forward Deployed Engineer, Applied AI Engineer, Customer-Facing Solutions Architect — but the work is older than those labels. It is the work of any engineer who has shipped serious software into a serious institution. The reason it has become the most-bid-on profile in 2026 is straightforward. The model is, finally, good enough that the integration is the project. The bottleneck moved.

Two ways to read this

There are two ways to read this piece.

The first is as career advice. Glue code is no longer the low-prestige work it was framed as for thirty years. The engineers who lean into the integration surface, the auth flows, the audit trails, the secrets rotation, the legacy database extraction — those are the engineers companies are paying staff-tier compensation for in 2026.

The second is as architecture advice. If you are scoping an enterprise AI deployment and the technical conversation in the room is about the model and the prompt, you are scoping the wrong thing. The technical conversation should be about the six surfaces above. The model is a procurement decision. The integration is the project. If the integration is not on the architecture diagram, the project is in the ninety-five percent that fail. We wrote about why ninety-five percent of enterprise AI pilots fail at the deployment phase; this piece is the granular version of the same diagnosis, told from the perspective of the engineer who has to do the work.

The faded box in the architecture review was the project. The model was a procurement note. Which one is on your diagram?

FAQs

Why has glue code suddenly become high-prestige work?

Because the model is finally good enough that the integration is the project. The bottleneck moved. For thirty years, glue code was framed as low-prestige work — the embarrassing scripts at the edges of a system. The economics of LLM deployment inverted this completely, and the engineers who lean into auth flows, audit trails, secrets rotation, and legacy database extraction are the ones companies are paying staff-tier compensation for in 2026.

What does "the agent's permissions are a strict subset of the human's" mean in practice?

The agent acts on behalf of an authenticated human, and on every call the agent's permissions are bounded by what that human is authorised to do — never a superset, ever, under any circumstance. SAML assertions in flight, JWT scoping, role-chain assumption — the architecture has to make it structurally impossible for the agent to do anything the human could not do directly.

Why won't a Datadog dashboard satisfy the audit trail requirement?

Because the audit trail has to be a forensic record an auditor can read in six months and use to reconstruct exactly what happened. That means a structured event stream (prompt, response, model version, retrieval evidence, user principal, time, tool calls) written to an object-locked bucket, with retention enforced by the object lock and encryption under KMS keys the engineering team does not control. It also has to round-trip into the eval pipeline and feed the compliance team's existing reporting tooling.

How do you avoid the agent breaking when a secret rotates?

The rotation flow must be picked up on the next invocation without restart, must not invalidate in-flight conversation state, must not break the eval suite running at the time, and must be logged so failure correlation works later. The institution usually has a specific secrets posture (often a fourth answer beyond Kubernetes-native, AWS-native, or Vault) that is non-negotiable — the glue is making the agent operate inside that posture without rewriting it.

What kind of engineer profile actually survives this work?

Senior enough to know SOAP, SAML, X.509, and Oracle Forms screen-scrapes are still in production at most institutions worth deploying to. Has shipped end-to-end systems, not just prototypes — has seen what happens when a vector store ingest fails at 02:00 on a public holiday. Can hold a context-shifting conversation with a regulator's auditor in the morning, a CISO in the afternoon, and a junior engineer in the evening. Reads logs like literary criticism — alert to what should be there and is not.

How to engage

We build the glue. PrivateLink and KMS architectures, IAM Identity Center wiring, audit-grade observability, change-data-capture pipelines into vector stores, the awkward bits where the agent has to talk to a 1998-vintage SOAP service. Talk to us at creativeminds.dev/contact, or fork the compliance-automator reference architecture, which bakes most of the six surfaces above into a runnable system.