DSPM for AI in 2026: What the Wiz Explainer Doesn't Cover About the Retrieval-and-Generation Seam

The CISO and her head of data sit on either side of the procurement deck. On the cover, in tasteful sans serif, are the words "DSPM for AI." Inside is the dashboard: a tidy heat map of training datasets, vector stores, and embedding stores discovered across three clouds, classified, tagged with sensitivity, and ranked by exposure. The numbers are real. The discovery is genuine. The CISO has every reason to sign. Two floors down, in a meeting that nobody invited her to, a different engineer is staring at a confused-deputy retrieval log from a production RAG assistant and reading evidence that a customer was shown a chunk from a different customer's contract. The vector store's RBAC dashboard reads green. The dashboard is not lying. The dashboard simply does not see what happened.

Wiz Academy published its DSPM-for-AI explainer on 23 April 2026. It is one of the cleaner pieces a vendor has written about the category this year: solid on what an AI-specific data substrate looks like, accurate on why standard DSPM does not reach it, useful as a procurement reference for a CISO building a brief. It is also, like every vendor explainer, scoped to the gates the vendor sells. The retrieval-and-generation seam, where confused-deputy retrieval, embedding-similarity over-share, output-leakage paraphrase, and corpus poisoning through privileged-write paths actually exfiltrate data in production, sits outside the explainer's frame and outside the product's reach. It is the difference between a building inspector who certifies that all the locks are sound and a security review that asks who has the keys.

cmdev covered the seam directly in our earlier DSPM-meets-RAG piece. This article is the explicit Wiz counterpart: a fair read of what the explainer covers, a precise read of where it stops, and the operational architecture a procurement lead needs to know the vendor will not ship on its own. The aim is not to argue that Wiz is wrong. It is to name the gates the product does not cross, so the buying conversation produces a defensible posture rather than an AI-DSPM line item that closes nothing standard DSPM did not already cover.

Key takeaways

Wiz's DSPM-for-AI explainer covers discovery, classification, and posture of the AI data substrate — training datasets, vector stores, embedding stores. Genuine and useful. It stops where the data is at rest or in transit.
The retrieval-and-generation seam — where confused-deputy retrieval, embedding-similarity over-share, output paraphrase, and corpus poisoning actually happen — sits outside the explainer and outside most DSPM-for-AI products in 2026.
The five-gate architecture from the earlier cmdev piece holds, with four 2026 additions: multi-tenant vector partition policy, prompt-version sensitivity tagging, audit-chain integration with the principal-of-record, and cross-model embedding migration risk.
The vendor landscape sorts into discovery-and-classification leaders (Wiz, Cyera, Borneo), pipeline observability (Sentra), regulatory mapping (Securiti), and the build-it-yourself alternative on Bedrock Knowledge Bases plus Lake Formation plus a custom retrieval-time filter. None covers the seam end-to-end.
For most enterprises the answer is a vendor for gates one to three plus a custom build for gates four and five. Paying the AI-DSPM premium for what is structurally CSPM plus a tag is the marker of a bad procurement.

What the Explainer Gets Right

Give Wiz its due. The piece extends the DSPM discovery surface cleanly into the AI substrate — training datasets, model artefacts in SageMaker and Bedrock, embedding stores in OpenSearch, pgvector, Pinecone, Weaviate, and Bedrock Knowledge Bases, fine-tuning datasets, vector indexes. It classifies the contents of those stores using the same regex-plus-NER-plus-trained-classifier stack standard DSPM uses on cloud object stores, and propagates a sensitivity tag onto each artefact. It applies posture rules at the storage tier — which training buckets are public, which vector stores have RBAC enforced, which embedding stores are encrypted with customer-managed keys, which AI services have logging turned on. The work is real. A CISO whose team has not done any of it has a genuine gap to close.

The explainer also names a handful of AI-specific risks — shadow AI, sensitive data flowing into unsanctioned models, training-data poisoning at the dataset tier. The controls Wiz ships against them are reasonable. None of this is the argument of the present piece, which is not "Wiz is wrong" but "Wiz is partial." The map is accurate. The map is also smaller than the territory.

Where the Vendor Lens Stops

Every vendor explainer stops at the same boundary, and Wiz is no exception. The boundary runs between "data at rest plus data in transit," which the DSPM category was built for, and "data being retrieved and generated against," which is application-layer state the storage-tier control surface cannot see. The Wiz dashboard reads the perimeter the same way a CCTV system reads the front door — it tells you whether the lock is engaged. It does not tell you whether the person inside should be allowed to read the file they have just opened.

Four exfiltration vectors live at that seam. Each is documented in the public literature. Each has been demonstrated against production systems. None is stopped by storage-tier posture controls.

The first and dominant one is confused-deputy retrieval. The vector store's RBAC is intact at the resource level, but the chunks are retrieved by the model under the application's service identity, and the application then returns the model's answer without revalidating that the cited chunks were within the user's source-system ACL at the moment of the query. The dashboard reads green. The user has received content they were never entitled to read. The earlier cmdev piece covers the engineering pattern in depth. Think of it as the security guard who checks every badge at the door and then lets every visitor wander into every meeting room.
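The revalidation step can be sketched in a few lines. This is a minimal illustration, not the pattern from the earlier piece: the `Chunk` shape, the `can_read` callback, and the in-memory ACL are all hypothetical stand-ins for whatever the source system actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source_doc: str  # identifier of the originating document
    text: str


# Hypothetical callback: asks the source system, at query time, whether
# `principal` may read `source_doc`. Not a real library API.
AclCheck = Callable[[str, str], bool]


def filter_for_principal(
    principal: str, chunks: Iterable[Chunk], can_read: AclCheck
) -> List[Chunk]:
    """Keep only chunks whose source document the end user may read right now."""
    return [c for c in chunks if can_read(principal, c.source_doc)]


# In-memory stand-in for the source-system ACL (assumption for the example).
acl = {"contract-a": {"alice"}, "contract-b": {"bob"}}


def can_read(principal: str, source_doc: str) -> bool:
    # Unknown documents fail closed: no ACL entry means no access.
    return principal in acl.get(source_doc, set())


# The application's service identity retrieved both chunks; the end user is alice.
retrieved = [
    Chunk("c1", "contract-a", "..."),
    Chunk("c2", "contract-b", "..."),
]
allowed = filter_for_principal("alice", retrieved, can_read)
```

The point of the design is that the decision is keyed on the end user, not on the service identity that ran the retrieval, and that it is made at query time rather than at ingest.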

The second is over-share via embedding similarity. An adversary with query access uses similarity scores as an oracle. Cosine distance leaks information about which content is present even when the chunks themselves are filtered out post-retrieval, and nearest-neighbour probing has recovered sensitive attributes at over 90 per cent accuracy in the 2025 literature. The DSPM-for-AI tool sees the store. It does not see the similarity surface bleeding attribute information sideways through the query interface, the way a transparent envelope reveals the shape of what is inside without anyone needing to open it.
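One way to narrow that oracle, sketched under assumptions: apply the entitlement filter before ranking rather than after, and return ordered ids without raw scores. The `search` function, the toy two-dimensional vectors, and the `entitled_ids` set are hypothetical; a production system would also have to consider timing and result-count side channels, which this sketch ignores.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, index, entitled_ids, k=3):
    """Rank only chunks the caller is entitled to; return ids, never scores."""
    # Filter first, so unentitled vectors never influence the ranking.
    candidates = [(cid, vec) for cid, vec in index.items() if cid in entitled_ids]
    ranked = sorted(
        candidates, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )
    return [cid for cid, _ in ranked[:k]]


# Toy index (assumption): the query vector is identical to the secret chunk.
index = {
    "public-1": [1.0, 0.0],
    "public-2": [0.6, 0.8],
    "secret-1": [0.0, 1.0],
}
result = search([0.0, 1.0], index, entitled_ids={"public-1", "public-2"})
```

Because the secret vector is excluded before scoring, a perfect match against it produces neither a hit nor a score the caller can probe.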

The third is output leakage via paraphrase. The model retrieves a chunk the user is entitled to see — or that the application erroneously allowed through — and produces an answer that recombines fragments into a composition more sensitive than any single chunk it cited. The output is new data; its classification is a property of what the model generated, not what the model started with. No vector-store RBAC catches this. The DSPM-for-AI posture rule does not see the response stream.
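A minimal sketch of what classifying the output itself might look like. The tier names, the two regexes, and the rule that an email plus a monetary amount outranks either alone are illustrative assumptions standing in for the team's real classifier.

```python
import re
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Stand-in detectors (assumption): a real gate would call a proper classifier.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
MONEY = re.compile(r"[$€£]\s?\d")


def classify_output(text: str) -> Tier:
    """Classify the generated text itself, independent of what it cited."""
    has_email = bool(EMAIL.search(text))
    has_money = bool(MONEY.search(text))
    if has_email and has_money:
        # The combination is treated as more sensitive than either fragment.
        return Tier.RESTRICTED
    if has_email or has_money:
        return Tier.CONFIDENTIAL
    return Tier.PUBLIC


def output_gate(answer: str, cited_tiers, clearance: Tier):
    """Return (allowed, effective_tier) for a generated answer."""
    # The cited chunks set a floor; the output's own classification can raise it.
    effective = max([classify_output(answer), *cited_tiers])
    return effective <= clearance, effective


allowed, effective = output_gate(
    "Contact jane@example.com about the $40,000 renewal.",
    cited_tiers=[Tier.CONFIDENTIAL, Tier.CONFIDENTIAL],
    clearance=Tier.CONFIDENTIAL,
)
```

In this example every cited chunk is within the user's clearance, yet the recombined answer is blocked, which is the case storage-tier rules cannot express.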

The fourth is corpus poisoning via privileged-write paths. An attacker with write access to any source the ingest pipeline trusts plants instructions or content the model later retrieves and acts on. The vendor inventory tells you the corpus exists. It does not tell you the ingest pipeline is reading from a source whose write-side ACL is looser than the read-side principal would expect. It is the difference between checking that the well is full and checking that nobody has been pouring strange liquids into it overnight.
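The write-side check is simple to express once the ingest sources and their writers are enumerated. A sketch, assuming the team maintains a list of trusted authoring principals; `IngestSource`, `loose_sources`, and the group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestSource:
    name: str
    writers: frozenset  # principals with write access at the source system


def loose_sources(sources, trusted_writers):
    """Map each source name to the writers that fall outside the trusted set."""
    findings = {}
    for source in sources:
        extra = source.writers - trusted_writers
        if extra:
            findings[source.name] = sorted(extra)
    return findings


# Example inventory (assumption): one tightly held source, one open wiki.
sources = [
    IngestSource("contracts-sharepoint", frozenset({"legal-team"})),
    IngestSource("support-wiki", frozenset({"support-team", "all-employees"})),
]
findings = loose_sources(
    sources, trusted_writers=frozenset({"legal-team", "support-team"})
)
```

A finding here does not prove poisoning. It identifies the sources where anyone outside the expected authors could plant content the model will later retrieve.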

None of the four is exotic. They are the patterns the published RAG attack literature documents and the patterns that our incident-response work keeps surfacing.

Five Gates and Four New Cracks

The five-gate architecture from the earlier piece remains the right operational pattern, and it remains worth restating for a procurement audience: classification at ingestion, sensitivity tag preserved on the chunk, vector-store RBAC, retrieval-time filter, and an output gate with watermarking. A DSPM-for-AI vendor covers the first three natively. The fourth and fifth are engineering work the team has to do regardless of which vendor sits underneath. Twelve months of production exposure have surfaced four further cracks the original architecture did not name, and each one is now load-bearing in deployments where the seam is the dominant risk.

The first crack is multi-tenant vector partition policy. Enterprise RAG has crossed into multi-tenant territory in a way that was not yet the median pattern in 2025 — internal tenancies per business unit, external tenancies for customer-facing assistants, joint-venture tenancies under contractual data-sharing agreements. The rule is simple to state and difficult to enforce: no chunk crosses a tenancy boundary without an explicit allow-list decision, enforced at the partition layer, audited per query. Vendors are starting to detect the partitions. None of them yet enforces the policy.
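The rule's decision logic can be sketched as follows. The sketch shows only the allow-list check and the per-query audit record; where it is enforced (the partition layer) depends on the vector store and is out of scope. `PartitionPolicy` and the tenant names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionPolicy:
    # Explicit allow-list of (chunk_tenant, querying_tenant) pairs.
    # Directional: allowing A to be read by B does not allow B to be read by A.
    allow: set = field(default_factory=set)
    # Every decision is recorded, whether it allowed or denied.
    audit: list = field(default_factory=list)

    def permits(self, query_id: str, querying_tenant: str, chunk_tenant: str) -> bool:
        same_tenant = chunk_tenant == querying_tenant
        allowed = same_tenant or (chunk_tenant, querying_tenant) in self.allow
        self.audit.append(
            {
                "query_id": query_id,
                "querying_tenant": querying_tenant,
                "chunk_tenant": chunk_tenant,
                "allowed": allowed,
            }
        )
        return allowed


# A joint-venture partition is shared with finance; nothing else crosses.
policy = PartitionPolicy(allow={("jv-shared", "bu-finance")})
decisions = [
    policy.permits("q-1", "bu-finance", "bu-finance"),
    policy.permits("q-1", "bu-finance", "jv-shared"),
    policy.permits("q-1", "bu-finance", "customer-acme"),
]
```

Denials are logged alongside approvals, so the audit answers both "what crossed" and "what tried to".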

The second is prompt-version sensitivity tagging. A prompt is itself sensitive content. It encodes business logic, regulatory interpretation, agent capability boundaries, and sometimes embedded examples drawn from real customer data. The 2026 incidents we have seen include prompts pulled from version-control history that contained PII examples no one remembered embedding. A prompt registry is a vault that nobody locked. The control is a hash-pinned registry with classification at write time. No DSPM-for-AI vendor reads the prompt registry today.
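A hash-pinned registry with classification at write time might look like the sketch below. The `PromptRegistry` class, the single email regex standing in for a PII classifier, and the tag strings are assumptions for illustration; only `hashlib.sha256` is a real library call.

```python
import hashlib
import re

# Stand-in PII detector (assumption): a real registry would call a classifier.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PromptRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, text: str) -> str:
        """Classify the prompt at write time and store it under its SHA-256."""
        tag = "contains-pii" if EMAIL.search(text) else "no-pii-detected"
        pin = _digest(text)
        self._entries[pin] = {"text": text, "sensitivity": tag}
        return pin

    def fetch(self, pin: str) -> dict:
        """Return the entry only if its text still matches the pinned hash."""
        entry = self._entries[pin]
        if _digest(entry["text"]) != pin:
            raise ValueError("prompt text no longer matches its pinned hash")
        return entry


registry = PromptRegistry()
pin = registry.register("You are a billing helper. Example: reply to jane@example.com.")
entry = registry.fetch(pin)
```

Callers reference a prompt by its pin, so an edit produces a new pin and a new classification rather than silently changing what runs.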

The third is audit-chain integration with the principal-of-record. The audit trail keyed by request ID — recording which chunks were retrieved under which principal — has to join the agent action trail and the source-system access trail under a single correlation ID. Without the join, three logs each tell a partial story and none tells the whole one. Vendors are starting to read each leg in isolation. Stitching them together is still the team's responsibility.
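The join itself is not complicated once all three trails carry the same identifier. A sketch, assuming each event is a dict with a `correlation_id` field; the field names and sample events are hypothetical.

```python
from collections import defaultdict


def join_trails(retrieval, agent, source):
    """Group events from the three trails by correlation ID."""
    chains = defaultdict(lambda: {"retrieval": [], "agent": [], "source": []})
    for leg, events in (("retrieval", retrieval), ("agent", agent), ("source", source)):
        for event in events:
            chains[event["correlation_id"]][leg].append(event)
    return dict(chains)


def incomplete(chains):
    """Correlation IDs that are missing at least one of the three legs."""
    return sorted(cid for cid, legs in chains.items() if not all(legs.values()))


# Sample events (assumption): request r-2 never reached the source-system trail.
retrieval_log = [
    {"correlation_id": "r-1", "principal": "alice", "chunks": ["c1"]},
    {"correlation_id": "r-2", "principal": "bob", "chunks": ["c2"]},
]
agent_log = [
    {"correlation_id": "r-1", "action": "answer"},
    {"correlation_id": "r-2", "action": "answer"},
]
source_log = [
    {"correlation_id": "r-1", "doc": "contract-a", "decision": "allow"},
]

chains = join_trails(retrieval_log, agent_log, source_log)
gaps = incomplete(chains)
```

The incomplete chains are the useful output: a request with a retrieval leg and no source-system leg is one where nobody can show the ACL was consulted.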

The fourth is cross-model embedding migration risk. Embedding models are rotating faster than the corpora they were built against, and the sensitivity decisions made under one embedding model's inversion-resistance profile may not hold under another. Every rotation should trigger a re-evaluation of which sensitivity tiers are permitted to be embedded at all. Vendors track which embedding model is in use. None of them gates the rotation, which is closer to noting that the lock has been changed than asking whether the new key is held by the right hands.
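Gating the rotation could be as small as the check below. It assumes the team records, per embedding model, which sensitivity tiers a review has cleared for embedding; the function name, model names, and tier labels are hypothetical.

```python
def gate_rotation(new_model, permitted_by_model, corpus_tiers):
    """Decide whether a rotation may proceed and which tiers to hold back.

    permitted_by_model maps a model name to the tiers a review has cleared.
    A model with no recorded review blocks the rotation outright.
    """
    permitted = permitted_by_model.get(new_model)
    if permitted is None:
        return {"proceed": False, "withhold": sorted(corpus_tiers)}
    withhold = sorted(tier for tier in corpus_tiers if tier not in permitted)
    return {"proceed": True, "withhold": withhold}


# Review outcomes (assumption): the newer model was cleared for fewer tiers.
permitted_by_model = {
    "embed-v1": {"public", "internal", "confidential"},
    "embed-v2": {"public", "internal"},
}
corpus_tiers = {"public", "internal", "confidential"}

reviewed = gate_rotation("embed-v2", permitted_by_model, corpus_tiers)
unreviewed = gate_rotation("embed-v3", permitted_by_model, corpus_tiers)
```

The design choice is that absence of a review fails closed: a rotation to a model nobody has evaluated embeds nothing until someone decides otherwise.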

Closing each of these four cracks shuts off a vector that has produced incidents in the last twelve months. None of the fixes is a vendor SKU yet. The team that has them is the team that built them.

A Tour of the Vendor Landscape

Five vendors plus the build-it-yourself alternative claim some flavour of DSPM-for-AI in 2026. Coverage is uneven, no vendor crosses the seam end-to-end, and the procurement deck most teams hand to a CISO is asking the wrong questions about all of them.

Wiz remains the strongest on discovery and classification across the AI data substrate, with credible inventory across AWS, Azure, and GCP AI services. The AI rule pack reads training-dataset configuration, embedding-store posture, model-artefact storage, and Bedrock or Azure OpenAI service posture cleanly. Where Wiz is weak is the retrieval-and-generation seam — the product is structurally a storage-tier inspector, and the seam is application-layer state the rule engine does not read. It is the right inspection of the wrong floor of the building.

Cyera plays a different game and plays it well. The strength is data classification and lineage at the dataset tier, with an AI extension that maps which datasets fed which model training runs. The lineage story is the cleanest in the category and the natural fit for AI Bill of Materials compliance — a regulator's dream artefact. The layer above lineage, the vector-store policy and retrieval-time control, is thinner; the buyer who values lineage gets a lot, and the buyer who needs the seam covered gets only part way there.

Sentra leads on ML and data-pipeline observability — training-data quality, schema drift, pipeline lineage. The strength is real for teams whose primary risk is in the training and fine-tuning pipeline rather than in the retrieval seam. Retrieval-seam coverage is roadmap, not shipping. The buyer who wants both today buys two products or builds the second.

Securiti is the vendor for the buyer whose first need is regulatory mapping rather than engineering control. NDPA, GDPR, EU AI Act, DORA — all served with policy templates that produce defensible audit artefacts. Technical depth on AI-specific stores is thinner than the regulatory packaging suggests. The natural buyer wants the compliance narrative more than the seam-level control, which is a legitimate need and an honest match.

Borneo is the vendor for the team whose corpora come from SharePoint, Confluence, Drive, and Slack — most of the enterprise RAG world's source of unstructured content. Coverage on vector databases in their own right is less developed; Borneo solves the problem upstream of the embedding step rather than at the embedding step itself.

The build-it-yourself alternative on AWS deserves an honest hearing. Bedrock Knowledge Bases plus Lake Formation plus a custom retrieval-time filter is the working pattern for teams centralised on AWS. Knowledge Bases ships chunk-level metadata filtering. Lake Formation handles dataset classification. A custom gateway implements gate four. The team writes gate five. The cost is two to three engineer-quarters for a credible first version plus ongoing maintenance, and the architecture sits where the team can extend it, with no vendor premium on what is structurally CSPM plus a tag. It is the carpenter who builds the bookshelf because nobody yet sells the bookshelf that fits the wall.

None of the five vendors ships an end-to-end product that covers gates four and five with depth. The category is two years away from doing so, because the controls the seam needs are mostly application-layer state the storage-tier product line was not designed to reach.

When the Line Item Is Worth It

The question a procurement lead has to answer is when a DSPM-for-AI vendor adds something the team's existing standard DSPM plus a custom retrieval-time filter plus the principal-of-record chain does not already deliver. The answer changes shape by estate.

Buy when the estate is multi-cloud with a substantial AI footprint, because the discovery value compounds across providers in a way a single-cloud build does not. Buy when the regulator-facing reporting against EU AI Act Article 10 or NDPA cross-border transfer matters more than the depth of control enforcement — the vendor's artefact is a faster path to an audit-ready document than a team's manual production of the same evidence. Buy when the data substrate is the primary risk and the retrieval seam is downstream of the team's responsibility.

Do not buy when the estate is single-cloud, the AI workload is one or two applications, and the team has the engineering capacity to build gates four and five; the inventory adds little, and what remains is structurally CSPM plus a tag at AI-DSPM prices. Do not buy when the vendor's pitch is dominated by inventory and service-posture rule packs with no clear story on retrieval-time policy or output gating; that vendor is selling a smaller product than the brochure suggests. Do not buy when the retrieval seam is the dominant risk and gates four and five need to be enforced as code under the team's own version control, where every change can be reviewed by the same engineers who own the rest of the runtime.

The five-gate architecture gives the buying conversation a structure even the buyer's vendor cannot argue with. Gates one through three are vendor territory: classification at ingestion, sensitivity tag on the chunk, vector-store RBAC. The procurement questions there are precision and recall on the team's actual data shape, whether the source tag survives the ingest pipeline, and whether self-hosted vector stores are covered. Gates four and five — the retrieval-time filter and the output gate with watermarking — are team-build territory; no vendor covers either as enforcement in mid-2026, and the question is whether the vendor's audit trail will read the team's retrieval-gateway logs so the vendor's posture findings join the team's engineering control into one coherent audit artefact.

A procurement deck that lists all five gates as vendor capabilities is a deck to be sceptical of. A deck that names gates one through three as vendor and gates four and five as team build is the deck that matches the reality of mid-2026.

How Three Regulators Will Read the Same Stack

Three regulatory frames matter for the buyer, and none of them accepts DSPM-for-AI evidence as sufficient on its own. NDPA Sections 41 to 43 on cross-border transfer require demonstrable controls on where personal data flows and on which basis. The vendor inventory tells the regulator where datasets, models, and vector stores are deployed. It does not prove the retrieval layer enforces the transfer constraint at the moment of query — that proof lives in the gateway logs.

EU AI Act Article 10 on data governance maps cleanly onto a credible vendor's lineage artefacts. Article 9 on risk management maps onto the retrieval-time enforcement the vendor does not cover. SOC 2 CC6.1 on logical access asks whether chunk-level access at retrieval time reflects the source ACL on the originating document — the vendor produces evidence that the vector store has RBAC enforced; the team produces the retrieval-gateway log showing principal-by-principal access decisions. The pattern in all three regimes is the same. The vendor's evidence is a necessary input, never a sufficient one. The audit that passes is the audit that has both halves stitched together — the lineage from the vendor, the enforcement from the team — into one document a regulator can read end-to-end.

The honest verdict differs by workload class, and it is worth stating plainly. For consumer-grade AI features on non-sensitive data, the AI-DSPM premium is not justified; standard DSPM plus reasonable engineering hygiene is enough. For enterprise internal-knowledge assistants on confidential but non-regulated data, a vendor for gates one through three plus the team's custom retrieval-time filter is the working pattern — vendor produces inventory and classification, team owns the seam. For regulated workloads under NDPA, GDPR, EU AI Act, or DORA, the regulatory mapping value of a vendor like Securiti compounds with audit cycles; the build alternative is still viable, but the team produces the regulatory artefacts manually. For high-sensitivity multi-tenant RAG — joint ventures, customer-facing assistants on customer data, agentic workloads with privileged write paths — the build-it-yourself architecture wins by default; no vendor covers the seam end-to-end, and the inventory at that scale is not worth the AI-DSPM premium over the standard DSPM the team is already paying for.

The pattern across the four classes is consistent. The vendor is useful precisely where the workload is standard and the seam is shallow. The vendor is progressively less useful as the workload gets more sensitive and the seam gets deeper. What the premium buys shrinks as the risk grows.

DSPM-for-AI is a real category. The vendors are building genuine products. The Wiz explainer is a useful overview and a fair representation of what the product line covers today. The cmdev value-add is not arguing the explainer is wrong. It is naming the gates the vendor's product does not cross, because the gates the vendor does not sell are exactly the ones that catch the confused-deputy retrieval, the embedding-similarity over-share, the output paraphrase, and the corpus-poisoning vectors that have driven production incidents through 2025 and 2026.

The procurement frame that produces a defensible posture names gates one through three as a vendor capability, gates four and five as a team build, and the regulatory artefact as the join between the two. The frame that buys DSPM-for-AI as though the seam were covered by the SKU produces a clean dashboard, a comfortable compliance narrative, and an exfiltration vector the dashboard does not see. Of the two postures, one survives the incident — and when the room two floors down starts reading the confused-deputy log, which one will be in the room?

FAQs

Is the Wiz DSPM-for-AI explainer wrong?

No. It is a fair and useful overview of the category from the vendor's perspective. The point of this piece is not to argue it is wrong but to name the gates it does not cover — the retrieval-and-generation seam, where confused-deputy retrieval, embedding-similarity over-share, output paraphrase, and corpus poisoning actually happen.

How is this piece different from the earlier DSPM-meets-RAG one?

The earlier piece sets out the architecture — the five gates, the four exfiltration vectors, the retrieval pattern. This piece is the procurement-facing extension, mapped against the 2026 DSPM-for-AI vendor landscape, with four additions to the five gates that twelve more months of production deployments have surfaced as load-bearing.

Which DSPM-for-AI vendor should we buy?

Depends on workload class. Single-cloud teams with the engineering capacity to build the retrieval seam often do better with the standard DSPM they already run plus a custom retrieval gateway. Multi-cloud estates with substantial AI inventory get value from Wiz or Cyera on discovery alone. Regulated workloads where the compliance narrative dominates lean toward Securiti's regulatory mapping. None of the vendors covers gates four and five end-to-end, so the team builds those regardless.

Why is the retrieval-time filter the gate the vendors do not ship?

Because it is application-layer state — a function of the query, the principal, the source ACL at the moment of query, and the chunk's metadata at the moment of retrieval. The DSPM-for-AI product line was built on the storage-tier control surface, where rule engines read cloud and SaaS APIs. The retrieval-time decision happens inside the team's application code, where the vendor's rule engine has not historically gone. The same gap exists in the CSPM-to-AIWPM extension; this is the data-tier equivalent.

How does this map to the EU AI Act and NDPA?

Article 10 on data governance maps cleanly to the lineage artefacts a credible vendor produces. Article 9 on risk management maps to the retrieval-time enforcement the vendor does not cover. NDPA Sections 41 to 43 on cross-border transfer need both: the inventory says where data could go, and the retrieval audit trail proves where it actually went. The auditor accepts the join, not either half on its own.

How to engage

If your procurement deck has a DSPM-for-AI line item on it and you want a control-coverage map of which gates the vendor enforces, which gates the vendor inventories, and which gates the team will have to build regardless — including a documented mapping to NDPA Sections 41 to 43, EU AI Act Articles 9 and 10, and SOC 2 CC6.1 — talk to us at creativeminds.dev/contact.