DSPM for AI in 2026: What the Wiz Explainer Doesn't Cover About the Retrieval-and-Generation Seam

Samuel A. · 14 min read

Wiz Academy published "DSPM for AI: Best practices and implementation guide" on 23 April 2026. It is one of the cleaner overviews of the category written by a vendor this year — solid on what an AI-specific data substrate looks like, accurate on why standard DSPM does not reach it, useful as a procurement reference. It is also, like every vendor explainer, scoped to the gates the vendor sells. The retrieval-and-generation seam — where confused-deputy retrieval, embedding-similarity over-share, output-leakage paraphrase, and corpus poisoning through privileged-write paths actually exfiltrate data — sits outside the explainer's frame.

cmdev covered the seam directly in our earlier DSPM-meets-RAG piece. This piece is the explicit counter to the Wiz explainer: a fair read of what the explainer covers, a precise read of where it stops, and the operational architecture that a procurement lead should know the vendor will not ship on its own. The aim is not to argue Wiz is wrong; it is to name the gates the vendor's product does not cross, so the buying conversation produces a defensible posture rather than an AI-DSPM line item that closes nothing that standard DSPM did not already cover.

Key takeaways

  • Wiz's DSPM-for-AI explainer covers discovery, classification, and posture of the AI data substrate — training datasets, vector stores, embedding stores. Genuine and useful. It stops where the data is at rest or in transit.
  • The retrieval-and-generation seam — where confused-deputy retrieval, embedding-similarity over-share, output paraphrase, and corpus poisoning actually happen — sits outside the explainer and outside most DSPM-for-AI products in 2026.
  • The five-gate architecture from the earlier cmdev piece holds, with four 2026 additions: multi-tenant vector partition policy, prompt-version sensitivity tagging, audit-chain integration with the principal-of-record, and cross-model embedding migration risk.
  • The vendor landscape sorts into discovery-and-classification leaders (Wiz, Cyera, Borneo), pipeline observability (Sentra), regulatory mapping (Securiti), and the build-it-yourself alternative on Bedrock Knowledge Bases plus Lake Formation plus a custom retrieval-time filter. None covers the seam end-to-end.
  • For most enterprises the answer is a vendor for gates one to three plus a custom build for gates four and five. Paying the AI-DSPM premium for what is structurally CSPM plus a tag is the marker of a bad procurement.

What the Wiz explainer covers, honestly

The Wiz piece does three things well. It extends the DSPM discovery surface into the AI substrate — training datasets, model artefacts in SageMaker and Bedrock, embedding stores in OpenSearch, pgvector, Pinecone, Weaviate, and Bedrock Knowledge Bases, fine-tuning datasets, vector indexes. It classifies the contents of those stores with the same regex-plus-NER-plus-trained-classifier stack standard DSPM uses on cloud object stores, and propagates a sensitivity tag onto the artefact. It applies posture rules at the storage tier — which training buckets are public, which vector stores have RBAC enforced, which embedding stores are encrypted with customer-managed keys, which AI services have logging on.

The explainer also names a handful of AI-specific risks: shadow AI, sensitive data flowing into unsanctioned models, and training-data poisoning at the dataset tier. The controls Wiz ships against them are reasonable, and a CISO whose team has not done this work has a real gap to close. None of that is what this piece disputes.

Where the Wiz explainer stops

The explainer stops at the same boundary every vendor explainer stops at — between "data at rest plus data in transit" and "data being retrieved and generated against". The first is a storage-tier posture problem the DSPM category was built for. The second is an application-layer problem the storage-tier control surface does not see. The retrieval-and-generation seam is where most of the actual exfiltration happens in production AI workloads today, and it is the seam the Wiz product does not cross.

Four vectors live at that seam. Each is documented, each has been demonstrated against production systems, none is stopped by storage-tier posture controls.

Confused-deputy retrieval. The dominant failure mode in production RAG. The vector store's RBAC is intact at the resource level, but chunks are retrieved by the model under the application's service identity, and the application returns the model's answer without revalidating that the cited chunks were within the user's source-system ACL at query time. The Wiz dashboard reads green; the user receives content they were never entitled to read. The earlier cmdev piece covers the engineering pattern in depth.
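
A minimal sketch of the revalidation step, assuming the application can ask the source system at query time whether the end user may read a given document. Every name here (Chunk, can_read, filter_chunks_for_user, the toy ACL) is hypothetical and illustrative, not any vendor's or library's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source_doc_id: str  # document in the source system the chunk was cut from
    text: str


# Hypothetical callback: asks the source system, at query time, whether
# `user_id` may read `source_doc_id`.
AclCheck = Callable[[str, str], bool]


def filter_chunks_for_user(
    user_id: str, candidates: Iterable[Chunk], can_read: AclCheck
) -> List[Chunk]:
    """Keep only chunks whose source document the end user can read right now.

    The check uses the end user's identity, not the application's service
    identity, and runs per query rather than being cached from ingest.
    """
    return [c for c in candidates if can_read(user_id, c.source_doc_id)]


# Toy ACL standing in for the source system (assumption for the example).
SOURCE_ACL = {"doc-hr-1": {"alice"}, "doc-wiki-7": {"alice", "bob"}}


def toy_can_read(user_id: str, doc_id: str) -> bool:
    return user_id in SOURCE_ACL.get(doc_id, set())  # unknown documents are denied


retrieved = [
    Chunk("c1", "doc-hr-1", "salary bands"),
    Chunk("c2", "doc-wiki-7", "onboarding guide"),
]
visible_to_bob = filter_chunks_for_user("bob", retrieved, toy_can_read)
```

The design point is that the check runs against the end user's identity on every query; an ACL snapshot taken at ingest would reintroduce the confused deputy the moment source permissions change.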

Over-share via embedding similarity. An adversary with query access uses similarity scores as an oracle. Cosine distance leaks information about which content is present even when the chunks themselves are filtered out post-retrieval. Nearest-neighbour probing has been shown to recover sensitive attributes at over 90 per cent accuracy in the 2025 literature. The DSPM-for-AI tool sees the store; it does not see the similarity surface leaking attribute information.

Output leakage via paraphrase. The model retrieves a chunk the user is entitled to see — or that the application erroneously allowed through — and produces an answer that combines fragments into a composition more sensitive than any single chunk. The output is new data; its classification is a property of what the model generated. No vector-store RBAC catches this. The DSPM-for-AI posture rule does not see the response stream.
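
A sketch of an output gate that classifies the generated text itself rather than the chunks behind it, assuming the team has some response classifier to call. Tier, output_gate, and toy_response_classifier are hypothetical names; the toy classifier exists only to show a composition scoring higher than either of its parts. Watermarking is left out.

```python
from enum import IntEnum
from typing import Callable


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


Classifier = Callable[[str], Tier]

WITHHELD = "[response withheld: classified above the requester's clearance]"


def output_gate(response: str, user_clearance: Tier, classify: Classifier) -> str:
    """Release the response only if its own classification fits the clearance."""
    if classify(response) > user_clearance:
        return WITHHELD
    return response


def toy_response_classifier(text: str) -> Tier:
    # Toy rule (assumption): a name or a salary alone is INTERNAL,
    # the two together are RESTRICTED.
    lowered = text.lower()
    has_name = "jane doe" in lowered
    has_salary = "salary" in lowered
    if has_name and has_salary:
        return Tier.RESTRICTED
    if has_name or has_salary:
        return Tier.INTERNAL
    return Tier.PUBLIC
```

A gate that only took the maximum tier of the retrieved chunks would pass the combined sentence, which is the failure this vector describes.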

Corpus poisoning via privileged-write paths. An attacker with write access to any source the ingest pipeline trusts plants instructions or content the model later retrieves and acts on. The vendor inventory tells you the corpus exists; it does not tell you the ingest pipeline is reading from a source whose write-side ACL is looser than the read-side principal would expect.
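
One way to surface that mismatch is to compare each ingest source's write-side principals against the set the pipeline owner actually trusts. The sketch below assumes the writer list can be exported from the source system; IngestSource, untrusted_writers, and sources_to_quarantine are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class IngestSource:
    name: str
    writers: FrozenSet[str]  # principals with write access in the source system


def untrusted_writers(source: IngestSource, trusted: FrozenSet[str]) -> FrozenSet[str]:
    """Writers on the source who are outside the set the pipeline owner trusts."""
    return source.writers - trusted


def sources_to_quarantine(
    sources: Iterable[IngestSource], trusted: FrozenSet[str]
) -> List[str]:
    """Names of sources where at least one untrusted principal can write."""
    return [s.name for s in sources if untrusted_writers(s, trusted)]


# Illustrative data only.
TRUSTED = frozenset({"docs-team", "legal-team"})
SOURCES = [
    IngestSource("policy-wiki", frozenset({"docs-team"})),
    IngestSource("shared-drive", frozenset({"docs-team", "all-contractors"})),
]
flagged = sources_to_quarantine(SOURCES, TRUSTED)
```

Here the shared drive is flagged because contractors can write to it, even though its read-side permissions may look perfectly reasonable.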

None of the four is exotic. They are the patterns the published RAG attack literature documents and the patterns our incident-response work keeps surfacing.

The five gates, extended for 2026

The five-gate architecture from the earlier piece remains the right operational pattern, and it is worth restating for the procurement audience. The gates are classification at ingestion, a sensitivity tag preserved on the chunk, vector-store RBAC, a retrieval-time filter, and an output gate with watermarking. A DSPM-for-AI vendor covers the first three natively. The fourth and fifth are engineering work the team has to do regardless of which vendor sits underneath.

Twelve months on, four 2026 extensions are now load-bearing in production deployments.

Multi-tenant vector partition policy. Enterprise RAG has crossed into multi-tenant territory — internal tenancies per business unit, external tenancies for customer-facing assistants, joint-venture tenancies under contractual data-sharing agreements. The rule: no chunk crosses a tenancy boundary without an explicit allow-list decision, enforced at the partition layer, audited per query. Vendors are starting to detect the partitions but not the policy.
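
The rule can be sketched as a directional allow-list with a per-query audit record. This is application-level illustration only: in a real deployment the same decision would be pushed into the vector store's partition or namespace query. All names (may_cross, partition_filter, the tenant labels) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# (requesting tenant, owning tenant) pairs with an explicit sharing decision.
AllowList = FrozenSet[Tuple[str, str]]


def may_cross(requesting: str, owning: str, allow_list: AllowList) -> bool:
    """A chunk stays inside its tenancy unless the pair is explicitly listed."""
    return requesting == owning or (requesting, owning) in allow_list


def partition_filter(
    query_id: str,
    requesting: str,
    chunks: List[Tuple[str, str]],  # (chunk_id, owning tenant)
    allow_list: AllowList,
    audit_log: List[Dict[str, object]],
) -> List[str]:
    """Return permitted chunk ids and record every decision for this query."""
    permitted: List[str] = []
    for chunk_id, owning in chunks:
        allowed = may_cross(requesting, owning, allow_list)
        audit_log.append(
            {"query_id": query_id, "tenant": requesting, "chunk_id": chunk_id,
             "owner": owning, "allowed": allowed}
        )
        if allowed:
            permitted.append(chunk_id)
    return permitted


# Illustrative data: the joint venture may read finance chunks, not HR ones.
ALLOW: AllowList = frozenset({("jv-alpha", "bu-finance")})
log: List[Dict[str, object]] = []
hits = [("c1", "jv-alpha"), ("c2", "bu-finance"), ("c3", "bu-hr")]
result = partition_filter("q-001", "jv-alpha", hits, ALLOW, log)
```

The allow-list is deliberately one-way: permitting the joint venture to read finance chunks does not let finance read the joint venture's.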

Prompt-version sensitivity tagging. A prompt is itself sensitive content — it encodes business logic, regulatory interpretation, agent capability boundaries, and sometimes embedded examples drawn from real customer data. The 2026 incidents we have seen include prompts pulled from version-control history that contained PII examples no one remembered embedding. The control is a hash-pinned prompt registry with classification at write time. No DSPM-for-AI vendor reads the prompt registry today.
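
A minimal in-memory sketch of such a registry, assuming SHA-256 as the pin and a classifier the team supplies. PromptRegistry, PromptRecord, and the toy email rule are hypothetical; a production registry would persist records and use the same classification stack as the rest of the pipeline.

```python
import hashlib
import re
from typing import Callable, Dict, NamedTuple


class PromptRecord(NamedTuple):
    text: str
    sensitivity: str  # label assigned once, when the prompt is written


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PromptRegistry:
    """In-memory stand-in for a registry keyed by the prompt's SHA-256."""

    def __init__(self, classify: Callable[[str], str]) -> None:
        self._classify = classify
        self._records: Dict[str, PromptRecord] = {}

    def register(self, text: str) -> str:
        """Classify at write time and return the hash callers must pin."""
        digest = _digest(text)
        self._records[digest] = PromptRecord(text, self._classify(text))
        return digest

    def load(self, digest: str) -> PromptRecord:
        record = self._records[digest]  # KeyError: prompt was never registered
        if _digest(record.text) != digest:
            raise ValueError("stored prompt no longer matches its pinned hash")
        return record


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def toy_prompt_classifier(text: str) -> str:
    # Toy rule (assumption): an email address in an embedded example is PII.
    return "contains-pii" if EMAIL.search(text) else "internal"


registry = PromptRegistry(toy_prompt_classifier)
pinned = registry.register("Example: reply to ada@example.com about her claim.")
```

Because the application loads prompts by hash, an edited prompt is a new registry entry with its own classification rather than a silent change to an existing one.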

Audit-chain integration with the principal-of-record. The audit trail keyed by request ID — recording which chunks were retrieved under which principal — has to join the agent action trail and the source-system access trail under a single correlation ID. Vendors are starting to read each leg in isolation. The join is the team's responsibility.
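
The join itself is small once all three trails carry the same correlation ID. The sketch below assumes each trail can be exported as a list of records with a correlation_id field; the field names and helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Event = Dict[str, str]
LEGS = ("retrieval", "agent", "source")


def join_trails(
    retrieval: Iterable[Event], agent: Iterable[Event], source: Iterable[Event]
) -> Dict[str, Dict[str, List[Event]]]:
    """Group events from the three trails under their shared correlation ID."""
    joined: Dict[str, Dict[str, List[Event]]] = defaultdict(
        lambda: {leg: [] for leg in LEGS}
    )
    for leg, events in zip(LEGS, (retrieval, agent, source)):
        for event in events:
            joined[event["correlation_id"]][leg].append(event)
    return dict(joined)


def incomplete_chains(joined: Dict[str, Dict[str, List[Event]]]) -> List[str]:
    """Correlation IDs where at least one of the three trails has no event."""
    return sorted(cid for cid, legs in joined.items() if not all(legs.values()))


# Illustrative records only.
retrieval_log = [{"correlation_id": "r-1", "principal": "bob", "chunk_id": "c2"},
                 {"correlation_id": "r-2", "principal": "eve", "chunk_id": "c7"}]
agent_log = [{"correlation_id": "r-1", "action": "send_summary"}]
source_log = [{"correlation_id": "r-1", "doc_id": "doc-wiki-7"},
              {"correlation_id": "r-2", "doc_id": "doc-hr-1"}]
chains = join_trails(retrieval_log, agent_log, source_log)
gaps = incomplete_chains(chains)
```

Whether a missing leg is a finding depends on the workload: a plain question-answering request may legitimately have no agent action, so the list of gaps is a starting point for review rather than a verdict.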

Cross-model embedding migration risk. Embedding models are rotating faster than the corpora they were built against. Sensitivity decisions made under one embedding model's inversion-resistance profile may not hold under another. Every rotation should trigger a re-evaluation of which sensitivity tiers are permitted to be embedded at all. Vendors track which embedding model is in use; they do not gate the rotation.
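
A rotation gate can be as simple as a deny-by-default lookup from embedding model to the sensitivity tiers reviewed for it. The sketch assumes the team records that review outcome somewhere; the model names, tier labels, and functions are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical review outcome: which sensitivity tiers may be embedded per model.
Approvals = Dict[str, FrozenSet[str]]


def tiers_allowed(model: str, approvals: Approvals) -> FrozenSet[str]:
    """A model nobody has reviewed is allowed to embed nothing."""
    return approvals.get(model, frozenset())


def rotation_blockers(
    new_model: str, corpus_tier_counts: Dict[str, int], approvals: Approvals
) -> Dict[str, int]:
    """Tiers present in the corpus that are not approved for the new model."""
    allowed = tiers_allowed(new_model, approvals)
    return {
        tier: count
        for tier, count in corpus_tier_counts.items()
        if count > 0 and tier not in allowed
    }


# Illustrative data only.
APPROVALS: Approvals = {
    "embed-v1": frozenset({"public", "internal", "confidential"}),
    "embed-v2": frozenset({"public", "internal"}),
}
CORPUS = {"public": 1200, "internal": 800, "confidential": 35}
blockers = rotation_blockers("embed-v2", CORPUS, APPROVALS)
```

A non-empty result blocks the rotation until the listed tiers are either re-approved for the new model or excluded from embedding.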

Each addition closes a vector that has produced incidents in the last twelve months. None is a vendor SKU yet. The team that has them is the team that built them.

The DSPM-for-AI vendor landscape, honestly

Five vendors plus the build-it-yourself alternative claim some flavour of DSPM-for-AI in 2026. Coverage is uneven, no vendor crosses the seam end-to-end, and the procurement deck asks the wrong questions.

Wiz. Strong on discovery and classification across the AI data substrate, with credible inventory across AWS, Azure, and GCP AI services. The AI rule pack reads training-dataset configuration, embedding-store posture, model-artefact storage, and Bedrock or Azure OpenAI service posture cleanly. Weak on the retrieval-and-generation seam — the product is structurally storage-tier, and the seam is application-layer state the rule engine does not read.

Cyera. Strong on data classification and lineage at the dataset tier, with an AI extension that maps which datasets fed which model training runs. The lineage story is the cleanest in the category and the natural fit for AI Bill of Materials compliance. The layer above lineage — vector-store policy, retrieval-time control — is thinner.

Sentra. Strong on ML and data-pipeline observability — training-data quality, schema drift, pipeline lineage. Retrieval-seam coverage is roadmap, not shipping.

Securiti. Strong on regulatory mapping — NDPA, GDPR, EU AI Act, DORA — with policy templates that produce defensible audit artefacts. Technical depth on AI-specific stores is thinner than the regulatory packaging suggests. The natural buyer wants the compliance narrative more than the engineering control.

Borneo. Strong on discovery in unstructured stores — SharePoint, Confluence, Drive, Slack — where most enterprise RAG corpora are sourced. Less coverage on vector databases in their own right.

Build-it-yourself on AWS. Bedrock Knowledge Bases plus Lake Formation plus a custom retrieval-time filter is the honest alternative for teams centralised on AWS. Knowledge Bases ships chunk-level metadata filtering, Lake Formation handles dataset classification, a custom gateway implements gate four, the team writes gate five. Cost: two to three engineer-quarters for a credible first version plus ongoing maintenance. The architecture sits where the team can extend it, with no vendor premium on what is structurally CSPM plus a tag.

None of the five vendors ships a product that covers gates four and five in depth. That end-to-end category is still about two years away, and the controls the seam needs are mostly application-layer state that the storage-tier product line was not built to reach.

The buyer's question

The procurement-relevant question is when DSPM-for-AI is a genuine net add over standard DSPM plus a custom retrieval-time filter plus the principal-of-record chain the team already runs.

Yes when:

  • the estate is multi-cloud with a substantial AI footprint and the discovery value compounds across providers;
  • the regulator-facing reporting against EU AI Act Article 10 or NDPA cross-border transfer matters more than the depth of control enforcement; or
  • the data substrate is the primary risk and the retrieval seam is downstream of the team's responsibility.

No when:

  • the estate is single-cloud, the AI workload is one or two applications, and the team has the engineering capacity to build gates four and five, so the inventory adds little and the cost is structurally CSPM-plus-a-tag pricing;
  • the vendor's pitch is dominated by inventory and service-posture rule packs with no clear story on retrieval-time policy or output gating; or
  • the retrieval seam is the dominant risk and gates four and five need to be enforced as code under the team's version control.

The procurement framing

The five-gate architecture gives the buying conversation a clean structure. Gates one through three — classification at ingestion, sensitivity tag on the chunk, vector-store RBAC — are vendor territory; the procurement questions are precision and recall on the team's actual data shape, whether the source tag survives the ingest pipeline, and whether self-hosted vector stores are covered. Gates four and five — retrieval-time filter, output gate plus watermark — are team build territory; no vendor covers either as enforcement in mid-2026, and the procurement frame is whether the vendor's audit trail reads the team's retrieval-gateway logs so the posture findings join the engineering control.

The procurement deck that lists gates one through five as vendor capabilities is a deck to be sceptical of. The deck that names gates one through three as vendor and four through five as team build is the deck that matches reality.

The regulatory framing in 2026

Three regulatory frames matter, and none accepts DSPM-for-AI evidence as sufficient on its own.

  • NDPA Sections 41 to 43 on cross-border transfer require demonstrable controls on where personal data flows and on which basis. The vendor inventory tells the regulator where datasets, models, and vector stores are deployed, but does not prove the retrieval layer enforces the transfer constraint at query time.
  • EU AI Act Article 10 on data governance maps cleanly onto vendor lineage artefacts; Article 9 on risk management maps onto the retrieval-time enforcement the vendor does not cover.
  • SOC 2 CC6.1 on logical access asks whether chunk-level access at retrieval time reflects the source ACL on the originating document. The vendor produces evidence that the vector store has RBAC enforced; the team produces the retrieval-gateway log showing principal-by-principal access decisions.

The pattern is the same in all three. The vendor's evidence is a necessary input, not a sufficient one. The audit that passes is the audit that has both halves.

The honest verdict per workload class

  • Consumer-grade AI features on non-sensitive data: the DSPM-for-AI premium is not justified. Standard DSPM plus reasonable engineering hygiene is enough.
  • Enterprise internal-knowledge assistants on confidential but non-regulated data: a vendor for gates one to three plus the team's custom retrieval-time filter is the working pattern. The vendor produces inventory and classification; the team owns the seam.
  • Regulated workloads under NDPA, GDPR, EU AI Act, or DORA: the regulatory mapping value of a vendor like Securiti compounds with audit cycles. The build alternative is still viable, but the team has to produce the regulatory artefacts manually.
  • High-sensitivity multi-tenant RAG (joint ventures, customer-facing assistants on customer data, agentic workloads with privileged write paths): the build-it-yourself architecture wins by default. No vendor covers the seam end-to-end, and the inventory at that scale is not worth the AI-DSPM premium over standard DSPM.

The pattern across the four classes is that the vendor is useful precisely where the workload is standard and the seam is shallow, and progressively less useful as the workload gets more sensitive and the seam gets deeper.

The closing read

DSPM-for-AI is a real category. The vendors are building genuine products. The Wiz explainer is a useful overview and a fair representation of what the product line covers. cmdev's value-add is not arguing the explainer is wrong — it is naming the gates the vendor's product does not cross, because the gates the vendor does not sell are exactly the ones that catch the confused-deputy retrieval, the embedding-similarity over-share, the output paraphrase, and the corpus-poisoning vectors that have driven production incidents through 2025 and 2026.

The procurement frame that produces a defensible posture names gates one through three as a vendor capability, gates four and five as a team build, and the regulatory artefact as the join between the two. The frame that buys DSPM-for-AI as if the seam were covered by the SKU produces a clean dashboard, a comfortable compliance narrative, and an exfiltration vector the dashboard does not see. Of the two postures, one survives the incident and the other does not.

FAQs

Is the Wiz DSPM-for-AI explainer wrong?

No. It is a fair and useful overview of the category from the vendor's perspective. The point of this piece is not to argue it is wrong but to name the gates it does not cover — the retrieval-and-generation seam, where confused-deputy retrieval, embedding-similarity over-share, output paraphrase, and corpus poisoning actually happen.

How is this piece different from the earlier DSPM-meets-RAG one?

The earlier piece sets out the architecture — the five gates, the four exfiltration vectors, the retrieval pattern. This piece is the procurement-facing extension, mapped against the 2026 DSPM-for-AI vendor landscape, with four additions to the five gates that twelve more months of production deployments have surfaced as load-bearing.

Which DSPM-for-AI vendor should we buy?

Depends on workload class. Single-cloud teams with the engineering capacity to build the retrieval seam often do better with the standard DSPM they already run plus a custom retrieval gateway. Multi-cloud estates with substantial AI inventory get value from Wiz or Cyera on discovery alone. Regulated workloads where the compliance narrative dominates lean toward Securiti's regulatory mapping. None of the vendors covers gates four and five end-to-end, so the team builds those regardless.

Why is the retrieval-time filter the gate the vendors do not ship?

Because it is application-layer state — a function of the query, the principal, the source ACL at the moment of query, and the chunk's metadata at the moment of retrieval. The DSPM-for-AI product line was built on the storage-tier control surface, where rule engines read cloud and SaaS APIs. The retrieval-time decision happens inside the team's application code, where the vendor's rule engine has not historically gone. The same gap exists in the CSPM-to-AIWPM extension; this is the data-tier equivalent.

How does this map to the EU AI Act and NDPA?

EU AI Act Article 10 on data governance maps cleanly to the lineage artefacts a credible vendor produces. Article 9 on risk management maps to the retrieval-time enforcement the vendor does not cover. NDPA Sections 41 to 43 on cross-border transfer need both: the inventory says where data could go, and the retrieval audit trail proves where it actually went. The auditor accepts the join, not either half on its own.

How to engage

If your procurement deck has a DSPM-for-AI line item on it and you want a control-coverage map of which gates the vendor enforces, which gates the vendor inventories, and which gates the team will have to build regardless — including a documented mapping to NDPA Sections 41 to 43, EU AI Act Articles 9 and 10, and SOC 2 CC6.1 — talk to us at creativeminds.dev/contact.
