A platform-engineering lead at a Series C SaaS scores their pipeline against the OWASP DevSecOps Maturity Model. They land on Level 3. Six weeks later the same scorecard sits in front of an external auditor preparing a NIS2 readiness review. The auditor picks one item — "static analysis runs on every pull request and blocks merges on critical findings" — and asks for the last production deploy blocked by a critical SAST finding. The scanner ran. The merge went through because the critical-finding threshold was set to advisory in 2025 and never moved to blocking. The item drops from Level 3 to Level 1, and the same pattern repeats across the next eight items the auditor samples.
This is the median outcome, not an outlier. DSOMM is a useful framework. The audit reality is that most self-assessments overstate the score by one or two levels, and the gap shows up the moment an external assessor stops reading the framework text and starts asking for evidence. This piece is the engineering-audit read.
Key takeaways
- DSOMM is OWASP's framework for DevSecOps maturity across four dimensions — Build & Deployment, Culture & Organisation, Implementation, Information Gathering — and five levels. The failure mode is self-assessment without adversarial review.
- Most organisations self-assess at Level 3 or 4 and land at Level 1 or 2 when an external auditor samples evidence. The same pattern recurs: scanning runs but does not block, champions have no protected time, runtime detection sits in audit-only mode, and telemetry is collected but cannot be queried across the audit timeframe.
- Level 1 is basic hygiene, Level 2 is automated checks with override paths, Level 3 is enforced gates with an evidence pipeline (the audit-real bar), Level 4 is continuous improvement (rare).
- The audit-reality test is the same five questions every time: one blocked SAST deploy, one timestamped runtime remediation, an asset inventory as of 30 days ago, role permissions for an engineer overdue on training, and signed provenance for the latest production artefact.
- The CRA, NIS2, and EU AI Act regimes are increasingly accepting DSOMM-shaped evidence. The differentiator is whether your auditor confirms or refutes the score.
What DSOMM is, and why it matters in 2026
The OWASP DevSecOps Maturity Model is a structured framework for assessing the security maturity of a software-delivery pipeline. It organises practices across four dimensions — Build & Deployment, Culture & Organisation, Implementation, Information Gathering — and scores each item on a five-level scale from 0 (not performed) through 4 (continuous improvement). The v3 model was published in 2024.
The design intent is sensible. DevSecOps maturity is a vector across dimensions that mature at different rates, not a single number. The framework is tool-neutral. The failure mode is not in the framework. DSOMM was published as a self-assessment instrument, and self-assessment without an adversarial review step is predictably optimistic.
Three regulatory pressures have moved DSOMM from an internal benchmark to an external assessment artefact. The main obligations of the EU Cyber Resilience Act, including its supply-chain requirements, apply from December 2027, and conformity-assessment bodies are accepting DSOMM-shaped evidence in the secure-development-lifecycle review. NIS2 transposition completed across most member states in 2025, and the Article 21 risk-management measures map onto DSOMM's four dimensions. EU AI Act enforcement bites from August 2026 for general-purpose AI and August 2027 for high-risk systems, and the Article 11 technical documentation requirements reference secure-development practices that DSOMM-shaped controls address directly. A self-assessed Level 3 that does not survive an external review is a future remediation project.
The self-assessment theatre pattern
Three structural reasons explain why self-assessed DSOMM scores systematically overstate the audit-real score.
Self-assessment lacks adversarial review. The person filling in the scorecard is the person responsible for the items. Without an external assessor whose job is to disconfirm the score, the optimistic interpretation wins by default.
DSOMM items can be read generously. "Static analysis is integrated into the build pipeline" is true if a scanner runs. It is also true if the scanner runs and blocks merges on critical findings. The framework intends the stricter reading; the self-assessment takes the looser one.
"We have a tool" gets counted as "we enforce a control." A SAST scanner in CI is a tool; a SAST policy that blocks merges on critical findings is a control. The self-assessment counts the tool. The auditor asks for the last time the control fired.
The four dimensions, walked honestly
Each dimension has a characteristic gap pattern that external auditors see repeatedly.
Build & Deployment. SAST, DAST, SCA, IaC scanning, container scanning, signed artefacts, the policy gates that decide what reaches production. The audit gap is almost always the same: the scanners run and the gates do not block. SAST finds critical issues and the policy is advisory. SCA lists high-severity vulnerabilities and the build proceeds. IaC scanning flags unencrypted S3 buckets and the apply step is not gated.
Culture & Organisation. Champion programmes, training, threat modelling. The gap is structural: champions exist on the org chart with no protected time. "10% of Friday is reserved for security work" is the first thing dropped when delivery pressure rises. Training is logged for compliance, not measured for retention. Threat modelling runs on the project that needed a procurement-driven assessment, not on the projects shipping new attack surface every sprint.
Implementation. Hardened container base images, runtime detection, secrets management. The gap is enforcement mode: runtime detection runs in audit-only. Falco, Tetragon, or the runtime tier of an EDR produces alerts, and no alert in the last three months produced a timestamped remediation. Long-lived secrets still sit in environment variables on legacy services. The production fleet still pulls from :latest for services nobody owns.
Information Gathering. Logging, monitoring, asset inventory. The gap is queryability: telemetry exists and is not queryable in audit timeframe. CloudTrail ships to S3 with no Athena table partitioned for the last 90 days. Container log retention is 14 days. The asset inventory sits in a CMDB last reconciled in Q3 and the auditor wants the inventory as of 30 days ago.
Teams at self-assessed Level 3 have the components of Level 3. The components are not wired into enforcement, into evidence, or into queryable telemetry. The auditor's job is to detect the wiring gap.
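The queryability gap in Information Gathering can be made concrete. The sketch below is a minimal illustration, assuming the inventory is kept as versioned records with validity dates rather than as a single overwritten row; the field names (`service`, `owner`, `last_security_review`, `valid_from`, `valid_to`) and the sample data are hypothetical, not a DSOMM schema.

```python
from datetime import date, timedelta


def inventory_as_of(records, as_of):
    """Return the inventory rows that were valid on the given date.

    A row is valid from valid_from (inclusive) to valid_to (exclusive);
    valid_to of None means the row is still current.
    """
    return [
        r for r in records
        if r["valid_from"] <= as_of
        and (r["valid_to"] is None or as_of < r["valid_to"])
    ]


# Hypothetical history: ownership of one service changed on 2026-02-20.
records = [
    {"service": "billing-api", "owner": "team-a",
     "last_security_review": date(2025, 11, 3),
     "valid_from": date(2025, 6, 1), "valid_to": date(2026, 2, 20)},
    {"service": "billing-api", "owner": "team-b",
     "last_security_review": date(2025, 11, 3),
     "valid_from": date(2026, 2, 20), "valid_to": None},
]

today = date(2026, 3, 1)
snapshot = inventory_as_of(records, today - timedelta(days=30))
print([(r["service"], r["owner"]) for r in snapshot])
# [('billing-api', 'team-a')]
```

An inventory that only stores the current row cannot answer this question at all, which is why the versioning matters more than the storage engine.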
The five levels, honestly
Level 0 — not performed. Rare by 2026; some level of scanning, monitoring, and access control is table stakes.
Level 1 — basic hygiene. SAST runs somewhere. Container scanning produces a report. Logs are collected. No enforcement, no remediation SLA, no measurement. Most organisations sit honestly at Level 1 across most items.
Level 2 — automated checks with override paths. Scanners run on a schedule, detection rules produce alerts, reports are generated. Overrides are possible and undisciplined — anyone can mark a finding as not applicable, the override path has no sunset, override volume is not measured. Most self-assessments claim higher; most enforcement sits here.
Level 3 — enforced gates with an evidence pipeline. Gates block. Bypass is possible only through an audited break-glass procedure with a documented owner, a sunset date, and a queryable record. Evidence is centralised and queryable in audit timeframe. Champions have protected time. Threat modelling is part of design review for production features. This is the audit-real bar for regulated markets.
Level 4 — continuous improvement. Metrics are tracked, the team adjusts controls based on what the data shows. Threat modelling runs at design for every new production capability. Override paths have hard expiry and are measured as a leading indicator. Rare.
Honest distribution in 2026: most organisations sit at Level 1 to Level 2 across most items, claim Level 3, and have one or two items legitimately at Level 3 hidden inside the optimistic claim.
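One simplified reading of the level descriptions above can be written as a decision ladder. This is a sketch for discussion, not DSOMM's own scoring method; the `ControlState` fields are hypothetical flags, and a real assessor may score an individual item more harshly than this ladder does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlState:
    """Observed state of one control; every field is a hypothetical flag."""
    runs: bool = False                   # the tool executes at all
    automated: bool = False              # runs on a schedule or on every change
    blocks: bool = False                 # a failing result stops the merge or deploy
    audited_break_glass: bool = False    # bypass needs an owner, a sunset, a record
    evidence_queryable: bool = False     # evidence can be queried for the audit period
    metrics_drive_changes: bool = False  # the control is tuned from measured data


def effective_level(c: ControlState) -> int:
    """Return the highest level (0 to 4) whose conditions are all met."""
    if not c.runs:
        return 0
    if not c.automated:
        return 1
    if not (c.blocks and c.audited_break_glass and c.evidence_queryable):
        return 2
    if not c.metrics_drive_changes:
        return 3
    return 4


scanner_only = ControlState(runs=True)
advisory = ControlState(runs=True, automated=True)
enforced = ControlState(runs=True, automated=True, blocks=True,
                        audited_break_glass=True, evidence_queryable=True)

print(effective_level(scanner_only))  # 1
print(effective_level(advisory))      # 2
print(effective_level(enforced))      # 3
```

The useful property is that the ladder short-circuits: a gate that blocks but has no queryable evidence still returns 2, which mirrors how the score falls when evidence cannot be sampled.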
The audit-reality test
External assessors do not score DSOMM by reading the self-assessment text. They sample evidence against five questions.
"Show me one production deploy from the last week blocked by your SAST policy." Build & Deployment. A team enforcing SAST gates has the record in CI history. A team in advisory mode produces a screenshot of the scanner running.
"Show me one runtime alert from the last month that produced a timestamped remediation." Implementation. A team with runtime detection wired into on-call has an incident ticket. A team in audit-only mode has alerts in a log nobody reads.
"Show me your asset inventory queryable as of 30 days ago, with production services in scope, ownership, and last security review date." Information Gathering. A team with a queryable inventory runs an Athena query. A team without the query path produces a spreadsheet last edited four months ago.
"Show me the role permissions for an engineer who has not completed security training in the last 12 months." Culture & Organisation. A team running an enforcement loop has either revoked elevated permissions or has a time-bound exception with a documented owner. A team treating training as a tickbox shows the engineer still holds production access.
"Show me the signed provenance for the latest production artefact, and the deploy-time policy that verifies it." Links Build & Deployment to Implementation. The SLSA reading is in the companion piece; the DSOMM reading is that the provenance is part of the evidence the auditor expects.
A team at the level it claims answers all five within a working day. A team one level below produces three answers and asks for time on two.
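Two of the five questions reduce to small joins over records the team should already hold. The sketch below assumes alerts, incident tickets, and engineer records are available as plain dictionaries; all field names and sample values are hypothetical, and "12 months" is approximated as 365 days.

```python
from datetime import date, datetime, timedelta


def remediated_alerts(alerts, tickets, now, window_days=30):
    """Runtime question: alerts fired inside the window with a resolved ticket.

    Returns (alert_id, ticket_id, hours_to_remediate) tuples.
    """
    cutoff = now - timedelta(days=window_days)
    resolved = {t["alert_id"]: t for t in tickets if t["resolved_at"] is not None}
    rows = []
    for alert in alerts:
        ticket = resolved.get(alert["id"])
        if ticket is not None and alert["fired_at"] >= cutoff:
            delta = ticket["resolved_at"] - alert["fired_at"]
            rows.append((alert["id"], ticket["id"],
                         round(delta.total_seconds() / 3600, 1)))
    return rows


def training_violations(engineers, today, max_age_days=365):
    """Training question: engineers overdue on training who still hold
    production access and have no live exception with a named owner."""
    cutoff = today - timedelta(days=max_age_days)
    names = []
    for eng in engineers:
        overdue = eng["last_training"] is None or eng["last_training"] < cutoff
        exc = eng.get("exception")  # {"owner": str, "until": date} or None
        covered = bool(exc) and bool(exc["owner"]) and exc["until"] >= today
        if overdue and eng["prod_access"] and not covered:
            names.append(eng["name"])
    return names


alerts = [
    {"id": "a1", "fired_at": datetime(2026, 2, 20, 10, 0)},
    {"id": "a2", "fired_at": datetime(2026, 2, 25, 9, 0)},
]
tickets = [
    {"id": "INC-1", "alert_id": "a1", "resolved_at": datetime(2026, 2, 20, 14, 30)},
    {"id": "INC-2", "alert_id": "a2", "resolved_at": None},
]
print(remediated_alerts(alerts, tickets, now=datetime(2026, 3, 1)))
# [('a1', 'INC-1', 4.5)]
```

An empty result from `remediated_alerts` is the audit-only signature: alerts exist, but none is linked to a timestamped fix. A non-empty result from `training_violations` is the list the auditor will ask about.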
The evidence-pipeline gap
Across the five questions, the recurring failure is not the control. It is the evidence pipeline.
A team running SAST that blocks merges has the control; if CI history sits on a hosted runner with 30-day retention and the auditor looks back 90 days, the evidence is gone. A team running runtime detection has the control; if the alert log lives in a SIEM not partitioned for query, the evidence is unqueryable. A team running a champion programme has the control; if the calendar evidence is not exportable, the auditor falls back to the policy document and downgrades the score.
The DSOMM score reflects the evidence the auditor can sample, not the controls the team has installed. Investing in the evidence pipeline — centralised logging with audit-period retention, exportable calendars, queryable inventory, exportable CI history — is the highest-leverage move.
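The retention arithmetic is simple enough to check before the auditor does. A minimal sketch, assuming each evidence source has a known retention in days: the 30-day CI history and 14-day container logs come from the examples in this piece, while the 365-day CloudTrail figure and the source names are illustrative assumptions.

```python
def retention_gaps(retention_days, lookback_days):
    """Return {source: missing_days} for sources that cannot cover the lookback."""
    return {
        source: lookback_days - days
        for source, days in retention_days.items()
        if days < lookback_days
    }


retention = {"ci_history": 30, "container_logs": 14, "cloudtrail": 365}
print(retention_gaps(retention, lookback_days=90))
# {'ci_history': 60, 'container_logs': 76}
```

Any source that appears in the output is a control whose evidence disappears before a 90-day sample can reach it.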
The cmdev pattern for honestly reaching Level 3
The pattern below is what we wire for clients preparing for an NIS2 readiness review, a CRA conformity assessment, or a SOC 2 examination with a DevSecOps surface.
Enforce, don't advise. Every gate the framework describes as a control is wired in enforcing mode from day one. SAST blocks merges on critical findings. SCA blocks builds on high-severity vulnerabilities. IaC scanning blocks the apply step. Admission controllers block deployments that fail signature verification. The cultural work is accepting that, in the first weeks, this will block deploys until the configuration is correct. A gate left in advisory mode with a promise to move it to enforcing "soon" never moves without an external forcing function.
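The difference between advisory and enforcing is often a single exit code. The sketch below assumes scanner findings have already been parsed into dictionaries with a `severity` field; that shape, the `mode` switch, and the sample finding are hypothetical and not tied to any particular scanner's output format.

```python
BLOCKING_SEVERITIES = {"critical"}


def gate_exit_code(findings, mode="enforcing"):
    """Return the exit code a CI step should use for these findings.

    In enforcing mode any blocking-severity finding yields 1, which fails
    the step. In advisory mode the findings are printed and the step passes.
    """
    blocking = [f for f in findings
                if f["severity"].lower() in BLOCKING_SEVERITIES]
    for f in blocking:
        print(f"{mode.upper()}: {f['rule']} in {f['file']} ({f['severity']})")
    return 1 if blocking and mode == "enforcing" else 0


findings = [
    {"rule": "sql-injection", "file": "api/orders.py", "severity": "CRITICAL"},
    {"rule": "weak-hash", "file": "util/cache.py", "severity": "medium"},
]

print(gate_exit_code(findings, mode="advisory"))   # prints the finding, then 0
print(gate_exit_code(findings, mode="enforcing"))  # prints the finding, then 1
```

In a pipeline the return value would be passed to `sys.exit`, and the merge is blocked only if the CI step is also configured as a required check.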
Centralise evidence. A single evidence lake, partitioned for query in audit timeframe. On AWS the pattern is Lake Formation or S3 plus Athena; on GCP, BigQuery on a logging sink; on Azure, Log Analytics with extended retention. CI history, runtime alerts, admission decisions, IAM audit logs, and the change-control record land in queryable form with at least 12 months of retention.
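"Partitioned for query" usually comes down to how object keys are laid out when evidence is written. A minimal sketch, assuming Hive-style `key=value` prefixes by source and UTC date, a layout that engines such as Athena can treat as partitions; the `evidence/` prefix, the source names, and the event IDs are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def evidence_key(source, event_time, event_id):
    """Build a date-partitioned object key for one evidence record.

    event_time must be timezone-aware; it is normalised to UTC so that
    partitions line up regardless of where the event was produced.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    t = event_time.astimezone(timezone.utc)
    return (f"evidence/source={source}/year={t:%Y}/month={t:%m}/"
            f"day={t:%d}/{event_id}.json")


utc_event = datetime(2026, 1, 15, 23, 30, tzinfo=timezone.utc)
print(evidence_key("ci", utc_event, "run-42"))
# evidence/source=ci/year=2026/month=01/day=15/run-42.json

# The same wall-clock time at UTC-5 falls on the next UTC day.
est_event = datetime(2026, 1, 15, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(evidence_key("admission", est_event, "dec-7"))
# evidence/source=admission/year=2026/month=01/day=16/dec-7.json
```

With this layout, a 90-day audit query scans only the matching date prefixes instead of the full bucket.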
Override discipline. Every override is a ticket with a documented owner and a sunset date. Override volume is a measured metric. Overrides without sunsets graduate into permanent exceptions, which graduate into the gap the auditor finds.
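Override discipline is measurable once each override is a record, not a comment in a scanner UI. The sketch below assumes overrides are exported from the ticketing system into a simple structure; the `Override` fields and the sample tickets are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Override:
    ticket: str
    finding_id: str
    owner: Optional[str]    # documented owner, or None if missing
    sunset: Optional[date]  # expiry date, or None if open-ended


def override_report(overrides, today):
    """Summarise override volume and the overrides that break the rules."""
    return {
        "total": len(overrides),
        "missing_owner": [o.ticket for o in overrides if not o.owner],
        "missing_sunset": [o.ticket for o in overrides if o.sunset is None],
        "expired": [o.ticket for o in overrides
                    if o.sunset is not None and o.sunset < today],
    }


overrides = [
    Override("OVR-1", "sast-101", "alice", date(2026, 6, 1)),
    Override("OVR-2", "sca-220", None, date(2026, 1, 15)),
    Override("OVR-3", "iac-007", "bob", None),
]
print(override_report(overrides, today=date(2026, 3, 1)))
# {'total': 3, 'missing_owner': ['OVR-2'], 'missing_sunset': ['OVR-3'], 'expired': ['OVR-2']}
```

Tracking `total` over time gives the override-volume metric; the other three lists are the exceptions an auditor would find first.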
Champion programme with protected time. Protected time on calendars, exportable, reviewed. Output is measurable — threat-model contributions, security-debt remediations, training delivered to other teams. Champions without protected time are titles.
Threat modelling at design. For every new production capability that introduces new attack surface, a structured threat-modelling step in the design review, output stored alongside the architecture documentation. A STRIDE walk with documented mitigations and accepted residual risk is sufficient for most non-defence work. The pattern that fails audit is threat modelling treated as a procurement-driven exercise, not an engineering practice.
The honest assessment pattern
When cmdev assesses a client's DSOMM posture, the engagement is a structured walk-through, not a documentation review. Score each dimension against the audit-reality questions, not the self-assessment text. Identify the evidence-pipeline gaps — the failure mode is usually that the control exists in advisory mode, or that the control exists in enforcing mode and the evidence is not queryable in audit timeframe. Produce a 90-day plan: which gates move from advisory to enforcing, which evidence pipelines are extended, which champion roles get protected time, which threat-modelling step is added to which design review. Sized for delivery, not for ambition.
The durable artefact is the evidence pipeline. Framework scores will change as DSOMM iterates; the evidence pipeline keeps producing the queries the next assessor will ask.
What the framework is, and what it is not
DSOMM is a useful framework. The score is not the artefact that matters. The differentiator is whether an external assessor confirms or refutes the score when they sample evidence. The five questions are the test. The evidence pipeline is the answer. The 90-day plan to close the gap is the work.
Self-assessment without adversarial review is theatre. The CRA, NIS2, and EU AI Act regimes are increasingly accepting DSOMM-shaped evidence as part of their conformity assessments, and the assessors are reading evidence, not scores. The organisations that come out of their first DSOMM-scope external review in good shape are the ones that scored themselves honestly, identified the evidence-pipeline gaps, and shipped the wiring that holds.
FAQs
Is DSOMM formally referenced in NIS2, the CRA, or the EU AI Act?
Not by name. Conformity assessment bodies and national supervisors implementing those regimes are accepting DSOMM-shaped evidence as part of the secure-development-lifecycle and risk-management reviews. The framework is becoming an external assessment artefact in practice, not by regulation.
What is the realistic DSOMM level for most enterprise pipelines in 2026?
Level 1 to Level 2 across most items, with one or two items legitimately at Level 3 hidden inside an optimistic Level 3 self-assessment. Genuine Level 3 across all four dimensions is rare and requires a platform-engineering team resourced to wire enforcement, evidence pipelines, and override discipline as production-grade controls.
Why do self-assessed scores systematically overstate audit-real scores?
Self-assessment lacks adversarial review, so the optimistic interpretation wins by default. DSOMM items can be read loosely ("we have a tool") or strictly ("we enforce a control"); self-assessments take the loose reading. The score is structurally untestable until an external assessor samples evidence.
What is the single highest-leverage move to close the gap?
The evidence pipeline. Centralising CI history, runtime alerts, admission decisions, and IAM audit logs into a queryable store with at least 12 months of retention turns the next external assessment from a reconstruction exercise into a query exercise. Every future audit asks the same evidence-pipeline questions.
How does cmdev run a DSOMM assessment?
A structured walk-through. Each dimension is scored against the five audit-reality questions, the evidence-pipeline gaps are identified, a 90-day plan moves the gates from advisory to enforcing and extends the evidence retention, and the durable artefact is the wiring that holds the score on the next external review.
Companion content
- SLSA Levels in Practice: What Most Organisations Claim vs What Their Pipeline Actually Enforces — the same critique pattern applied to supply-chain maturity
- SOC 2 for AI Deployments: Trust Service Criteria — the audit-reality lens applied to SOC 2 for AI-bearing systems
- The AI Act Enforcement Deadline — the conformity assessment regime that will sample DSOMM-shaped evidence
- OPA for AI Agent Action Approval — the policy-engine pattern that produces enforceable Level 3 gates
- Compliance Automator: A Case Study — the open-source evidence-pipeline pattern
How to engage
If your team is preparing for an NIS2 readiness review, a CRA conformity assessment, or a SOC 2 examination with a DevSecOps surface, and the self-assessment you published last year is starting to feel optimistic, we can run the assessment against the audit-reality questions, identify the evidence-pipeline gaps, and ship the 90-day plan that closes the score honestly. Talk to us at creativeminds.dev/contact.
