OWASP DevSecOps Maturity Model (DSOMM): The Audit Reality Beneath the Self-Assessment Theatre

The lead engineer puts a folder on the table. Inside is the scorecard her team filled out three months ago — OWASP DSOMM, Level 3 across most items, neatly evidenced with screenshots of the pipeline and links to the policy docs. The auditor opens it on page two. "Show me," she says, "the last production deploy your SAST policy blocked because of a critical finding." There is a pause. The scanner did run. The merge went through. Someone had set the threshold to advisory in 2025, and nobody had moved it to blocking. By lunchtime the Level 3 score had become a Level 1 score on eight different items, all bleeding from the same wound.

That scene is roughly the median engagement. DSOMM is a useful framework in the way a stethoscope is a useful instrument: it tells you something only if the person holding it knows what to listen for and is honest about what they hear. Self-assessment without an adversarial counterpart is closer to taking your own temperature with a broken thermometer. The number is comforting, and the number is wrong.

Key takeaways

DSOMM is OWASP's framework for DevSecOps maturity across four dimensions — Build & Deployment, Culture & Organisation, Implementation, Information Gathering — and five levels. The failure mode is self-assessment without adversarial review.
Most organisations self-assess at Level 3 or 4 and land at Level 2 or below once an external auditor samples evidence. The pattern barely varies: scanning runs but does not block, champions have no protected time, runtime detection sits in audit-only mode, and telemetry is collected but not queryable within the audit window.
Level 1 is basic hygiene, Level 2 is automated checks with override paths, Level 3 is enforced gates with an evidence pipeline (the audit-real bar), Level 4 is continuous improvement (rare).
The audit-reality test is the same five questions every time: one blocked SAST deploy, one timestamped runtime remediation, an asset inventory as of 30 days ago, role permissions for an engineer overdue on training, and signed provenance for the latest production artefact.
The CRA, NIS2, and EU AI Act regimes are increasingly accepting DSOMM-shaped evidence. The differentiator is whether your auditor confirms or refutes the score.

A Stethoscope, Not a Scoreboard

OWASP's DevSecOps Maturity Model was published in 2024 as v3. It is organised across four dimensions (Build & Deployment, Culture & Organisation, Implementation, Information Gathering) and scored on a five-level scale from zero through four. Read sensibly, the score is a vector, not a single number. Maturity in pipeline scanning is not the same animal as maturity in champion programmes, and the two grow on different schedules. The framework intends each dimension to be assessed separately for exactly that reason.

The flaw is not in the model. The flaw is that DSOMM was shipped as a self-assessment tool, and asking a team to grade its own pipeline is a bit like asking a chef to mark their own restaurant inspection. The optimistic reading wins because nobody in the room is being paid to disconfirm it.

What has changed in 2026 is that the score no longer stays inside the team. EU Cyber Resilience Act conformity assessment bodies are now reading DSOMM-shaped evidence as part of the secure-development-lifecycle review, ahead of the late-2027 enforcement bite. NIS2 transposition finished across most member states in 2025, and Article 21's risk-management measures map onto DSOMM's four dimensions with very little translation. Under the EU AI Act, the obligations for high-risk systems and the Commission's enforcement powers over general-purpose AI models are scheduled to apply from August 2026. The Article 11 technical-documentation requirements for high-risk systems lean on the same secure-development controls DSOMM describes. The scorecard that was once a private benchmark is now an artefact a stranger will read.

Why the Scorecard Lies

There are three reasons self-assessed scores drift upward. The first is the absence of an adversary. The same engineer who wrote the policy is the one filling out the form, and humans are generous with their own work.

The second is that DSOMM's language can be read two ways. "Static analysis is integrated into the build pipeline" is true if a scanner runs. It is also true if the scanner runs and blocks the merge on critical findings. The framework intends the stricter reading. The self-assessment takes the looser one, because the looser one is true and the looser one is what the team has.

The third is a category confusion the industry has not properly named. A SAST scanner in CI is a tool. A SAST policy that blocks merges on critical findings is a control. A car with airbags installed but no inflator is not a car with airbags. The self-assessment counts the tool. The auditor asks when the control last fired.

Four Dimensions, Four Recurring Wounds

Walk the dimensions and you start seeing the same gaps the way a GP sees the same five complaints in a morning surgery.

In Build & Deployment — SAST, DAST, SCA, IaC scanning, container scanning, signed artefacts — the wound is almost always advisory mode. The scanners run. The policy does not block. Critical SAST findings become tickets. High-severity SCA vulnerabilities become tickets. IaC scanning flags an unencrypted S3 bucket and the apply step proceeds. The team has bought the gate and forgotten to close it.
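The difference between an advisory gate and an enforcing gate usually comes down to the exit code a CI step returns. The following minimal sketch shows both. `Finding`, `gate_exit_code` and the four-level severity scale are hypothetical illustrations and do not correspond to any particular scanner's CLI or output format.

```python
from dataclasses import dataclass

# Hypothetical severity scale; real scanners use their own labels.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class Finding:
    rule_id: str
    severity: str  # one of the SEVERITY_ORDER keys


def gate_exit_code(findings, mode, block_on="critical"):
    """Exit code a CI step would return for a scan gate.

    "advisory": the scanner runs but the step always passes, so findings
    turn into tickets and the merge proceeds.
    "enforcing": any finding at or above `block_on` fails the step.
    """
    if mode not in ("advisory", "enforcing"):
        raise ValueError(f"unknown mode: {mode!r}")
    threshold = SEVERITY_ORDER[block_on]
    blocking = [f for f in findings if SEVERITY_ORDER[f.severity] >= threshold]
    if mode == "advisory":
        return 0  # the tool ran; the control never fires
    return 1 if blocking else 0


findings = [Finding("sql-injection", "critical"), Finding("weak-hash", "medium")]
print(gate_exit_code(findings, "advisory"))   # 0: merge proceeds
print(gate_exit_code(findings, "enforcing"))  # 1: merge blocked
```

Both runs execute the same scan against the same findings. Only the second one produces the blocked-merge record that an auditor will later ask to see.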

In Culture & Organisation the wound is calendar. Champions exist on the org chart, with titles. They do not have protected time. The "10% Friday" rule is the first thing eaten when a release slips, and the release always slips. Training gets logged for compliance, not measured for retention. Threat modelling happens on the one project a procurement department demanded it for, while the three projects shipping new attack surface this sprint go untouched.

In Implementation the wound is mode. Runtime detection runs in audit-only — Falco, Tetragon, the runtime tier of an EDR, all producing alerts that nobody owns. No alert in the last three months produced a timestamped remediation. Long-lived secrets sit in environment variables on legacy services that everyone assumed would be retired by now. The production fleet still pulls from :latest for the services with no listed owner.

In Information Gathering the wound is queryability. The telemetry exists; the auditor cannot reach it in time. CloudTrail ships to S3 with no Athena table partitioned for the last 90 days. Container logs retain for 14 days while the audit window is 90. The asset inventory sits in a CMDB last reconciled in Q3, and the auditor wants the picture as of 30 days ago.

A team at self-assessed Level 3 usually has the components of Level 3. What it has not done is wire those components into enforcement, into queryable evidence, into the loops the auditor can follow. The auditor's job, fundamentally, is to find the missing wires.

Five Floors of a Building Most Teams Have Not Climbed

Picture maturity as a five-storey building. The ground floor is Level 0, not performed at all. That floor is rare in 2026, because a baseline of scanning, monitoring, and access control is now table stakes.

Level 1 is basic hygiene. SAST runs somewhere. Container scanning produces a report. Logs are collected. There is no enforcement, no remediation SLA, nothing measured against itself over time. Most organisations sit honestly at Level 1 across most items.

Level 2 is automated checks with override paths. Scanners run on schedule, detection produces alerts, reports get generated. The overrides are possible and undisciplined — anyone can mark a finding as not applicable, there is no sunset on the exception, the override volume is never measured. Most self-assessments claim higher than this. Most actual enforcement lives here.

Level 3 is enforced gates with an evidence pipeline. Gates block. Bypass is possible only through an audited break-glass procedure with a named owner, a sunset date, and a queryable record. Evidence is centralised and reachable in audit timeframe. Champions hold protected time on their calendars. Threat modelling is part of design review for production features. This is the bar regulators are starting to require.

Level 4, the top floor, is continuous improvement. Metrics get tracked and controls adjust to what the data shows. Threat modelling runs at design for every new production capability. Override paths have hard expiry and are measured as a leading indicator. Rare in the wild.
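The ladder above can be read as a series of wires, where the first missing wire caps the level. The sketch below encodes that simplified reading. It is an illustration of this article's level descriptions, not DSOMM's official scoring method. `ControlEvidence` and `audit_level` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ControlEvidence:
    performed: bool             # the check exists and runs somewhere
    automated: bool             # runs on every build or on a schedule
    blocks: bool                # a failing check stops the merge or deploy
    break_glass_audited: bool   # bypass only with owner, sunset, queryable record
    evidence_queryable: bool    # records reachable within the audit window
    metrics_drive_change: bool  # override volume and expiry tracked, controls tuned


def audit_level(e: ControlEvidence) -> int:
    """Simplified reading of the level ladder: the weakest wire caps the level."""
    if not e.performed:
        return 0
    if not e.automated:
        return 1
    if not (e.blocks and e.break_glass_audited and e.evidence_queryable):
        return 2
    if not e.metrics_drive_change:
        return 3
    return 4


# An enforcing gate whose CI history has expired still scores as Level 2.
print(audit_level(ControlEvidence(True, True, True, True, False, False)))  # 2
```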

The honest distribution in 2026 looks something like this: Level 1 to Level 2 across most items, claimed at Level 3, with one or two genuine Level 3 items hiding inside the optimistic claim and keeping the whole story plausible.

The Five Questions That End the Pretence

External assessors do not score DSOMM by reading the self-assessment text. They sample evidence against five questions, and the questions are nearly identical from engagement to engagement.

The first question is: show me one production deploy from the last week that your SAST policy blocked. It belongs to Build & Deployment, and it is the question that separates an enforcing pipeline from an advisory one. A team enforcing gates has the record in CI history. A team in advisory mode produces a screenshot of the scanner running.
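If CI history is exported as structured records, answering the first question becomes a filter rather than a search. The following sketch assumes a hypothetical `CIRun` record shape. Real CI systems each expose run history differently, and the `"warned"` outcome used here is an assumed label for an advisory-mode pass.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CIRun:
    run_id: str
    finished_at: datetime
    step: str     # e.g. "sast", "sca", "iac"
    outcome: str  # "passed", "failed", or "warned" (advisory mode)
    target: str   # environment the pipeline was promoting to


def blocked_sast_deploys(history, now, window_days=7, target="production"):
    """Runs in the window where the SAST step failed a production promotion."""
    cutoff = now - timedelta(days=window_days)
    return [
        r for r in history
        if r.step == "sast"
        and r.outcome == "failed"
        and r.target == target
        and r.finished_at >= cutoff
    ]


now = datetime(2026, 3, 5, tzinfo=timezone.utc)
history = [
    CIRun("r1", now - timedelta(days=2), "sast", "warned", "production"),
    CIRun("r2", now - timedelta(days=3), "sast", "failed", "production"),
    CIRun("r3", now - timedelta(days=20), "sast", "failed", "production"),
]
print([r.run_id for r in blocked_sast_deploys(history, now)])  # ['r2']
```

An advisory-mode pipeline only ever produces `"warned"` runs, so this query returns an empty list for it.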

The second is: show me one runtime alert from the last month that produced a timestamped remediation. It belongs to Implementation. A team with runtime detection wired into on-call hands over an incident ticket. A team in audit-only mode points to alerts in a log nobody reads.
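The second question splits alerts into those that closed a loop and those that did not. The following sketch assumes that each alert record can carry a linked ticket and a remediation timestamp. `RuntimeAlert` and `triage_alerts` are hypothetical and do not reflect the output format of Falco, Tetragon, or any EDR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class RuntimeAlert:
    alert_id: str
    raised_at: datetime
    ticket_id: Optional[str] = None          # on-call incident, if any
    remediated_at: Optional[datetime] = None


def triage_alerts(alerts, now, window_days=30):
    """Split alerts in the window into (remediated, orphaned) alert IDs.

    "Remediated" means a ticket exists and a remediation timestamp is
    recorded at or after the alert; everything else is audit-only output.
    """
    cutoff = now - timedelta(days=window_days)
    remediated, orphaned = [], []
    for a in alerts:
        if a.raised_at < cutoff:
            continue
        if a.ticket_id and a.remediated_at and a.remediated_at >= a.raised_at:
            remediated.append(a.alert_id)
        else:
            orphaned.append(a.alert_id)
    return remediated, orphaned


now = datetime(2026, 3, 5, tzinfo=timezone.utc)
alerts = [
    RuntimeAlert("a1", now - timedelta(days=3), "INC-101", now - timedelta(days=2)),
    RuntimeAlert("a2", now - timedelta(days=5)),
    RuntimeAlert("a3", now - timedelta(days=45), "INC-090", now - timedelta(days=44)),
]
print(triage_alerts(alerts, now))  # (['a1'], ['a2'])
```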

The third is: show me your asset inventory as it stood 30 days ago, queryable, with production services in scope, ownership, and last security review date. It belongs to Information Gathering. A team with a queryable inventory runs an Athena query and prints the result. A team without one produces a spreadsheet last edited four months ago.
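A CMDB row that is overwritten in place cannot show the state from 30 days ago. An append-only change log can, because you replay it up to the date you need. The following sketch assumes such a log. `InventoryEvent` and its `"upsert"`/`"retire"` actions are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class InventoryEvent:
    at: datetime
    service: str
    action: str                                # "upsert" or "retire"
    owner: Optional[str] = None
    last_security_review: Optional[date] = None


def inventory_as_of(events, as_of):
    """Replay an append-only change log up to `as_of`."""
    state = {}
    for e in sorted(events, key=lambda ev: ev.at):
        if e.at > as_of:
            break
        if e.action == "upsert":
            state[e.service] = (e.owner, e.last_security_review)
        elif e.action == "retire":
            state.pop(e.service, None)
    return state


now = datetime(2026, 3, 5, tzinfo=timezone.utc)
log = [
    InventoryEvent(now - timedelta(days=90), "payments-api", "upsert", "team-pay", date(2025, 11, 20)),
    InventoryEvent(now - timedelta(days=10), "payments-api", "upsert", "team-core", date(2026, 2, 1)),
    InventoryEvent(now - timedelta(days=5), "search-api", "upsert", "team-search"),
]
print(inventory_as_of(log, now - timedelta(days=30)))
# {'payments-api': ('team-pay', datetime.date(2025, 11, 20))}
```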

The fourth is: show me the role permissions for an engineer who has not completed security training in the last 12 months. It belongs to Culture & Organisation. A team running an enforcement loop has either revoked the elevated permissions or has a time-bound exception with a documented owner. A team treating training as a tickbox shows the engineer still holds production access.
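The enforcement loop behind the fourth question can be expressed as a join between training records, access grants, and exceptions. The following sketch assumes all three can be exported. `Engineer`, `TrainingException` and `training_violations` are hypothetical, and the 365-day cutoff stands in for the 12-month rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Engineer:
    name: str
    last_training: Optional[date]
    has_prod_access: bool


@dataclass
class TrainingException:
    engineer: str
    owner: str    # documented owner of the exception
    expires: date


def training_violations(engineers, exceptions, today, max_age_days=365):
    """Engineers overdue on training who still hold production access
    without an owned, unexpired exception."""
    covered = {x.engineer for x in exceptions if x.owner and x.expires >= today}
    violations = []
    for eng in engineers:
        overdue = (eng.last_training is None
                   or (today - eng.last_training).days > max_age_days)
        if overdue and eng.has_prod_access and eng.name not in covered:
            violations.append(eng.name)
    return violations


today = date(2026, 3, 5)
engineers = [
    Engineer("alice", date(2026, 1, 10), True),
    Engineer("bob", date(2024, 12, 1), True),
    Engineer("carol", None, True),
    Engineer("dave", date(2025, 1, 1), True),
    Engineer("erin", None, False),
]
exceptions = [
    TrainingException("carol", "sec-lead", date(2026, 4, 1)),
    TrainingException("dave", "sec-lead", date(2026, 2, 1)),  # already expired
]
print(training_violations(engineers, exceptions, today))  # ['bob', 'dave']
```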

The fifth links Build & Deployment to Implementation: show me the signed provenance for the latest production artefact, and the deploy-time policy that verifies it. The SLSA reading lives in the companion piece; the DSOMM reading is that the provenance has joined the evidence the auditor expects to find.
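The deploy-time half of the fifth question is a policy that checks three things: the provenance is signed by a trusted key, it describes the exact artefact being deployed, and it came from an allowed builder. The sketch below shows that shape only. It uses an HMAC shared key as a stand-in signer so it can run on the standard library alone. Real pipelines use asymmetric signatures and a standard provenance format (for example SLSA provenance signed with Sigstore tooling), and the statement fields here are simplified and hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(statement):
    # Stable byte encoding so signer and verifier hash identical input.
    return json.dumps(statement, sort_keys=True, separators=(",", ":")).encode()


def sign_statement(statement, key):
    # Stand-in signer: HMAC-SHA256 with a shared key. Real pipelines use
    # asymmetric signatures so the verifier never holds the signing key.
    return hmac.new(key, _canonical(statement), hashlib.sha256).hexdigest()


def admit(artefact, statement, signature, key, trusted_builders):
    """Deploy-time policy: signature valid, digest matches, builder trusted."""
    if not hmac.compare_digest(sign_statement(statement, key), signature):
        return False
    if statement.get("subject_sha256") != hashlib.sha256(artefact).hexdigest():
        return False
    return statement.get("builder") in trusted_builders


key = b"demo-key"  # hypothetical key, for illustration only
artefact = b"container image bytes"
stmt = {"subject_sha256": hashlib.sha256(artefact).hexdigest(), "builder": "ci.example/prod"}
sig = sign_statement(stmt, key)
print(admit(artefact, stmt, sig, key, {"ci.example/prod"}))    # True
print(admit(b"tampered", stmt, sig, key, {"ci.example/prod"}))  # False
```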

A team at the level it claims answers all five within a working day. A team one level below produces three answers and asks for time on two.

The Real Failure Is Not the Control — It Is the Pipeline

Across all five questions the recurring failure has the same shape, and the shape is not what people assume. It is not that the control is missing. The control is there. The pipeline that carries the evidence of the control to the assessor's screen is missing.

A team running SAST that blocks merges has the control. If CI history sits on a hosted runner with 30-day retention and the auditor looks back 90 days, the evidence has dissolved. A team running runtime detection has the control. If the alert log lives in a SIEM that is not partitioned for query, the evidence is unreachable. A team running a champion programme has the control. If the calendar entries cannot be exported, the auditor falls back to the policy document and downgrades the score for want of proof.
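Before the auditor arrives, each evidence source can be checked against two questions: does its retention cover the audit lookback, and can it be queried at all? The following sketch does that check. `EvidenceSource` and `audit_gaps` are hypothetical, and the 90-day default mirrors the lookback used in this section.

```python
from dataclasses import dataclass


@dataclass
class EvidenceSource:
    name: str
    retention_days: int
    queryable: bool  # e.g. a partitioned table vs raw objects nobody indexed


def audit_gaps(sources, lookback_days=90):
    """Map each failing source to the reasons its evidence would not survive."""
    gaps = {}
    for s in sources:
        reasons = []
        if s.retention_days < lookback_days:
            reasons.append(f"retains {s.retention_days}d, audit looks back {lookback_days}d")
        if not s.queryable:
            reasons.append("not queryable")
        if reasons:
            gaps[s.name] = reasons
    return gaps


sources = [
    EvidenceSource("ci_history", 30, True),    # hosted runner default
    EvidenceSource("siem_alerts", 365, False),
    EvidenceSource("iam_audit", 400, True),
]
print(audit_gaps(sources))
```

In this example the CI history fails on retention and the SIEM fails on queryability. Both are controls whose evidence would not reach the assessor's screen.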

The DSOMM score, in practice, reflects the evidence an auditor can sample. Not the controls a team has installed. This is uncomfortable but useful. The highest-leverage investment a team can make is in the plumbing — centralised logging at audit-period retention, exportable calendars, queryable inventory, exportable CI history — because the plumbing is what survives the visit.

How We Wire a Defensible Level 3

What follows is the pattern we ship for clients preparing for an NIS2 readiness review, a CRA conformity assessment, or a SOC 2 examination with a DevSecOps surface. None of it is invention. All of it is sequence.

The first move is to enforce rather than advise. Every gate the framework describes as a control is wired into enforcing mode from day one. SAST blocks merges on critical findings. SCA blocks builds on high-severity vulnerabilities. IaC scanning blocks the apply step. Admission controllers reject any deployment whose artefact signature does not verify. The cultural work, and there is cultural work, falls in the first two weeks, when the new gates block deploys until configurations are corrected. Teams that promise to "move to enforcing soon" rarely move without an external forcing function. The auditor is that forcing function in retrospect; flipping the switch up front saves the embarrassment.

The second move is to centralise the evidence into a single lake, partitioned for query within the audit timeframe. On AWS the pattern is S3 plus Athena, optionally governed through Lake Formation. On GCP it is BigQuery on a logging sink. On Azure, Log Analytics with extended retention. CI history, runtime alerts, admission decisions, IAM audit logs, and the change-control record all land in queryable form with at least 12 months of retention. Think of this as the difference between filing your receipts in a shoebox and filing them in a labelled cabinet: the receipts existed either way, but only one version is findable in the meeting that matters.
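"Partitioned for query" means in practice that the object layout lets the query engine skip everything outside the audit window. The following sketch shows one possible layout using Hive-style `key=value` prefixes. Athena can prune partitions on this convention. The `evidence/` prefix, the partition keys, and both helper names are assumptions.

```python
from datetime import date, datetime, timedelta, timezone


def evidence_key(source, event_time, event_id):
    """Object key with Hive-style partitions (source, year, month, day)."""
    t = event_time.astimezone(timezone.utc)
    return (f"evidence/source={source}/year={t:%Y}/month={t:%m}/day={t:%d}/"
            f"{event_id}.json")


def partitions_for_window(source, start, end):
    """Day partitions a query over [start, end] inclusive needs to scan."""
    days = (end - start).days
    return [
        f"evidence/source={source}/year={d:%Y}/month={d:%m}/day={d:%d}/"
        for d in (start + timedelta(days=n) for n in range(days + 1))
    ]


print(evidence_key("ci", datetime(2026, 3, 5, 14, 0, tzinfo=timezone.utc), "run-42"))
# evidence/source=ci/year=2026/month=03/day=05/run-42.json
print(len(partitions_for_window("ci", date(2026, 1, 1), date(2026, 3, 31))))  # 90
```

With this layout, a 90-day audit query reads 90 day-partitions for one source rather than the whole bucket.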

The third move is override discipline. Every override is a ticket with a documented owner and a sunset date. Override volume becomes a measured metric, tracked weekly. Overrides without sunsets graduate into permanent exceptions, and permanent exceptions become the gap the auditor will eventually find. The point is not to forbid overrides — sometimes the gate is wrong about the risk — but to make every exception visible and time-bound.
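The weekly override metric can start small: how many overrides opened this week, and which active ones lack an owner, lack a sunset, or have outlived their sunset. The following sketch assumes overrides are exported as records from the ticketing system. `Override` and `override_report` are hypothetical, and "this week" is taken as a rolling seven days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Override:
    ticket: str
    owner: Optional[str]
    created: date
    sunset: Optional[date]


def override_report(active_overrides, today):
    """Weekly override metrics; input is the list of overrides still in force."""
    week_ago = today - timedelta(days=7)
    return {
        "opened_last_7_days": sum(1 for o in active_overrides if o.created > week_ago),
        "missing_owner": [o.ticket for o in active_overrides if not o.owner],
        "missing_sunset": [o.ticket for o in active_overrides if o.sunset is None],
        "past_sunset": [o.ticket for o in active_overrides
                        if o.sunset is not None and o.sunset < today],
    }


today = date(2026, 3, 5)
active = [
    Override("SEC-1", "ana", date(2026, 3, 4), date(2026, 3, 20)),
    Override("SEC-2", None, date(2026, 2, 1), None),
    Override("SEC-3", "raj", date(2026, 1, 10), date(2026, 2, 10)),
]
print(override_report(active, today))
```

Anything in `missing_sunset` or `past_sunset` is a permanent exception in the making.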

The fourth move is the champion programme with protected time on the calendar. Exportable, reviewed, measurable. Output is contributions to threat models, security-debt remediations, training delivered to other teams. A champion without protected time is a title pretending to be a role, and the auditor knows the difference within ten minutes.
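"Measurable" can mean comparing the protected time a champion actually kept against a target, using an exported calendar. The following sketch assumes the export has already been reduced to weekly hours per champion. The 10% target echoes the "10% Friday" rule, and `ChampionWeek` and `protected_time` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChampionWeek:
    champion: str
    working_hours: float
    protected_hours_held: float  # security blocks that were not cancelled


def protected_time(weeks, target=0.10):
    """Per champion: (share of working time actually protected, meets target?)."""
    worked = defaultdict(float)
    held = defaultdict(float)
    for w in weeks:
        worked[w.champion] += w.working_hours
        held[w.champion] += w.protected_hours_held
    return {
        c: (held[c] / worked[c], held[c] / worked[c] >= target)
        for c in worked if worked[c] > 0
    }


weeks = [
    ChampionWeek("alice", 40, 4), ChampionWeek("alice", 40, 0),  # release slipped
    ChampionWeek("bob", 40, 5), ChampionWeek("bob", 40, 4),
]
print(protected_time(weeks))
```

In this example Alice appears on the org chart as a champion but kept only half the target. That gap is exactly what the auditor finds within ten minutes.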

The fifth move is threat modelling at design. For every new production capability that introduces new attack surface, the design review includes a structured threat-modelling step, with output stored alongside the architecture documentation. A STRIDE walk with documented mitigations and accepted residual risk is enough for most non-defence work. The pattern that fails audit is threat modelling treated as a procurement-driven exercise, summoned only when a customer demands it, rather than an engineering practice that runs as routinely as code review.
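A design review can gate on the threat-modelling step in the same way CI gates on scans. The following sketch checks that every STRIDE category was at least considered, and that each recorded threat has either a mitigation or a named owner for the residual risk. The record shape and `design_review_gaps` are hypothetical. A category judged not applicable would be recorded with that judgement as its mitigation text.

```python
from dataclasses import dataclass
from typing import Optional

STRIDE = (
    "spoofing", "tampering", "repudiation",
    "information_disclosure", "denial_of_service", "elevation_of_privilege",
)


@dataclass
class ThreatEntry:
    category: str                           # one of STRIDE
    threat: str
    mitigation: Optional[str] = None
    risk_accepted_by: Optional[str] = None  # named owner of residual risk


def design_review_gaps(new_attack_surface, entries):
    """Reasons a design review should not pass its threat-modelling step."""
    if not new_attack_surface:
        return []
    if not entries:
        return ["no threat model attached"]
    covered = {e.category for e in entries}
    gaps = [f"{c}: not considered" for c in STRIDE if c not in covered]
    gaps += [f"{e.category}: '{e.threat}' has no mitigation or accepted risk"
             for e in entries if not (e.mitigation or e.risk_accepted_by)]
    return gaps


entries = [ThreatEntry(c, f"{c} threat", mitigation="documented") for c in STRIDE[:-1]]
print(design_review_gaps(True, entries))  # ['elevation_of_privilege: not considered']
```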

What the Score Actually Means

When we assess a client's DSOMM posture, the engagement is a structured walk-through, not a documentation review. Each dimension gets scored against the five audit-reality questions, not against the self-assessment text. The evidence-pipeline gaps surface within the first day — the control is in advisory mode, or the control is enforcing and the evidence is unqueryable, or the calendar is locked behind a tool that does not export. The output is a 90-day plan, sized for delivery rather than ambition: these gates move from advisory to enforcing, this retention extends, these champion roles get protected time, this design review gains a threat-modelling step.

The durable artefact is never the score. Framework scores will change as DSOMM iterates. The evidence pipeline, once built, keeps producing the queries the next assessor will ask, in whatever framework they happen to be holding. It is the foundation underneath the scoring methodology, and foundations age better than methodologies.

DSOMM is a real framework. The Cyber Resilience Act, NIS2, and the EU AI Act are increasingly accepting DSOMM-shaped evidence as input to conformity assessments, and the assessors are reading evidence, not scores. The organisations that come out of their first external review in good shape are the ones that scored themselves honestly, identified the missing pipes, and shipped the wiring that holds. The ones that arrive with optimistic scorecards are carrying future remediation projects and do not yet know it.

The question, after you finish reading the scorecard from three months ago, is simpler than the framework makes it sound: if your auditor asked tomorrow for one blocked deploy, one timestamped remediation, and one queryable inventory, how long would the silence last before someone in the room said yes?

FAQs

Is DSOMM formally referenced in NIS2, the CRA, or the EU AI Act?

Not by name. Conformity assessment bodies and national supervisors implementing those regimes are accepting DSOMM-shaped evidence as part of the secure-development-lifecycle and risk-management reviews. The framework is becoming an external assessment artefact in practice, not by regulation.

What is the realistic DSOMM level for most enterprise pipelines in 2026?

Level 1 to Level 2 across most items, with one or two items legitimately at Level 3 hidden inside an optimistic Level 3 self-assessment. Genuine Level 3 across all four dimensions is rare and requires a platform-engineering team resourced to wire enforcement, evidence pipelines, and override discipline as production-grade controls.

Why do self-assessed scores systematically overstate audit-real scores?

Self-assessment lacks adversarial review, so the optimistic interpretation wins by default. DSOMM items can be read loosely ("we have a tool") or strictly ("we enforce a control"); self-assessments take the loose reading. The score is structurally untestable until an external assessor samples evidence.

What is the single highest-leverage move to close the gap?

The evidence pipeline. Centralising CI history, runtime alerts, admission decisions, and IAM audit logs into a queryable store with at least 12 months of retention turns the next external assessment from a reconstruction exercise into a query exercise. Every future audit asks the same evidence-pipeline questions.

How does cmdev run a DSOMM assessment?

A structured walk-through. Each dimension is scored against the five audit-reality questions, the evidence-pipeline gaps are identified, a 90-day plan moves the gates from advisory to enforcing and extends the evidence retention, and the durable artefact is the wiring that holds the score on the next external review.

Companion content

SLSA Levels in Practice: What Most Organisations Claim vs What Their Pipeline Actually Enforces — the same critique pattern applied to supply-chain maturity
SOC 2 for AI Deployments: Trust Service Criteria — the audit-reality lens applied to SOC 2 for AI-bearing systems
The AI Act Enforcement Deadline — the conformity assessment regime that will sample DSOMM-shaped evidence
OPA for AI Agent Action Approval — the policy-engine pattern that produces enforceable Level 3 gates
Compliance Automator: A Case Study — the open-source evidence-pipeline pattern

How to engage

If your team is preparing for an NIS2 readiness review, a CRA conformity assessment, or a SOC 2 examination with a DevSecOps surface, and the self-assessment you published last year is starting to feel optimistic, we can run the assessment against the audit-reality questions, identify the evidence-pipeline gaps, and ship the 90-day plan that closes the score honestly. Talk to us at creativeminds.dev/contact.