What Zero Doesn’t Tell You

The Comfort of a Number

Zero is comforting.

It suggests completion. Certainty. A clean state. In systems that are otherwise complex and difficult to reason about, a number like zero offers something rare — closure.

And that is precisely why it can be misleading.

A Mismatch

Metrics usually begin as signals. They help us observe a system, compare states, and make decisions with some degree of objectivity. Over time, however, they tend to become targets. Once that happens, the relationship between the metric and the underlying reality starts to weaken.

The number remains. The meaning begins to drift.

I ran into this recently in a system where “zero vulnerabilities” had become an important goal.

Two base images were being evaluated.

One passed cleanly — no reported findings. The other did not.

At first glance, the decision seemed straightforward. If the objective is zero, and one option achieves it, the system appears to have made the choice for you.

But something about that conclusion did not sit right.

Not because the number was incorrect, but because it conflicted with an expectation I had about the systems themselves.

One of the images was distroless — intentionally minimal, designed to include only what is required to run the application and little else. The other included a broader runtime environment, with more built-in capabilities available inside the container.

Instinctively, the distroless image felt more constrained. There was simply less present that could be used if something went wrong.

And yet, the scan results suggested the opposite.

The distroless image reported more findings. The other appeared clean.

At that point, the question was no longer which image to choose.

It was: what exactly were we measuring?

What We Were Measuring

Looking closer did not reveal an issue in how the system behaved. It revealed a limitation in how the system was being observed.

What we had was not a direct measure of risk, but a measure of what the scanning process could identify, based on its data sources and reporting model.

The number was accurate within that context.

But the context itself was narrower than the system it was being used to represent.

The situation became clearer when we looked at the same images through different scanners.

The results were not identical.

The counts shifted. Some findings appeared, others disappeared.

Nothing about the underlying system had changed.

Only the lens had.

Which meant the number we were optimizing for was not just a function of the system — but also of how it was being observed.
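A minimal sketch of that comparison, assuming each scanner's output can be reduced to a set of (package, identifier) pairs. The package names and identifiers below are made-up placeholders, not real findings from the images discussed here.

```python
# Hypothetical findings for the SAME image from two different scanners.
# Package names and identifiers are placeholders, not real CVEs.
scanner_a = {("libexample", "VULN-0001"), ("libexample", "VULN-0002")}
scanner_b = {("libexample", "VULN-0002"), ("otherlib", "VULN-0003")}


def compare_findings(a, b):
    """Split two finding sets into shared and scanner-specific parts."""
    return {
        "both": a & b,
        "only_a": a - b,
        "only_b": b - a,
    }


diff = compare_findings(scanner_a, scanner_b)

# The totals can even match while the contents disagree.
print(len(scanner_a), len(scanner_b))  # 2 2
print(sorted(diff["only_a"]))          # [('libexample', 'VULN-0001')]
print(sorted(diff["only_b"]))          # [('otherlib', 'VULN-0003')]
```

Comparing the sets rather than the totals makes the point visible: the image is the same in both runs, and only the observer differs.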

When Metrics Become Targets

There is a broader pattern here.

It is often described through Goodhart’s Law:

When a metric becomes the goal, the system begins to optimize the metric rather than the outcome it was intended to capture.

This is not unique to security. It is a general property of optimization systems.

In reinforcement learning, an agent maximizes a reward signal. If the reward function is well-designed, this leads to the desired behavior. If it is not, the agent finds ways to maximize the signal while drifting away from the original intent.

A well-known example is the boat racing game CoastRunners, described by OpenAI, where the agent learned to circle a small area collecting reward points instead of completing the race.

The behavior was correct with respect to the metric — and incorrect with respect to the outcome.

The system does not fail.

It does exactly what it was asked to do.

Just not what was meant.
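The same dynamic can be reproduced in a toy setting. The sketch below is not the boat racing environment. It is a hypothetical one-dimensional track, invented here for illustration, with a respawning bonus at one position and a larger one-time reward at the finish line.

```python
def run_episode(policy, steps=20, track_length=5):
    """Simulate a toy 1-D track (hypothetical, for illustration only).

    Landing on position 2 pays 1 point, and the bonus respawns every time.
    Reaching the end of the track pays 3 points and ends the episode.
    Returns (total_reward, finished).
    """
    position, reward = 0, 0
    for _ in range(steps):
        position += policy(position)
        if position == 2:
            reward += 1
        if position >= track_length:
            return reward + 3, True
    return reward, False


def race_to_finish(position):
    """The intended behavior: always move forward."""
    return 1


def loop_on_bonus(position):
    """Move forward until the bonus, then step back and forth over it."""
    return 1 if position < 2 else -1


print(run_episode(race_to_finish))  # (4, True)
print(run_episode(loop_on_bonus))   # (10, False)
```

The looping policy scores more than twice as much and never finishes. Judged by reward alone, it is the better policy.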

The same dynamic appears in organizational systems.

If vulnerability count becomes the primary measure of security, teams will optimize for reducing that number.

Over time, the system adapts around the metric.

The number improves.

The underlying risk may not.

There is a more subtle implication here.

If the metric had been applied strictly — zero findings as a hard requirement — the distroless image would have been rejected outright.

Not because it exposed more capability.
Not because it was more exploitable.

But because it surfaced more findings in a particular scanning context.

The system would have optimized for the metric.

And in doing so, potentially selected an option that was less constrained at runtime.

The outcome would still satisfy the policy.

But not necessarily the intent.

A Different Way to Look at the Same System: A Three-Pillar Framework

CVE count, in this sense, behaves like a proxy.

Useful, but incomplete.

When treated as the objective, it begins to exhibit the same characteristics as a poorly designed reward function — easy to optimize, but not always aligned with what we actually care about.

The shift in thinking, when it happened, was not dramatic.

We stopped asking which option had fewer reported vulnerabilities, and started asking what risk we were actually trying to reduce.

That led to a different way of evaluating the same system — not as a number, but along three dimensions.

The first pillar was about fixability.

Were there vulnerabilities with available fixes that had not yet been applied?

The second pillar was about exploitability.

Among the remaining issues, how likely were they to be exploited in practice?

The third pillar was about exposure at runtime.

If something were to get through, what capabilities would be available inside the system?

These three questions did not contradict the metric.

They extended it.

And they changed the answer.

In this case:

Pillar 1: Both images had reached a point where there were no immediate fixes to apply.

Pillar 2: Both showed a low likelihood of exploitation based on available data.

Pillar 3: One image exposed more capability at runtime, and it was not the distroless one.

The distinction lay in what each image allowed once running. The broader image left more available inside the container.

The distroless image constrained it.

The number had pointed in one direction.

The system, when examined through a different lens, pointed in another.
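One way to make the contrast concrete is to encode both the hard gate and the three pillars and run them over the same two candidates. This is a rough sketch, not the evaluation we actually ran: the field names, the likelihood threshold, the finding counts, and the capability labels are all assumptions chosen only to mirror the situation described above.

```python
from dataclasses import dataclass, field


@dataclass
class ImageAssessment:
    name: str
    reported_findings: int          # raw scanner count
    fixable_findings: int           # pillar 1: findings with an unapplied fix
    max_exploit_likelihood: float   # pillar 2: 0.0 to 1.0, source is an assumption
    runtime_capabilities: set = field(default_factory=set)  # pillar 3


def passes_zero_gate(image):
    """The hard policy: accept only images with no reported findings."""
    return image.reported_findings == 0


def pillar_key(image, likelihood_threshold=0.1):
    """Sort key where lower is better, compared pillar by pillar."""
    return (
        image.fixable_findings,
        image.max_exploit_likelihood >= likelihood_threshold,
        len(image.runtime_capabilities),
    )


# Illustrative values only, not measurements from the real images.
distroless = ImageAssessment("distroless", 3, 0, 0.01, set())
full = ImageAssessment("full-runtime", 0, 0, 0.01,
                       {"shell", "package-manager", "coreutils"})
candidates = [distroless, full]

print([img.name for img in candidates if passes_zero_gate(img)])  # ['full-runtime']
print(min(candidates, key=pillar_key).name)                       # distroless
```

The first two pillars tie, so the ordering is decided entirely by what is available at runtime, which the raw count never looked at.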

What Changes When You Look Differently

This way of thinking is not new.

It is reflected, in different forms, in existing security guidance.

NIST SP 800-190 treats vulnerability management as a continuous, risk-based process rather than a binary state.

The CIS Docker Benchmark emphasizes reducing attack surface — removing unnecessary components, limiting capabilities, and constraining what is available at runtime.

The principles are well understood.

What varies is how they are applied in practice.

This alignment is not accidental.

Most mature security frameworks do not define security as the absence of findings, but as the presence of effective controls and risk management.

Standards like SOC 2 and ISO 27001 focus on how vulnerabilities are identified, assessed, and managed over time — not on achieving a static count.

Even stricter environments such as FedRAMP operate on continuous monitoring and risk acceptance, rather than assuming that “zero findings” is a meaningful or achievable steady state.

The emphasis, consistently, is on managing risk — not eliminating numbers.

Metrics, and What They Hide

Metrics are abstractions.

They compress a complex system into something that can be observed, compared, and optimized.

That compression is valuable.

But it also hides detail.

The goal, then, is not to eliminate metrics.

They remain essential.

But they cannot be the decision.

They are a starting point.

The rest requires judgment.

Because in systems of any real complexity, the cleanest number is not always the clearest signal.

And sometimes, zero is not the end of the story.

It is simply a reflection of what we chose to see, and what we didn’t.
