Evaluating the Effectiveness of Security Testing: Metrics and KPIs for Businesses to Measure Success
Picture this: a penetration test report lands in your inbox, flagged red. Nineteen findings, three critical. Six months later, the follow-up test runs and eight of the nineteen are still open (including two of the criticals). The test was thorough. The report was detailed. The post-test meeting had that particular gravity that serious people bring to serious things. And yet the organization is no safer than it was before the first engagement.
This is the norm, not the exception. Security testing generates an enormous volume of findings and surprisingly little actual risk reduction. The testing itself is usually fine. Most commercial penetration tests are competent, and automated scanners find real issues. The problem is the absence of any metrics to tell you whether your testing programme is actually working. Without them, testing becomes a compliance ritual: tests run because they must, reports get filed because they’re required, and findings accumulate in a backlog that nobody has the mandate to close.
We’re here to discuss the metrics that turn security testing from a compliance exercise into an improvement loop. This is especially relevant to small-business leaders who are paying for security testing and want to know whether the money is buying them what they think it is.
What You Are Actually Measuring
A security testing programme has three measurable outputs:
- The findings it generates
- The remediation it drives
- The reduction in real-world risk over time
Most organizations track the first, occasionally glance at the second, and barely think about the third. That ordering is exactly how you end up with an inbox full of untouched reports.
A good metric set covers all three layers, weighted so the behaviour you’re rewarding is remediation and risk reduction, not simply raw finding count.
The Five Metrics That Matter
Across our engagements, five metrics produce most of the actionable signal. They’re not exotic. Each has analogues in NIST SP 800-55 (performance measurement for information security) and the Center for Internet Security’s benchmarks. But they’re specific enough to actually drive decisions.
1. Mean Time to Remediate (MTTR)
How long, on average, does it take to fix a finding after it’s been identified? Tracked by severity, this is the single most informative metric in your programme. An MTTR of three days for critical findings and thirty days for high findings suggests a functioning remediation pipeline. An MTTR of ninety days for critical findings suggests that testing and remediation are operating in completely different universes.
Good benchmarks are hard to find, but vulnerability remediation research from the Cyentia Institute shows that organizations in the top quartile fix critical vulnerabilities within days, while the bottom quartile takes months. A reasonable starting target for a small business: critical findings closed within seven days, high findings within thirty.
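To make the calculation concrete, here is a minimal sketch in Python, assuming each finding record carries a severity, the date it was identified, and the date it was remediated (the data shape and values are illustrative):

```python
from collections import defaultdict
from datetime import date

# Illustrative findings: (severity, date_identified, date_remediated)
findings = [
    ("critical", date(2024, 3, 1), date(2024, 3, 4)),
    ("critical", date(2024, 3, 10), date(2024, 3, 15)),
    ("high", date(2024, 3, 2), date(2024, 4, 1)),
]

# Group time-to-fix by severity, then average
days_open = defaultdict(list)
for severity, identified, remediated in findings:
    days_open[severity].append((remediated - identified).days)

for severity, durations in sorted(days_open.items()):
    mttr = sum(durations) / len(durations)
    print(f"{severity}: MTTR = {mttr:.1f} days")
```

Checked against the seven- and thirty-day targets above, this one number tells you whether a remediation pipeline actually exists.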
2. Finding Density
How many findings per asset, per release, or per line of code does your testing surface? The trend matters more than the absolute number. A ratio that falls release after release suggests that upstream practices (secure coding, code review, automated static analysis) are catching issues before testing does. A ratio that holds steady or rises suggests your pipeline is discovering the same categories of issues over and over again. The upstream work isn’t learning.
Track the density separately for each testing cadence (a minimal sketch of the per-release count follows this list):
- Recurring tests (SAST, DAST in CI): per release
- Periodic tests (penetration testing, red team exercises): per engagement
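A minimal sketch of the per-release count, assuming each finding is tagged with the release it surfaced in (names and numbers are made up):

```python
from collections import Counter

# Illustrative: the release each CI-stage finding surfaced in
findings_by_release = ["v1.2", "v1.2", "v1.2", "v1.3", "v1.3", "v1.4"]
releases = ["v1.2", "v1.3", "v1.4"]

counts = Counter(findings_by_release)
for release in releases:
    print(f"{release}: {counts[release]} findings")

# The trend is the signal: 3 -> 2 -> 1 suggests upstream controls are improving.
```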
3. Escape Rate
What percentage of vulnerabilities reach production before getting caught? Every vulnerability lives somewhere in your pipeline: requirements, design, implementation, testing, or production. Ideally, issues introduced in requirements and design get caught before implementation; a healthy programme catches most of the rest before formal testing. An unhealthy one catches them in production. Or worse, in the wild.
To calculate escape rate, tag each finding with the phase it was introduced in and the phase it was caught in. An escape rate that trends downward means your upstream controls are getting stronger. An escape rate concentrated in the “post-release” column means something in your development process isn’t keeping up with your codebase.
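A sketch of that bookkeeping, assuming each finding is tagged with the phase it was introduced in and the phase it was caught in (the phase names and values are illustrative):

```python
# Illustrative findings: (phase_introduced, phase_caught)
findings = [
    ("implementation", "testing"),
    ("design", "testing"),
    ("implementation", "production"),    # an escape
    ("implementation", "implementation"),
]

# Escape rate: share of findings not caught until production
escaped = sum(1 for _, caught in findings if caught == "production")
escape_rate = escaped / len(findings)
print(f"Escape rate: {escape_rate:.0%}")  # 25%
```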
4. Coverage
Testing without coverage data is theatre. SAST tools cover the code paths they can analyze. DAST tools cover the URLs they can crawl. Both have blind spots:
- Logic bugs that require specific input combinations
- Authenticated paths that need complex setup
- Race conditions that depend on timing
Knowing what you didn’t test is as important as the findings from what you did test.
Express coverage in meaningful terms for your application:
- Percentage of routes tested
- Percentage of authenticated workflows exercised
- Percentage of external integrations fuzzed
A coverage number without context is just a number. A coverage number tied to a specific list of untested surfaces is a roadmap.
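One way to turn the number into that roadmap is to diff the application’s full route inventory against the routes the scanner actually exercised. A minimal sketch, assuming one list comes from your framework’s route table and the other from your DAST tool’s report (the routes themselves are invented):

```python
# Illustrative inputs: declared routes vs. routes the DAST scan actually hit
all_routes = {"/login", "/account", "/account/export", "/admin/users", "/api/v1/orders"}
tested_routes = {"/login", "/account", "/api/v1/orders"}

coverage = len(tested_routes & all_routes) / len(all_routes)
untested = sorted(all_routes - tested_routes)

print(f"Route coverage: {coverage:.0%}")          # 60%
print("Untested surfaces:", ", ".join(untested))  # the roadmap, not just the number
```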
5. Repeat Findings
How often do findings come back after being marked closed? This is the metric that reveals remediation quality. Findings that close and stay closed represent real risk reduction. A finding that closes, reappears in the next test, and gets closed again is a process failure: either the fix was cosmetic, the root cause was elsewhere, or the underlying weakness was never addressed.
Track repeat rate by finding category. High repeat rates on injection findings usually point to a framework-level problem. Developers are patching individual instances instead of introducing a library that prevents the whole class of findings.
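A sketch of the repeat-rate bookkeeping, assuming each finding carries a category and a stable fingerprint (say, rule ID plus location) so a reappearance can be matched against previously closed findings; the fingerprint scheme here is invented:

```python
from collections import defaultdict

# Illustrative finding sets: (category, fingerprint)
previously_closed = {("injection", "SQLI:orders/list"), ("xss", "XSS:profile/bio")}
currently_open = {("injection", "SQLI:orders/list"), ("injection", "SQLI:invoice/view")}

# A repeat is a finding that was closed before and is open again
repeats = previously_closed & currently_open
by_category = defaultdict(int)
for category, _ in repeats:
    by_category[category] += 1

for category, count in sorted(by_category.items()):
    print(f"{category}: {count} repeat finding(s)")
```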
What Not to Measure (Or At Least Not To Reward)
Some metrics look useful and actively mislead you. Rewarding them distorts behaviour without improving security.
Raw finding count. Measuring findings as a pure number, without context, incentivizes testers to produce high counts and encourages development teams to suppress low-value findings. The signal drowns in the noise.
Time since last test. Checking that tests ran on schedule is compliance hygiene, not security. A quarterly test that nobody acts on is worse than an annual test with serious follow-through.
Number of tools deployed. The security tools market will happily sell you a dashboard full of coloured lights. Accumulating tools creates the appearance of investment without the substance of risk reduction. What matters is whether the tools you have are producing findings that actually get remediated.
External audit pass/fail. Passing a SOC 2 or ISO 27001 audit is a business necessity for many organizations, but it is not a security metric. Organizations pass audits regularly while carrying serious unresolved vulnerabilities. The audit is a floor, not a ceiling.
Building a Measurement Programme
For small businesses, the path to useful measurement is incremental. Start with a single spreadsheet that tracks every finding from every source. Keep the fields simple:
- Finding ID
- Date identified
- Severity
- Affected asset
- Remediation deadline
- Actual remediation date
- Root cause category
From that spreadsheet, the five metrics above fall out of basic pivot-table work.
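If that spreadsheet lives as a CSV export, the pivot work is a few lines of pandas; the column names below mirror the fields listed above but are otherwise an assumption, and any spreadsheet tool can do the same:

```python
import pandas as pd

# Columns mirror the spreadsheet fields above (names are illustrative)
df = pd.read_csv(
    "findings.csv",
    parse_dates=["date_identified", "actual_remediation_date"],
)

# Mean time to remediate, by severity
df["days_to_fix"] = (df["actual_remediation_date"] - df["date_identified"]).dt.days
print(df.groupby("severity")["days_to_fix"].mean())

# Finding counts by root cause category: an early view of repeat-prone classes
print(df["root_cause_category"].value_counts())
```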
After three months, you’ll have enough data to establish a baseline. After six months, trends become visible. After a year, you can have a real conversation about whether the programme is working and where to put the next investment.
Once findings start overwhelming the spreadsheet (usually within the first year for any business with an active development programme), there are good dedicated tools worth considering:
- Jira with a security-specific project
- GitHub Issues with security labels
- DefectDojo — an open-source vulnerability management platform purpose-built for exactly this
Using Metrics Without Breaking the Programme
The most common failure mode of security metrics: they become instruments of blame.
A team whose MTTR gets reported to leadership every month will optimize for MTTR. That might mean genuinely fixing things faster. It might also mean marking findings “false positive” or “risk accepted” to move them off the queue. Both produce the same number on the slide.
Two things guard against this.
First, review metrics as a learning exercise, not a performance review. The question is “what does this tell us about the system,” not “whose fault is this number.”
Second, never use metrics in isolation:
- MTTR alongside repeat rate is informative. MTTR falling while repeat rate rises should raise an eyebrow.
- Finding density alongside coverage is informative. Falling density in the context of rising coverage is genuine improvement.
NIST’s guidance on security metrics makes the same point: the purpose is improvement, and metrics that undermine trust in the data destroy their own usefulness. A metrics programme that leadership pays attention to and that engineering teams feel ownership over produces steady risk reduction. A metrics programme used as a weapon produces numbers that look good and a security posture that doesn’t.
The Bottom Line
The reason we maintain a security testing programme isn’t to produce reports. It’s to produce a codebase, infrastructure, and operational posture that’s measurably harder to attack this quarter than it was last quarter.
Metrics are how you know if that’s happening. Without them, every red-flagged inbox is a story you tell yourself about being serious. With them, you have evidence. That’s the only currency that matters when a board, a customer, or a regulator asks whether your security programme actually works.