Reachability: why most 'high' CVEs in your scanner are noise
Run npm audit on a small Node repo and you'll get something like this:
```
7 vulnerabilities (2 moderate, 5 high)

To address all issues, run:
  npm audit fix --force
```

Most teams glance at “5 high”, sigh, and either bump versions blindly or close the tab. Both responses are understandable. Neither is right. The honest answer to “how many of those 5 high findings actually matter” is almost always fewer than five, often zero, and working out which ones is harder than running --force.
This post walks through a real scan of a small public repo, what the raw scanner output said, what we did to it, and how the final report turned 8 raw findings into 5 with severities that actually reflect risk.
The raw output
The repo is a small Cloudflare Worker project: a few hundred lines of TypeScript, a package-lock.json, and a couple of config files. We ran the same scanners we ship in production (osv-scanner, npm audit, semgrep, trivy, gitleaks), and the raw output came back as:
- osv-scanner: no findings
- npm audit: 7 advisories, all marked high
- semgrep: no findings
- trivy: no findings
- gitleaks: 1 finding (a private key committed in a test file)
So: 8 raw findings, 7 of which are dependency advisories marked high. Handed to a developer at face value, that reads as “your repo has 7 critical security issues and is on fire.”
It isn't.
The 7 high “dependency” findings, examined
Every one of those 7 advisories was for a package in the dev-tooling chain: esbuild, undici, ws, @esbuild-plugins/node-globals-polyfill, @esbuild-plugins/node-modules-polyfill, miniflare, and wrangler. All of them are either wrangler itself or packages it pulls in to run the local Cloudflare Workers development simulator. None of them is imported by the repo's actual source code, and none is reachable from any code path that runs in production.
The flagship one is a real vulnerability: the esbuild development server accepts cross-origin requests, which means any website you visit while a local dev server is running could read its responses. Genuine bug. Worth patching. Not a production risk — the dev server doesn't exist in production, the package isn't imported from production code, and the worst-case exploit is “an attacker who already has you browsing their site can read what you're working on locally.”
Calling that a high in your PR comment is wrong. It makes seven findings that can wait look as urgent as the one finding that can't.
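Whether a package is dev-only, and whether it's a direct or transitive dependency, is recorded in package-lock.json itself, so that part of the classification is mechanical. Here's a minimal sketch (not Pwnkemon's actual code) assuming a lockfileVersion 2 or 3 lockfile. In that format each key under packages is an install path, and npm sets dev: true on packages needed only through devDependencies. The function name classifyPositions and the sample lockfile are hypothetical.

```typescript
// Only the package-lock.json fields this sketch reads (lockfileVersion 2 or 3).
interface LockEntry {
  version?: string;
  dev?: boolean; // true when the package is only needed via devDependencies
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface Lockfile {
  lockfileVersion: number;
  packages: Record<string, LockEntry>;
}

interface Position {
  name: string;
  path: string;
  direct: boolean;
  devOnly: boolean;
}

const MARKER = "node_modules/";

function classifyPositions(lock: Lockfile): Position[] {
  // The "" key is the root project; its declared deps define "direct".
  const root: LockEntry = lock.packages[""] ?? {};
  const declared = new Set([
    ...Object.keys(root.dependencies ?? {}),
    ...Object.keys(root.devDependencies ?? {}),
  ]);
  const positions: Position[] = [];
  for (const [path, entry] of Object.entries(lock.packages)) {
    const idx = path.lastIndexOf(MARKER);
    if (idx === -1) continue; // skips the root ("") and workspace folders
    // The name follows the last "node_modules/", which keeps scoped names intact.
    const name = path.slice(idx + MARKER.length);
    // Direct = the top-level copy of something the root package.json declares.
    const direct = path === MARKER + name && declared.has(name);
    positions.push({ name, path, direct, devOnly: entry.dev === true });
  }
  return positions;
}

// Hypothetical lockfile fragment for illustration only.
const lock: Lockfile = {
  lockfileVersion: 3,
  packages: {
    "": { devDependencies: { wrangler: "^3.0.0" } },
    "node_modules/wrangler": { dev: true },
    "node_modules/miniflare": { dev: true },
    "node_modules/miniflare/node_modules/undici": { dev: true },
  },
};

for (const p of classifyPositions(lock)) {
  console.log(`${p.name}: ${p.direct ? "direct" : "transitive"}, ${p.devOnly ? "dev-only" : "prod"}`);
}
```

On that sample this prints wrangler as direct and dev-only, with miniflare and the nested undici as transitive and dev-only. Note that "dev-only" alone doesn't prove unreachability; it only tells you the package never ships as a production dependency.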
The 1 finding that actually matters
The gitleaks hit was a real private key. Committed in a test file. Removed in a later commit. Still recoverable from git history by anyone with read access to the repo. If the key was for a production system, it's burned.
That's the finding the developer needs to see in big letters. Rotate the key. Purge it from history with git filter-repo. Audit access logs.
What the report did with all this
The Pwnkemon report rated the leaked key as high and put it at the top of the findings list. The seven dependency advisories were downgraded: the three with public CVE IDs became low-severity entries, and the four with no CVE assigned were rolled into a single info-level summary entry. Each downgrade carries the same one-line rationale: “transitive dev tooling, no code path from production source, downgraded from high to low for hygiene tracking.”
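The roll-up rule is simple enough to write down. This is a minimal sketch, assuming an earlier step has already flagged each advisory as unreachable dev tooling. The unreachableDevTooling field and the buildDependencyEntries function are hypothetical names, and the package names and CVE values in the example are placeholders, not the real advisories from this scan.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";

interface Advisory {
  pkg: string;
  upstream: Severity;
  cveId: string | null; // null when no CVE has been assigned
  unreachableDevTooling: boolean; // assumed output of an earlier reachability step
}

interface ReportEntry {
  title: string;
  severity: Severity;
  rationale?: string;
}

function buildDependencyEntries(advisories: Advisory[]): ReportEntry[] {
  const entries: ReportEntry[] = [];
  const withoutCve: string[] = [];
  for (const a of advisories) {
    if (!a.unreachableDevTooling) {
      // Anything not confidently classified as unreachable keeps its upstream rating.
      entries.push({ title: `${a.pkg}: ${a.cveId ?? "advisory without CVE"}`, severity: a.upstream });
    } else if (a.cveId !== null) {
      // One low entry per unreachable advisory that has a public CVE ID.
      entries.push({
        title: `${a.pkg}: ${a.cveId}`,
        severity: "low",
        rationale: `transitive dev tooling, no code path from production source, downgraded from ${a.upstream} to low for hygiene tracking`,
      });
    } else {
      withoutCve.push(a.pkg);
    }
  }
  // Unreachable advisories without a CVE ID collapse into one info-level summary.
  if (withoutCve.length > 0) {
    entries.push({
      title: `Dev-tooling advisories without CVE IDs: ${withoutCve.join(", ")}`,
      severity: "info",
    });
  }
  return entries;
}

// Placeholder input with the same shape as this scan: seven unreachable
// high advisories, three with CVE IDs and four without.
const advisories: Advisory[] = ["pkg-a", "pkg-b", "pkg-c", "pkg-d", "pkg-e", "pkg-f", "pkg-g"].map(
  (pkg, i): Advisory => ({
    pkg,
    upstream: "high",
    cveId: i < 3 ? `CVE-PLACEHOLDER-${i + 1}` : null,
    unreachableDevTooling: true,
  }),
);

console.log(buildDependencyEntries(advisories).map((e) => e.severity).join(", "));
// low, low, low, info
```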
The summary table the developer sees first:
| Severity | Count |
|---|---|
| Critical | 0 |
| High | 1 |
| Medium | 0 |
| Low | 3 |
| Info | 1 |
That's actionable: one high to fix today and four lesser items to triage when you have time. Compare that with the raw scanner output (“7 high vulnerabilities”) and you can see why the difference matters.
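The summary table is just a count over the final report entries. Here's a small sketch that rebuilds it from the five entries described above. summarize and toMarkdownTable are illustrative names, not part of any real API.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";

// Display order for the table, most severe first.
const ORDER: Severity[] = ["critical", "high", "medium", "low", "info"];

function summarize(severities: Severity[]): Record<Severity, number> {
  const counts = { critical: 0, high: 0, medium: 0, low: 0, info: 0 };
  for (const s of severities) counts[s] += 1;
  return counts;
}

function toMarkdownTable(counts: Record<Severity, number>): string {
  const rows = ORDER.map((s) => `| ${s.charAt(0).toUpperCase() + s.slice(1)} | ${counts[s]} |`);
  return ["| Severity | Count |", "|---|---|", ...rows].join("\n");
}

// The five entries from this scan: the leaked key, three downgraded
// CVE advisories, and one info-level roll-up.
const counts = summarize(["high", "low", "low", "low", "info"]);
console.log(toMarkdownTable(counts));
```

Run as-is, it prints the same five-row table shown above.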
How the triage decides
It's not magic and it's not heuristic-only. Pwnkemon's triage step is an LLM (currently Claude) with access to:
- The raw scanner output for every finding (severity, CVE ID, package, path).
- The repo's full source tree, post-clone.
- A reachability classification for each dependency CVE: does the repo's own code (not its node_modules) actually import or require the affected package, directly or via a chain of first-party modules? If not, it's flagged as unreachable.
- The package's position in the dependency tree (direct, transitive, dev-only, prod).
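The reachability question in the third bullet can be approximated by static analysis of import specifiers. This is a deliberately crude sketch, not Pwnkemon's classifier. It scans first-party file contents with regular expressions, and it treats any first-party import of the package as reachable instead of walking the module chain from real entry points. It returns unknown when it sees a require() or import() whose argument isn't a single string literal. classifyReachability and the sample file contents are hypothetical.

```typescript
type Reachability = "reachable" | "unreachable" | "unknown";

// String-literal specifiers in import/export ... from, side-effect imports,
// require("..."), and import("...").
const STATIC_SPECIFIER =
  /(?:\bfrom\s*|\bimport\s*\(\s*|\bimport\s*|\brequire\s*\(\s*)(["'])([^"'\n]+)\1/g;

// require(...) or import(...) whose argument is NOT one plain string literal,
// e.g. require(name) or require("./plugins/" + name).
const DYNAMIC_LOAD = /\b(?:require|import)\s*\(\s*(?!(["'])[^"'\n]*\1\s*[,)])/;

function importsPackage(specifier: string, pkg: string): boolean {
  // "esbuild" and "esbuild/lib/main" both count; "esbuild-wasm" does not.
  return specifier === pkg || specifier.startsWith(`${pkg}/`);
}

function classifyReachability(
  firstPartySources: Map<string, string>, // path -> contents, node_modules excluded
  pkg: string,
): Reachability {
  let sawDynamicLoad = false;
  for (const source of firstPartySources.values()) {
    for (const match of source.matchAll(STATIC_SPECIFIER)) {
      if (importsPackage(match[2] ?? "", pkg)) return "reachable";
    }
    if (DYNAMIC_LOAD.test(source)) sawDynamicLoad = true;
  }
  // A computed specifier could resolve to anything, so don't claim "unreachable".
  return sawDynamicLoad ? "unknown" : "unreachable";
}

const files = new Map<string, string>([
  ["src/index.ts", 'import { handle } from "./handler";\nexport default { fetch: handle };'],
  ["src/handler.ts", 'export const handle = () => new Response("ok");'],
]);
console.log(classifyReachability(files, "esbuild")); // unreachable

files.set("src/plugins.ts", 'export const load = (name: string) => require("./plugins/" + name);');
console.log(classifyReachability(files, "esbuild")); // unknown
```

A real implementation would parse the source rather than pattern-match it, but even this version shows the three-way outcome: a literal import settles the question, and a computed one means the honest answer is "we don't know".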
The LLM's job is to merge those inputs and produce a severity that reflects exploitability in this repo's context, not the worst-case severity the advisory assigned in the abstract. An advisory that's critical for everyone who actually uses the affected function is still going to land critical in our report. The same advisory in a package nothing in your repo touches won't.
Why most scanners don't do this
Reachability analysis is hard, especially for dynamic languages where imports can be string concatenations and method calls can be late-bound. The conservative answer — assume every dep is reachable — is cheap and defensible (“we told you”). The accurate answer is expensive and occasionally wrong (you might miss a late-bound import).
We've made the call that occasional false negatives in reachability classification are strictly preferable to drowning every report in transitive-dep noise. The recovery path for “a real critical was downgraded to low” is bad. The recovery path for “a developer ignored 50 fake highs and missed the one real one” is also bad, and happens much more often.
The compromise we've landed on: when reachability is ambiguous (dynamic import, plugin loader, ORM-style late binding), the finding is marked unknown rather than either reachable or unreachable, severity is kept at the upstream rating, and the report says so. That way the developer can't miss the cases where we genuinely don't know.
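The policy in the last few paragraphs boils down to a small decision table. Pwnkemon's triage step is an LLM rather than a rule engine, so read this as a rule-based approximation of the stated policy, with hypothetical names: reachable and unknown keep the upstream rating, and unreachable drops to low without ever raising something already rated info.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";
type Reachability = "reachable" | "unreachable" | "unknown";

interface TriageInput {
  upstream: Severity;
  reachability: Reachability;
}

interface TriageResult {
  severity: Severity;
  note: string;
}

function triageSeverity({ upstream, reachability }: TriageInput): TriageResult {
  switch (reachability) {
    case "reachable":
      return { severity: upstream, note: "reachable from first-party code; upstream rating kept" };
    case "unknown":
      // Ambiguous cases keep the upstream rating and say so in the report.
      return { severity: upstream, note: "reachability unknown (dynamic loading); upstream rating kept" };
    case "unreachable": {
      // Only ever lower severity: an upstream "info" stays "info".
      const severity = upstream === "info" ? "info" : "low";
      return { severity, note: `unreachable; downgraded from ${upstream} to ${severity}` };
    }
  }
}

console.log(triageSeverity({ upstream: "high", reachability: "unreachable" }).severity); // low
console.log(triageSeverity({ upstream: "critical", reachability: "unknown" }).severity); // critical
```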
If you want to see the same scan on your repo
Two ways:
- Once-off: sign in, verify your repo via GitHub, and run a Quick scan from the dashboard. The free tier gives you 5 credits a month (enough for one Quick scan) and the same report quality you'd get on a paid plan, except that free reports carry a “not for compliance use” watermark.
- On every PR: drop the Pwnkemon Scan GitHub Action into your workflow. Findings post as a PR comment; builds fail on high+ severity. The example above was generated by exactly this Action on a test repo.
If the report does the same thing for your repo — downgrades transitive-dep noise, surfaces the one thing worth your attention — you'll feel the difference inside the first scan. That's the whole pitch.