Reference·23 May 2026·5 min read

Five things every security scanner gets wrong about git history

A “removed” secret in commit history is still a leaked secret. Anyone who can clone the repo can run git log -p and find it. Half the security scanners on the market either don't look at git history at all or look only at the latest commit. The rest tell you the secret exists but not which commit introduced it, who committed it, or when: the three things you actually need to remediate.

Here are five things real git-history secret scanning has to get right, and what we do about each.

1. Scan the full history, not a shallow clone

Most scanners pull the repo with git clone --depth=1 because shallow clones are fast and the scanner is being billed by CI minutes. That covers the working tree on the default branch and nothing else. Every secret that's ever been committed and later deleted, which is to say, every leaked secret a competent developer noticed and tried to take back, is invisible to the scan.

What we do: clone the full history, with no depth limit. It's slower, but it finds the things that matter. We do use a partial clone with a per-blob size limit, so a repo with a few large binaries doesn't blow out the scanner's disk budget, but the commit graph itself is always complete.
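As a rough illustration (a sketch in Python, not the pipeline's actual code), a full-history partial clone can be requested with git's --filter=blob:limit option. The function names and the 5m default below are assumptions made for the example; the real size limit isn't stated in this post. Two caveats: the remote has to support partial-clone filters, and git will fetch an omitted blob on demand if something later reads it.

```python
import subprocess


def build_clone_command(url, dest, max_blob_size="5m"):
    """Build a git clone command with full history and a per-blob size cap.

    max_blob_size is a hypothetical default chosen for this example.
    There is deliberately no --depth flag: the commit graph stays complete,
    and only blobs larger than the limit are left out of the initial download.
    """
    return ["git", "clone", f"--filter=blob:limit={max_blob_size}", url, dest]


def is_shallow(repo_dir):
    """Return True if the repository at repo_dir is a shallow clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--is-shallow-repository"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "true"
```

A scanner handed a checkout by CI could call is_shallow first and refuse to report “clean” on a repo it only saw one commit of.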

2. Tell the developer which commit introduced the leak

“A private key was found in your repository” is a useless finding. “A private key was committed in tests/test_verify.py at line 252, in commit b59c341, by Tim Shipp on 2026-04-24, and is still present in history even though the file no longer contains it on main” is a finding that resolves into a fix.

The first version lets the developer say “I removed that, we're fine” and move on. The second version forces the only correct action: treat the key as compromised, rotate it, and purge it from history with git filter-repo.

What we do: every git-history finding carries the commit SHA, the author, the date, and the file path with line number. The recommendation section walks through the four-step remediation in order (rotate, audit, purge, prevent).

3. Don't scan node_modules / vendor / build artifacts

A naive gitleaks run against a repo that has ever had node_modules committed by accident will find “leaked credentials” in the test fixtures of published npm packages. There are tens of thousands of these. They're all false positives, and they make the report unreadable.

What we do: the scan pipeline strips known vendor / build-artifact directories (node_modules, .next/cache, vendor, __pycache__, etc.) from the post-clone tree before secret-scanning runs. Anything legitimately committed in those directories is still flagged in history scanning, because that's a real leak even if the file was later removed from the working tree. We just don't re-scan them in the present-state pass.
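The present-state filter can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the pipeline's code: the directory list is just the four names mentioned above (the “etc.” is not filled in), and is_vendored and present_state_paths are hypothetical helper names.

```python
from pathlib import PurePosixPath

# Directory sequences to skip in the present-state pass only.
# Each entry is a run of consecutive path components.
VENDOR_DIRS = (
    ("node_modules",),
    (".next", "cache"),
    ("vendor",),
    ("__pycache__",),
)


def is_vendored(path):
    """Return True if any directory component run of path matches VENDOR_DIRS."""
    # Drop the final component so a file named "vendor" is not treated as a directory.
    dirs = PurePosixPath(path).parts[:-1]
    for pattern in VENDOR_DIRS:
        width = len(pattern)
        for start in range(len(dirs) - width + 1):
            if dirs[start:start + width] == pattern:
                return True
    return False


def present_state_paths(paths):
    """Filter a list of working-tree paths down to the ones worth scanning."""
    return [p for p in paths if not is_vendored(p)]
```

Matching on whole path components rather than substrings keeps a file such as src/vendor.py in scope while still skipping everything under a vendor directory.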

4. Distinguish deleted secrets from rotated secrets

A developer who notices a committed secret usually deletes the file and commits the delete. Job done, in their head. From a security standpoint, deleting the file does nothing: the secret is still in the commit immediately before the deletion, still recoverable, and still valid until rotated.

Some scanners only look at the working tree, miss this entirely, and report the repo clean. Others see the secret in history and report it without context, leaving the developer to wonder if they need to act, especially when the deletion happened months ago and the secret might already have been rotated.

What we do: every git-history finding makes clear that the leak is present in history, that history is recoverable by anyone with clone access, and that the only valid remediations are rotation of the secret and purging from history. The report explicitly does not suggest “remove from the working tree” as a fix, because it isn't one.

5. Handle public repos differently from private ones

A secret committed to a private repo with three contributors and deleted twenty minutes later has a small, bounded blast radius. A secret committed to a public repo for any length of time should be considered fully exfiltrated: someone's scraper saw it, probably within the hour, and the secret is now in a credential corpus somewhere.

Most scanners apply the same severity to both cases. They shouldn't.

What we do: the triage step knows whether the repo is public and adjusts the urgency language accordingly. A public-repo leak gets “treat as compromised, rotate immediately” phrasing without qualification. A private-repo leak gets the same recommendation but with explicit acknowledgement that the bound on exposure is the set of users with read access to the repo's history. The action is the same in both cases (rotate); the implied timeline is different.
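One way to express that rule, as a hedged Python sketch: the action and the remediation order are identical for both cases, and only the exposure qualifier changes. The Triage class, the triage function, and the exact wording of the strings are assumptions for the example, not the text our reports emit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Remediation order is the same regardless of repo visibility.
REMEDIATION_ORDER = ("rotate", "audit", "purge", "prevent")


@dataclass(frozen=True)
class Triage:
    action: str
    qualifier: Optional[str]  # None means the action is stated without qualification
    steps: Tuple[str, ...]


def triage(is_public):
    """Pick urgency language for a git-history secret finding."""
    if is_public:
        qualifier = None
    else:
        qualifier = (
            "Exposure is bounded by the set of users with read access "
            "to the repo's history."
        )
    return Triage(
        action="Treat as compromised, rotate immediately.",
        qualifier=qualifier,
        steps=REMEDIATION_ORDER,
    )
```

Keeping the action constant and varying only the qualifier means a private-repo finding can never be read as permission to skip rotation.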

The general principle

Secret scanning isn't hard. The interesting work is in turning a “match found” into a finding that a developer can actually act on without a five-step game of telephone. That requires knowing the commit, the author, the date, the file, the public/private status of the repo, and the right remediation order, and not burying any of it under a wall of false positives from node_modules.

If your current scanner is missing more than one of the five things above, the leaks it's reporting are the ones it bothered to find, not the ones that exist.

See it on your own repo

Pwnkemon's code scan does the full git-history pass on every run, with all five behaviours above. If your repo has ever accidentally had a secret committed, a Quick scan will find it inside thirty seconds and tell you exactly which commit to start with.

Wire it into CI via the GitHub Action, or run an on-demand scan from the dashboard. Free tier gets you one Quick scan per month, enough to find out whether you have a problem before you commit to a subscription.