The 7 PII fields your error tracker is leaking right now

A self-audit checklist with the regexes we use to find leaks in our own ingest pipeline. Run it against your last 1,000 events.

In 2018, Twitter asked all 330 million of its users to change their passwords. The cause wasn't a breach in the usual sense—no attacker got in. An internal log had been quietly capturing passwords in plaintext before they were hashed. The data never left Twitter's systems, but it didn't matter. The exposure was the log itself.

Modern error tracking is that same risk, multiplied. The "log" is no longer a line of text—it's a rich, multi-dimensional JSON payload stuffed with stack traces, breadcrumbs, request headers, and replay DOM snapshots. If you haven't audited your scrubbing rules in the last six months, you're not just tracking bugs. You're accumulating a high-risk database of your users' most sensitive secrets, and you probably can't say with confidence what's in it. Here are the seven fields most likely leaking right now, plus a self-audit you can run today.

The "Default Scrubbing" Fallacy

Most SDKs ship with default data-scrubbing rules, and that's exactly the problem: teams treat them as a complete shield. They aren't. Defaults are built around a fixed list of well-known key names, and a sensitive value stored under any other name usually passes straight through.

"Password" isn't the only keyword

Default rules look for password and a handful of other well-known names. Whether a given SDK also catches pwd, pass, passwd, secret, token, or api_key depends on its version and configuration, and none of them know the dozen bespoke names your codebase actually uses. A beforeSend hook that strips Authorization sails right past a field your team named auth_token, because an exact string match never fires.
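To make the difference concrete, here is a minimal TypeScript sketch contrasting an exact-match key list (roughly how a fixed default behaves) with substring matching on key names. The pattern, the set, and both function names are hypothetical illustrations, not any SDK's API.

```typescript
// Hypothetical exact-match list, roughly how a fixed default rule behaves.
const EXACT_DEFAULTS = new Set(["password", "authorization"]);

function isDefaultMatch(key: string): boolean {
  return EXACT_DEFAULTS.has(key.toLowerCase());
}

// Hypothetical deny pattern: substrings that mark a key as sensitive.
// Substring matching (not equality) catches auth_token, X-Api-Key, user_pwd,
// and other names a fixed list never sees. Non-global regex, so .test() is stateless.
const SENSITIVE_KEY = /pass|pwd|secret|token|auth|api[-_]?key|session|cookie/i;

function isSensitiveKey(key: string): boolean {
  return SENSITIVE_KEY.test(key);
}
```

Substring matching trades precision for recall: pass also matches passport and compass. For a scrubbing rule, over-redacting is the cheaper failure.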

Context creep

Even a clean setup degrades. Every new feature that calls setContext or setExtra adds metadata to the global scope, and six months later nobody remembers that the billing module started attaching the full customer object to every event. Scrubbing rules are a snapshot; your data is a moving target.
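One way to catch context creep is to diff the keys your events actually carry against a list someone has reviewed. A minimal TypeScript sketch, assuming a Sentry-style payload with contexts and extra objects; the allowlist contents, the EventLike shape, and the unreviewedKeys helper are hypothetical:

```typescript
// Hypothetical allowlist of context/extra keys your team has reviewed.
const APPROVED_KEYS = new Set([
  "contexts.app",
  "contexts.os",
  "contexts.runtime",
  "contexts.trace",
  "extra.requestId",
]);

// Simplified view of the event fields that setContext / setExtra populate.
interface EventLike {
  contexts?: Record<string, unknown>;
  extra?: Record<string, unknown>;
}

// Return every context/extra key that nobody has approved yet.
function unreviewedKeys(event: EventLike): string[] {
  const keys = [
    ...Object.keys(event.contexts || {}).map((k) => `contexts.${k}`),
    ...Object.keys(event.extra || {}).map((k) => `extra.${k}`),
  ];
  return keys.filter((k) => !APPROVED_KEYS.has(k));
}
```

Run it over an export, or in a test that captures a sample event; any new key fails review until someone decides whether it belongs.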

Field 1: The "Lazy" User Object

The most common leak by far: passing the entire database user row into the SDK context instead of just an ID. Sentry.setUser(user) where user is the ORM model means name, email, hashed password, internal role, and IP all ship with every event.

The fix is the user_id-vs-email discipline—send the opaque identifier you need to correlate, never the human-readable fields. To find existing leaks, grep your event export for email patterns:

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
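A minimal TypeScript sketch of the user_id-vs-email discipline, assuming a simplified ORM row; DbUser, toTrackerUser, and containsEmail are hypothetical names. You would pass the projected object to the SDK, e.g. Sentry.setUser(toTrackerUser(user)), instead of the row itself.

```typescript
// Simplified stand-in for an ORM user row (hypothetical fields).
interface DbUser {
  id: number;
  name: string;
  email: string;
  passwordHash: string;
  role: string;
  lastIp: string;
}

// Only the opaque identifier leaves the process.
function toTrackerUser(user: DbUser): { id: string } {
  return { id: String(user.id) };
}

// The email audit regex, used as a quick check on any serialized payload.
const EMAIL = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;

function containsEmail(payload: unknown): boolean {
  return EMAIL.test(JSON.stringify(payload));
}
```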

Field 2: Query String Residuals

URLs get logged in breadcrumbs, stack traces, and the Referer header—and tokens love to hide in query parameters. Password-reset links and magic links carry their secret right in the URL, and the SDK captures the whole thing. UTM parameters occasionally mirror PII too, when a marketing integration shoves an email into a tracking param.

Catch token-like parameters in URLs, including prefixed names such as access_token and api_key, with:

[?&][^=&#\s]*(?:token|auth|session|key)[^=&#\s]*=([^&#\s]+)
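Detection is half the job; redacting before the URL is stored is the other half. A minimal TypeScript sketch, assuming plain string URLs, using a broadened pattern that also catches prefixed parameter names like access_token and api_key. The redactUrl helper and the [Filtered] placeholder are illustrative choices:

```typescript
// Matches a query parameter whose *name* contains token/auth/session/key
// and captures "name=" so only the value is replaced. Stops at & or #.
const SENSITIVE_PARAM = /([?&][^=&#\s]*(?:token|auth|session|key)[^=&#\s]*=)[^&#\s]*/gi;

function redactUrl(url: string): string {
  return url.replace(SENSITIVE_PARAM, "$1[Filtered]");
}
```

Matching on substrings of the parameter name will also redact harmless names like monkey=; for a scrubber, that false positive is cheap.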

Field 3: Authorization Header Variants

The standard Authorization header is usually scrubbed by default. Custom and malformed variants often are not: X-Api-Key, X-Auth-Token, Cookie (unless your SDK configuration strips it), and a Bearer token that a proxy injected into a non-standard header can all slip through.

The "proxy leak" is especially sneaky: an upstream gateway injects the authenticated user's identity into a header your SDK has never heard of, and it gets captured verbatim. Grep a sample of 1,000 events for the string Bearer: every hit outside the standard Authorization key is a credential sitting in your dashboard and, unless it has expired, a live one.
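Because the proxy leak hides under a header name nobody anticipated, checking header names alone isn't enough. A minimal TypeScript sketch that filters a header if either its name looks like a credential or its value contains a Bearer token (token characters per RFC 6750). The patterns and scrubHeaders are hypothetical, and a real hook would apply this wherever your SDK stores request headers:

```typescript
// Header names that usually carry credentials (hypothetical, substring match).
const CREDENTIAL_HEADER = /auth|token|api[-_]?key|cookie|session/i;
// RFC 6750 b64token characters plus optional "=" padding, anywhere in a value.
const BEARER = /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/i;

function scrubHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    // The value check is what catches a Bearer token under an unexpected header name.
    out[name] = CREDENTIAL_HEADER.test(name) || BEARER.test(value) ? "[Filtered]" : value;
  }
  return out;
}
```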

Field 4: Local File Paths in Stack Traces

Server-side and build-time stack traces embed absolute file paths, and those paths leak identity and infrastructure. A trace from a developer's machine reads /Users/jsmith/projects/app/src/auth.ts—there's the engineer's username. Production traces leak container IDs, internal mount points, and network paths that map out your deployment topology for anyone reading the dashboard.

Detect home-directory paths with:

\/(?:Users|home)\/([^\/\s]+)        // Unix
[A-Z]:\\Users\\([^\\]+)              // Windows
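The same patterns can rewrite paths instead of just finding them. A minimal TypeScript sketch that replaces the home-directory prefix with ~ before a frame's filename is stored; stripHomeDir is a hypothetical helper, and it does nothing for container IDs or mount points, which need their own rules:

```typescript
// Unix: /Users/<name> or /home/<name>
const UNIX_HOME = /\/(?:Users|home)\/[^\/\s]+/g;
// Windows: C:\Users\<name>
const WINDOWS_HOME = /[A-Z]:\\Users\\[^\\\s]+/gi;

// Replace the user-specific prefix so the engineer's username never ships.
function stripHomeDir(path: string): string {
  return path.replace(UNIX_HOME, "~").replace(WINDOWS_HOME, "~");
}
```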

Field 5: DOM Attributes in Breadcrumbs and Replay

For anything with click-tracking or session replay, the DOM tree itself becomes the leak. Two failure modes dominate.

First, aria-label and title attributes that mirror input values—a breadcrumb records the element a user clicked, and that element's accessible label happens to contain their name or account number. Second, masking failures: a "Search" bar or a "coupon code" field that a user actually types their phone number or SSN into, but which was never marked type="password" and so never got masked. The replay captures every keystroke in plaintext. This is why mask-by-default beats mask-the-known-fields—see our deep dive on GDPR error tracking blind spots for the full argument.
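The difference between the two strategies is easiest to see in code. A minimal TypeScript sketch of mask-by-default, assuming you control the text that goes into breadcrumbs or replay nodes: only strings on an explicit, reviewed allowlist pass through, and everything else is masked. SAFE_TEXT and maskByDefault are hypothetical:

```typescript
// Strings explicitly reviewed as safe to show (hypothetical allowlist).
const SAFE_TEXT = new Set(["Submit", "Search", "Apply coupon", "Cancel"]);

// Mask-by-default: anything not allowlisted is masked character by character,
// preserving whitespace so the layout stays readable in the replay.
function maskByDefault(text: string): string {
  return SAFE_TEXT.has(text) ? text : text.replace(/\S/g, "*");
}
```

A mask-the-known-fields approach inverts this check and fails open: the coupon field nobody flagged goes through in plaintext. Here, forgetting to allowlist something only costs some readability in the replay.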

Field 6: JSON Request Bodies

An error during a form submission frequently captures the entire request body. The hard part is depth: scrubbing that only matches top-level keys misses order.billing_address.street sitting three levels down, and it often fails entirely on arrays of objects, such as a passengers list where each entry holds a passport number.

Run this self-audit checklist against any captured body—these keys should never appear in cleartext: cc_number, cvv, ssn, dob, passport, lat, long, billing_address. The fix is recursive scrubbing that walks nested objects and arrays, applied as a backstop on the server regardless of what the client did.
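A minimal TypeScript sketch of the recursive scrub, assuming the body is already parsed JSON. It walks objects and arrays at any depth and replaces the entire value of any key on the checklist; the key set comes from the checklist, while the Json type and scrubBody are illustrative:

```typescript
// Keys from the self-audit checklist; matched exactly, case-insensitively.
const SENSITIVE_BODY_KEYS = new Set([
  "cc_number", "cvv", "ssn", "dob", "passport", "lat", "long", "billing_address",
]);

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Walk objects and arrays at any depth; a sensitive key loses its whole value,
// even when that value is itself a nested object (e.g. billing_address).
function scrubBody(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => scrubBody(item));
  }
  if (value !== null && typeof value === "object") {
    const obj = value as { [key: string]: Json };
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(obj)) {
      out[key] = SENSITIVE_BODY_KEYS.has(key.toLowerCase()) ? "[Filtered]" : scrubBody(obj[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Exact key matching keeps false positives low, so you'll need to extend the set with the bespoke names your own codebase uses (passport_number, card_cvc, and so on).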

Field 7: Environment Variables

Server-side SDKs sometimes grab the environment to enrich debugging context, and process.env (Node) or os.environ (Python) is a goldmine of secrets. One setContext("env", process.env) and your Stripe secret key, AWS credentials, SendGrid token, and database URL are all in the global context of every error.
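The safe pattern is an allowlist: copy only the variables you actually need for debugging, never the whole environment object. A minimal TypeScript sketch; ENV_ALLOWLIST and safeEnvContext are hypothetical. In Node you would call something like Sentry.setContext("env", safeEnvContext(process.env)) instead of passing process.env directly.

```typescript
// Hypothetical allowlist: the only environment variables worth attaching.
const ENV_ALLOWLIST = ["NODE_ENV", "APP_VERSION", "REGION"];

// Copy only allowlisted keys that are actually set; everything else is dropped.
function safeEnvContext(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of ENV_ALLOWLIST) {
    const value = env[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}
```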

High-entropy secrets are catchable by shape even when you don't know the key name. A pattern like [a-zA-Z0-9]{32,64} flags long opaque strings worth reviewing. Expect false positives on hashes and IDs, and expect misses on secrets containing punctuation such as / or + (base64), which cut the run short, but it surfaces many of the API keys hiding in your event stream.
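Shape alone is noisy, so secret scanners commonly add an entropy filter: a string of repeated or patterned characters matches the regex but carries little information. A minimal TypeScript sketch of that swapped-in technique, Shannon entropy in bits per character applied to the regex matches; the 3.5 bits/char threshold is an assumption to tune against your own data, and hex-encoded hashes can land near it:

```typescript
// Shannon entropy in bits per character.
function shannonEntropy(s: string): number {
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length; i++) counts[s[i]] = (counts[s[i]] || 0) + 1;
  let bits = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / s.length;
    bits -= p * (Math.log(p) / Math.LN2);
  }
  return bits;
}

// Shape match first, then keep only candidates above the (assumed) threshold.
function secretCandidates(text: string, minBits = 3.5): string[] {
  const matches = text.match(/[a-zA-Z0-9]{32,64}/g) || [];
  return matches.filter((m) => shannonEntropy(m) >= minBits);
}
```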

The 1,000-Event Audit: A Step-by-Step Guide

Enough theory. Here is how to actually find your leaks this afternoon.

Step 1: Export your data

Pull your last 1,000 events as JSON or CSV from your error tracker's export or API. You want raw payloads, not the rendered dashboard view.

Step 2: Run the "grep of shame"

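#!/usr/bin/env bash
# Scan an error export for the leak classes a regex can see.
# DOM and replay leaks (Field 5) still need a manual review.
# Usage: ./grep-of-shame.sh [events.json]
FILE="${1:-events.json}"

echo "== Emails =="
grep -oiE '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}' "$FILE" | sort -u

echo "== Tokens in URLs =="
grep -oiE '(token|auth|session|key)=[^&"]+' "$FILE" | sort -u

echo "== Bearer credentials =="
# RFC 6750 tokens may also contain ~ + / and trailing = padding
grep -oiE 'Bearer [a-z0-9._~+/-]+=*' "$FILE" | sort -u

echo "== Home directory paths =="
# Case-sensitive on purpose: -i would also match API routes like /users/123
grep -oE '/(Users|home)/[^/" ]+' "$FILE" | sort -u

echo "== Sensitive body keys =="
grep -oiE '"(cc_number|cvv|ssn|dob|passport|lat|long|billing_address)"' "$FILE" | sort | uniq -c

echo "== High-entropy secrets (review by hand) =="
grep -oiE '[a-z0-9]{32,64}' "$FILE" | sort -u | head -50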

Step 3: Calculate your leak rate

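Count how many of the 1,000 events contained at least one match, divide by 1,000, and you have your leak rate. The script prints unique matches rather than per-event counts, so for this step it helps to export one event per line (NDJSON) and count matching lines, for example with grep -ciE. Anything above zero is a finding. A rate above 5% suggests scrubbing is effectively not running. Re-run the audit monthly to confirm the number trends toward zero.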
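If your export is one event per line, the leak rate itself is a few lines of code. A minimal TypeScript sketch, assuming each array entry is one raw event payload and reusing the script's grep-able patterns (emails, URL tokens, Bearer credentials, home directories); LEAK_PATTERNS and leakRate are hypothetical:

```typescript
// Non-global regexes, so .test() carries no lastIndex state between events.
const LEAK_PATTERNS: RegExp[] = [
  /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/i, // emails
  /(token|auth|session|key)=[^&"]+/i, // tokens in URLs
  /Bearer [a-z0-9._~+\/-]+=*/i, // bearer credentials
  /\/(Users|home)\/[^\/" ]+/, // home directories (case-sensitive)
];

// Fraction of events with at least one match: leaking events / total events.
function leakRate(events: string[]): number {
  if (events.length === 0) return 0;
  const leaking = events.filter((e) => LEAK_PATTERNS.some((p) => p.test(e)));
  return leaking.length / events.length;
}
```

The home-directory pattern is deliberately case-sensitive so REST paths like /users/123 don't count as leaks.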

Why GlitchReplay Favors "Scrub-at-Source"

Our philosophy is that PII should never be persisted in the first place, not redacted after the fact. GlitchReplay runs scrubbing on Cloudflare Workers at the ingest edge, so the seven leak classes above are caught at the network boundary before anything touches storage. Our Sentry-compatible SDKs ship with stricter default PII rules than the originals, and you can preview exactly what a rule does against your own payloads with the PII scrubbing tool. The full rule model is documented at /docs/pii, and our broader security posture at /docs/security.

Default scrubbing is a starting point, not a guarantee. Run the grep of shame against your last thousand events, fix what it surfaces, and put a recursive allowlist backstop on your ingest. The goal isn't to feel safe—it's to be able to prove your dashboard isn't a breach waiting to happen.

Stop watching your error bill spike.

GlitchReplay is Sentry-SDK compatible, includes session replay and security signals, and never charges per event. Free to start, five minutes to first event.