PII scrubbing at ingest: why client-side isn't enough
Three real ways client-side scrubbing leaks data — and the threat model that makes server-side a hard requirement.
A developer pushes a "temporary" debug statement that logs the entire user object to the console. Your error SDK dutifully captures it. Your client-side regex is looking for email and password—and it catches those—but it sails right past the nested billing_address.phone_number and the ssn field that got added in last week's sprint. By the time anyone notices the regex was incomplete, 10,000 plaintext records are sitting in your error-tracking database. Your client-side "protection" didn't stop a single one of them.
This is the core argument of this post: client-side PII scrubbing is a best-effort performance optimization, not a security control. The client environment is inherently untrusted and the SDK's execution is brittle, which means real data privacy requires a zero-trust posture—scrubbing at the ingest edge, server-side, before sensitive data ever touches storage. Let's walk through exactly why the client can't be trusted, and what to do instead.
The Best-Effort Trap of Client-Side Scrubbing
Almost every error SDK offers a hook—beforeSend, scrubbers, denylists—that runs in the browser and strips sensitive fields before the event is transmitted. It is popular for good reasons: it reduces bandwidth, it keeps obvious secrets off the wire, and it's easy to wire up.
How SDK-based scrubbing works
You register a function that receives the event object, you delete or mask the fields you don't want, and you return it. The SDK serializes whatever you return and POSTs it. Simple, and useful as a first pass.
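As a minimal sketch of that hook, assuming a Sentry-style beforeSend callback that receives the event and returns it, a denylist scrubber might look like the following. The names scrubEvent and SENSITIVE_KEYS and the sample event are illustrative assumptions, not taken from any SDK.

```javascript
// Illustrative denylist; a real list would be longer and still incomplete.
const SENSITIVE_KEYS = new Set(["email", "password"]);

// Recursively mask any property whose name is on the denylist.
function scrubEvent(value) {
  if (Array.isArray(value)) return value.map((item) => scrubEvent(item));
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? "[redacted]"
        : scrubEvent(child);
    }
    return out;
  }
  return value;
}

// Hypothetical event resembling the opening scenario.
const scrubbed = scrubEvent({
  user: {
    email: "jane@example.com",
    password: "hunter2",
    ssn: "078-05-1120",
    billing_address: { phone_number: "555-0142" },
  },
});
```

Wired up as beforeSend: scrubEvent, this masks email and password, but ssn and billing_address.phone_number pass through untouched because nobody added them to the list.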
The trusted-client fallacy
The fatal assumption underneath all of this is that the browser is a reliable place to enforce a privacy guarantee. It is not. The browser is an adversarial, unpredictable execution environment you do not control: code runs in an order you didn't anticipate, other scripts mutate the page, the SDK can be partially initialized or crash, and a sufficiently motivated user can disable your scrubber entirely. Asking the client to be the last line of defense for sensitive data is asking the least trustworthy participant in the system to be the most responsible one. Even Sentry's own documentation concedes that client-side scrubbing is not a 100% guarantee—and the rest of this post is three concrete reasons why.
Failure Mode 1: Race Conditions and SDK Crashes
The first way client scrubbing fails is the most ironic: the protective logic doesn't run, because it hasn't loaded yet or it broke.
Partial initialization and early errors
Your scrubber only protects events captured after the SDK is fully initialized and the beforeSend hook is registered. But errors that happen during startup—in a config block, during early bootstrapping, before init completes—are captured by global handlers that may transmit a raw event before your filter ever exists:
// This throws while building the SDK config, BEFORE beforeSend
// is registered. The global handler may capture and send it raw.
const config = {
  dsn: DSN,
  beforeSend: scrubPII,
  user: buildUserContext(), // <-- throws: includes full user object,
};                          //     captured before scrubPII is live
Sentry.init(config);
Global handlers bypassing SDK logic
Native window.onerror and unhandledrejection handlers fire independently of your SDK's lifecycle. If anything wires capture into those paths before—or instead of—your scrubber, the data goes out unfiltered. The protective code is exactly the code most likely to be skipped during the chaotic early moments of page load, which is precisely when buggy, PII-laden errors love to happen.
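One client-side mitigation is to hold early events until the scrubber exists instead of sending them raw. The sketch below is a toy capture pipeline written for illustration; createPipeline, capture, ready, and transport are hypothetical names, and no particular SDK is claimed to work this way.

```javascript
// Hypothetical capture pipeline: events captured before the scrubber is
// registered are queued rather than transmitted unfiltered.
function createPipeline(transport) {
  let scrub = null;   // set once initialization completes
  const pending = []; // events captured before the scrubber existed

  return {
    capture(event) {
      if (scrub === null) {
        pending.push(event); // never send an unscrubbed event
        return;
      }
      transport(scrub(event));
    },
    ready(scrubFn) {
      scrub = scrubFn;
      // Flush everything that arrived early, now through the scrubber.
      while (pending.length > 0) transport(scrub(pending.shift()));
    },
  };
}
```

This only narrows the window. An error thrown before the pipeline itself has loaded, or a page that unloads with events still queued, is outside what any client-side code can promise.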
Failure Mode 2: Session Replay DOM Serialization
If scrubbing a JSON error payload is hard, scrubbing a session replay is an order of magnitude harder, because a replay isn't text—it's a serialized, continuously-updating snapshot of your entire DOM.
Why data-mask attributes are a maintenance nightmare
The common client-side approach is to annotate sensitive elements with masking attributes: data-mask on the credit-card input, the SSN field, and so on. This works right up until someone adds a new field and forgets the attribute, or a refactor renames a component, or a designer drops in a new form. Masking-by-annotation means your privacy guarantee degrades a little with every commit, and nobody notices until something leaks. It is denylist security maintained by hand, forever: anything nobody remembered to mark is recorded in the clear.
The hidden PII
Worse, sensitive data hides in places you'd never think to annotate: a title attribute, an aria-label, a tooltip, or markup injected by a third-party widget you don't control. Imagine a component where an address-autocomplete library writes the user's full address into a title attribute for a hover tooltip:
// You masked the <input>. You did not mask this, because a
// third-party lib injected it after render -- into an attribute
// your client scrubber never inspects.
<li
  role="option"
  title="Jane Doe, 14 Elm St, Apt 3, 555-0142" // <-- full PII, unmasked
>
  14 Elm St
</li>
Your developer-defined scrubber isn't looking there, so the replay captures it. We cover the replay-specific side of this in depth in masking PII in session replay, with reference material in the replay docs.
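One way to stop depending on per-element annotations is to mask by default: treat every text node and every text-carrying attribute as sensitive unless proven otherwise. The sketch below assumes a simplified snapshot node of the shape { tag, attributes, text, children }. That shape, the maskSnapshot name, and the attribute list are assumptions for illustration and do not match any specific replay recorder's format.

```javascript
// Attributes that commonly carry user-visible text (illustrative, not exhaustive).
const TEXT_ATTRIBUTES = new Set(["title", "aria-label", "alt", "placeholder", "value"]);

// Replace every character with "*" so layout is roughly preserved.
function mask(text) {
  return "*".repeat(String(text).length);
}

// Hypothetical snapshot node shape: { tag, attributes, text, children }.
function maskSnapshot(node) {
  const attributes = {};
  for (const [name, value] of Object.entries(node.attributes ?? {})) {
    attributes[name] = TEXT_ATTRIBUTES.has(name.toLowerCase()) ? mask(value) : value;
  }
  return {
    tag: node.tag,
    attributes,
    text: node.text === undefined ? undefined : mask(node.text),
    children: (node.children ?? []).map((child) => maskSnapshot(child)),
  };
}

// The autocomplete option from the example above, as a snapshot node.
const masked = maskSnapshot({
  tag: "li",
  attributes: { role: "option", title: "Jane Doe, 14 Elm St, Apt 3, 555-0142" },
  children: [{ tag: "#text", text: "14 Elm St" }],
});
```

Because nothing here depends on a data-mask attribute, the injected title is covered even though no developer ever knew it existed. Structural attributes such as role are left alone.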
Failure Mode 3: The Breadcrumb Leak
Even when the main exception payload is spotless, the breadcrumbs—the trail of actions leading up to the error—are frequently where the real secrets live.
XHR/fetch bodies and URL parameters
SDKs automatically record network activity as breadcrumbs. A login request's body, an API call with a token in the query string, a redirect URL carrying a session parameter—all of it gets captured as context. Developers carefully scrub the exception object and forget that the breadcrumb trail attached to it is a second, parallel payload with its own copy of the sensitive data.
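A breadcrumb scrubber has to treat the URL and the request metadata as payloads in their own right. The following is a rough sketch that could sit behind a hook such as Sentry's beforeBreadcrumb. The parameter list and the data.body and data.request_headers field names are assumptions for illustration, and real breadcrumb shapes vary by SDK.

```javascript
// Illustrative parameter names; extend for your own APIs.
const SENSITIVE_PARAMS = /^(token|access_token|session|sessionid|password|api_key)$/i;

// Redact sensitive query parameters in a breadcrumb URL.
// The base exists only so relative URLs such as "/api/account?x=1" parse.
function scrubUrl(rawUrl) {
  const base = "http://placeholder.invalid";
  let url;
  try {
    url = new URL(rawUrl, base);
  } catch {
    return "[unparseable-url]"; // fail closed rather than pass the raw string on
  }
  for (const name of [...url.searchParams.keys()]) {
    if (SENSITIVE_PARAMS.test(name)) url.searchParams.set(name, "REDACTED");
  }
  const isRelative = url.origin === base && !rawUrl.startsWith(base);
  return isRelative ? url.pathname + url.search + url.hash : url.href;
}

// Scrub the URL and drop request bodies and headers from a breadcrumb.
function scrubBreadcrumb(crumb) {
  const data = { ...(crumb.data ?? {}) };
  if (typeof data.url === "string") data.url = scrubUrl(data.url);
  delete data.body;
  delete data.request_headers;
  return { ...crumb, data };
}
```

Dropping bodies and headers outright is a deliberately blunt choice: for debugging, the method, path, and status code usually carry most of the value, while bodies and headers carry most of the risk.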
Console logs as breadcrumbs
Many SDKs capture console.log output as breadcrumbs by default. That "temporary" debug log of the full user object from our opening scenario? It doesn't just leak through the console—it gets serialized into a breadcrumb and shipped with the very next error. Here is a payload with a perfectly clean exception and a JWT sitting in plain sight in the breadcrumbs:
{
  "exception": { "values": [{ "type": "TypeError", "value": "..." }] },
  "breadcrumbs": [
    {
      "category": "fetch",
      "data": {
        "url": "/api/account",
        "request_headers": {
          "authorization": "Bearer eyJhbGciOiJIUzI1NiIsIn..." // leaked
        }
      }
    },
    { "category": "console", "message": "user = {ssn: '...'}" } // leaked
  ]
}
Three separate channels (exception, breadcrumbs, replay) each need independent scrubbing, and a client-side approach has to get all three right, every time, in every code path. You can check your own payloads for exactly these leaks with our free PII scanner.
The Threat Model: When "Accidental" Becomes "Liability"
Step back from the bugs for a moment, because the deeper problem isn't any single failure mode—it's that the threat model assumes good faith from a component that can't provide it.
The malicious client
Client scrubbing implicitly assumes everyone wants the data scrubbed. But a user can intentionally feed PII into your forms and error paths—and a compromised browser extension or injected script can deliberately push sensitive data into captured events. If your only defense runs in that same browser, an attacker who controls the browser controls your privacy guarantee. You cannot enforce a policy inside an environment the adversary owns.
The auditor's perspective
"We tried to scrub it in JavaScript" does not survive a serious Data Processing Agreement audit. GDPR Article 32 lists a process for regularly testing, assessing and evaluating the effectiveness of your technical measures; a brittle client-side regex that silently degrades with every deploy is the opposite of a testable, effective control. PCI DSS requires the primary account number (PAN) to be rendered unreadable anywhere it is stored, logs included, and an auditor will not accept "best effort in the browser" as meeting that bar. For regulated data the requirement is categorical, not aspirational.
Data minimization vs. redaction
The strongest privacy posture isn't redacting sensitive data after you receive it—it's never storing it in the first place. That distinction (minimization vs. redaction) is exactly what regulators look for, and it can only be enforced at the point where data enters your control: the ingest server.
Ingest-Side Scrubbing: The Zero-Trust Model
The fix is to move the authoritative scrubbing to the one place you actually control: the ingest endpoint, before anything is written to disk.
A catch-all safety net
Server-side scrubbing treats every inbound event as untrusted and applies a single set of rules to all of it—exception, breadcrumbs, replay DOM, every field—regardless of which SDK, which environment, or which version produced it. It cannot be bypassed by a partially-initialized client, a forgotten data-mask, or a malicious extension, because it runs after the data leaves the untrusted environment and before it lands anywhere durable:
# Ingest rules: applied to 100% of events, every project, unbypassable.
rules:
  - match: { key: /(password|secret|token|ssn|cvv|card)/i }
    action: redact
  - match: { value: CREDIT_CARD } # handles 4111-1111... and 4111 1111...
    action: redact
  - match: { value: JWT }
    action: redact
  - scope: [exception, breadcrumbs, replay, request, extra]
Note the credit-card rule matches the pattern, not a fixed format. A naive client regex looking for dash-separated numbers is trivially evaded by a user who types spaces instead. Pattern matching at the ingest layer closes that gap.
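To make the rules concrete, here is one way such a rule set could be evaluated against an inbound event. This is a sketch under stated assumptions, not GlitchReplay's implementation: the regexes, the Luhn check, and the scrubAtIngest name are illustrative stand-ins for the key, CREDIT_CARD, and JWT rules in the config above.

```javascript
const KEY_RULE = /(password|secret|token|ssn|cvv|card)/i;
const JWT_RULE = /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g;
// 13 to 19 digits, optionally separated by single spaces or dashes.
const CARD_RULE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Luhn checksum, used to avoid redacting arbitrary long digit runs.
function passesLuhn(candidate) {
  const digits = candidate.replace(/\D/g, "");
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Value rules: applied to every string, whatever key it sits under.
function scrubString(text) {
  return text
    .replace(JWT_RULE, "[jwt]")
    .replace(CARD_RULE, (match) => (passesLuhn(match) ? "[card]" : match));
}

// Walk the whole event: every key, every value, every channel.
function scrubAtIngest(value) {
  if (typeof value === "string") return scrubString(value);
  if (Array.isArray(value)) return value.map((item) => scrubAtIngest(item));
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = KEY_RULE.test(key) ? "[redacted]" : scrubAtIngest(child);
    }
    return out;
  }
  return value;
}
```

Because the walk is generic, it makes no distinction between the exception, the breadcrumbs, and any extra context. An authorization header is caught by the JWT value rule even though its key matches nothing, which is the case a key-only client denylist misses.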
The Cloudflare advantage
Running this at the edge means the scrub happens in the same network hop that receives the event: sensitive fields are stripped before the payload travels any further into your infrastructure, without an extra round trip to a separate scrubbing service. The data is cleaned at the door, not after it has wandered the building.
Implementing a Layered Defense
None of this means you should throw away client-side scrubbing. The right model is defense in depth—the Swiss-cheese model, where each layer catches what the others miss.
Keep client-side scrubbing too
Client scrubbing still earns its place as a bandwidth and obvious-case optimization: it reduces payload size and stops the most blatant leaks from ever crossing the network. Just stop treating it as the guarantee. It is the first, leaky slice of cheese—useful, but never sufficient alone.
Centralized rules across all projects
Define your PII rules once, server-side, and apply them to every project automatically. New service, new team, new app—all inherit the same unbypassable baseline the moment they start sending events. No per-project regex to maintain, no drift, no forgotten field.
Audit logs for compliance
Crucially, server-side rules are auditable. You can show legal and your auditors exactly which rules are active, when they changed, and prove that the control is enforced on 100% of ingestion—the kind of testable evidence GDPR Article 32 actually asks for. "Here is the rule, here is its change history, here is proof it ran on every event" is a defensible answer; "we have a regex in our frontend" is not.
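A control is only testable if there is a test. One simple approach, sketched here with hypothetical names, is to plant known canary values in a synthetic event, run it through whatever scrubbing function is under test, and report any canary that survives. findLeaks and the sample values are assumptions for illustration.

```javascript
// Return the canary values that still appear anywhere in the scrubbed event.
// Canaries should be plain strings that JSON.stringify does not escape.
function findLeaks(scrub, event, canaries) {
  const serialized = JSON.stringify(scrub(event));
  return canaries.filter((canary) => serialized.includes(canary));
}

// Example: a scrubber that only knows about the email field.
const naiveScrub = (event) => ({
  ...event,
  user: { ...event.user, email: "[redacted]" },
});

const leaks = findLeaks(
  naiveScrub,
  { user: { email: "canary@example.com", ssn: "078-05-1120" } },
  ["canary@example.com", "078-05-1120"]
);
```

Here leaks contains the SSN canary, which is the signal to fail a build or open an incident. Run against the ingest rules on every rule change, a check like this turns "we believe it is scrubbed" into recorded evidence.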
This is why GlitchReplay applies PII scrubbing at ingest, on the edge, to every event and every replay by default—not as an opt-in client convenience but as a mandatory, centralized, auditable control. You can still scrub on the client to save bandwidth, but the guarantee lives where it belongs: on a server you control, before the data is ever stored. See the PII docs for the rule reference, audit your current setup with the free PII scanner, and read how to test your error tracker for PII leaks before you trust any tool with production data. Stop relying on best-effort privacy. Make the guarantee unbypassable.
GlitchReplay is Sentry-SDK compatible, includes session replay and security signals, and never charges per event. Free to start, five minutes to first event.