How a 1-line CSS change took down our checkout (and replay caught it)

Annotated replay frames from a real incident. Five minutes from alert to root cause — without ever reproducing it locally.

·
incidentreplay

You check the dashboard on a Friday afternoon and the conversion graph has fallen off a cliff—down 40% in two hours. Your stomach drops. You open the error tracker, expecting a wall of stack traces. It is completely silent. No 500s. No Uncaught TypeError. The "Pay Now" button is right there in the DOM, styled correctly, exactly where it should be. By every signal your monitoring stack produces, the site is healthy. And yet nobody can check out.

This is the ghost bug: a visual or interaction-layer regression that renders your application useless while every log line insists everything is fine. Traditional error tracking is built to catch code that throws. It has nothing to say about code that runs perfectly and still locks your users out. Here is how a one-line CSS change took down our checkout, and how session replay took us from a silent dashboard to root cause in about five minutes—without ever reproducing the bug locally.

The Nightmare Scenario: The Green Dashboard

The deploy was routine. A small UI polish PR—loading states for the checkout form—merged Friday afternoon. Every automated test passed. The end-to-end suite, which clicks through a happy-path checkout, went green. Error rate held at zero before and after. There was nothing, anywhere, to suggest we had just broken our most important flow.

Why traditional error tracking stayed silent

Error trackers like Sentry or Bugsnag hook into window.onerror, unhandled promise rejections, and framework error boundaries. They are exceptional at catching exceptions. But our bug threw no exception. The JavaScript was sound—event handlers were attached, state was correct, the click handler was ready and waiting. The problem was that the click never reached it. A standard init looks like this and would have reported absolutely nothing:

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_DSN,
  integrations: [Sentry.browserTracingIntegration()],
  tracesSampleRate: 0.1,
});
// All green. The button renders, the handler is bound,
// and nothing ever throws. The tracker has nothing to capture.

"It works on my machine"

The second trap was reproduction. On our dev machines, checkout worked flawlessly. The regression depended on a transient loading state combined with a specific render order that only reliably manifested under production timing. We could have spent the rest of the day trying to recreate the exact conditions and never hit it. Visual and timing bugs are precisely the class that "works on my machine" was invented to describe.

The Mystery of the Unclickable Button

With logs useless and local repro hopeless, we did what most teams do: we started poking at the live site.

The network tab is no help

Every request was 200 OK. The page assets loaded. The checkout API was healthy and responding in tens of milliseconds—to the few requests it received. That last part was the clue we almost missed: the API was healthy because almost no checkout submissions were reaching it. The failure was happening entirely client-side, before any network call could fire.

The rage-click signal

What actually pointed us at the problem wasn't something we went looking for—it was an alert that had already fired. Our replay tooling flagged a sharp spike in rage clicks on the checkout route: users clicking the same element repeatedly, fast, in frustration. Rage clicks are the canary for interaction bugs. They mean a human is trying to do something the UI is visibly inviting them to do, and nothing is happening. The spike started within minutes of the Friday deploy. We had our timestamp and our location before a single support ticket came in.

Opening the Replay: What the User Actually Saw

We pulled up one of the flagged sessions and pressed play. No reproduction, no questionnaire, no "what browser are you on?" email thread—just a recording of a real user's real attempt to give us money.

Annotating the frames

The replay was almost painful to watch. The user filled in their card details cleanly. They moved the cursor to "Pay Now." They clicked. Nothing. No hover state lit up—the button never even acknowledged the pointer. They clicked again. And again. Then they moved the mouse around the button as if testing whether anything on the page was alive, and finally closed the tab. The button looked perfect and behaved like a painting of a button.

Inspecting the DOM at the moment of failure

Because the replay captures the serialized DOM, not just a video, we could inspect computed styles at the exact failing timestamp. The button itself was fine. But sitting on top of the entire form was an "invisible wall"—a transparent overlay div, the loading-state element from the new PR, stretched to full height. It had zero opacity and no background, so it was completely invisible, but it was intercepting every pointer event before it could reach anything underneath. The cursor was hovering over the overlay the whole time, not the button.

Finding the One-Line Culprit

Once we could see the overlay, the code review took thirty seconds.

It is worth pausing on why a bug like this is so financially brutal. The Baymard Institute's checkout-usability research consistently finds that technical friction—not pricing, not shipping—is one of the fastest routes to cart abandonment. And friction doesn't need to be an error message. A user who clicks "Pay Now" and gets no response doesn't file a bug report or email support; they assume your site is broken or sketchy and they leave, usually within seconds. There is no funnel step that captures "clicked a dead button and gave up." The conversion just quietly evaporates, which is exactly why a silent interaction bug can run for hours before anyone connects it to a revenue number.

The accidental inheritance

The loading overlay was supposed to appear during the brief window while the checkout form initialized, then unmount. A refactor in the PR moved the toggle logic, and under production timing the overlay rendered but its dismissal never fired—leaving a transparent, full-viewport element permanently layered over the form. The element itself carried the offending property:

/* The loading overlay -- intended to be temporary */
.checkout-loading-overlay {
  position: absolute;
  inset: 0;
  z-index: 50;
  background: transparent;   /* invisible... */
  pointer-events: auto;      /* ...but it still eats every click */
}

/* The "fix" was deleting one line. The real fix was
   ensuring the overlay actually unmounts. As a belt-and-suspenders
   guard we also added: */
.checkout-loading-overlay[data-state="idle"] {
  pointer-events: none;      /* let clicks pass through if it lingers */
}

Why it never threw

This is the crux of why error tracking was blind. pointer-events is a rendering and hit-testing concern, handled entirely by the browser's compositor. From JavaScript's perspective nothing was wrong—the handler existed, the state was valid, no code path ever ran that could throw. The interaction layer was physically blocked at a level your application code never even gets to observe. There is no exception to catch because there is no exception. There is only a user clicking into the void.

Five Minutes to Root Cause vs. Five Hours of Reproduction

The contrast with the traditional workflow is stark. Here is roughly how each path would have gone:

Traditional "log-and-guess"          Session replay workflow
---------------------------------    ----------------------------------
Notice conversion drop (hours)       Rage-click alert fires (minutes)
Check logs -> nothing                Open a flagged replay
Email users for repro steps          Watch the dead clicks happen
Guess at browser/OS combos           Inspect DOM at failure timestamp
Try and fail to reproduce            See the invisible overlay
~5 hours, maybe                       ~5 minutes

We skipped the entire "what browser and OS are you on?" chain because the replay already carried the viewport dimensions, the device, and the exact sequence of state transitions that produced the overlay. We saw the failure instead of reconstructing it from secondhand descriptions.

The Cost of Sampling: Why 100% Capture Matters

Here is the part that gets glossed over. Many replay products sample—they record 1%, 5%, maybe 10% of sessions to control cost. For this bug, sampling would very likely have been fatal.

The bug was conditional

The overlay only lingered under specific production timing, affecting a subset of sessions. If you are recording 5% of traffic, you are betting that your tiny sample happens to contain enough affected checkout sessions to surface a rage-click pattern in time. On a high-stakes, lower-volume flow like checkout, that bet often loses. The one session that makes the bug obvious is statistically likely to be one you threw away.

A visual audit trail for every transaction

With 100% capture, every single checkout attempt—successful or abandoned—has a replay. There is no "I hope we caught it." The bug was already recorded before we knew it existed. On a flow where catching the problem two hours sooner is worth thousands of dollars in recovered conversions, full-fidelity capture pays for itself in a single incident. You can estimate what that capture actually costs to store with our replay storage calculator; the answer is almost always less than one missed checkout-down afternoon.

Preventing the Next Silent Failure

We made two durable changes. First, every checkout-route alert now ships with a direct replay_url in its Slack payload, so the first responder is one click from watching the failure instead of starting an investigation:

{
  "alert": "rage_clicks.spike",
  "route": "/checkout",
  "count_5m": 87,
  "release": "web@2026.06.03",
  "replay_url": "https://app.glitchreplay.com/replays/9f2a..."
}

Second, we added "visual health" alerts based on dead clicks and rage clicks per route, independent of the error rate entirely—because we now know the most dangerous checkout bugs are the ones that never throw. One more thing worth doing before you turn on full checkout capture: make sure card fields are masked. Our guide to masking PII in session replay and the PII docs cover exactly how to record the interaction without recording the card number.

This is why GlitchReplay captures 100% of sessions at a flat rate behind a Sentry-compatible SDK: the bugs that cost you the most are usually the ones your logs can't see, and you cannot debug a recording you decided not to keep. Stop guessing why users drop off. Watch what they actually saw.

Stop watching your error bill spike.

GlitchReplay is Sentry-SDK compatible, includes session replay and security signals, and never charges per event. Free to start, five minutes to first event.