A 200ms TTFB regression that cost us $40k/month
Postmortem of a quietly-rolled-out CDN config change, how RUM caught it, and the alert rule we now run on every deploy.
The deployment was a nothing burger. A minor update to a header-forwarding rule in our CDN configuration—the kind of change that gets a one-line PR description and a thumbs-up emoji in code review. All unit tests passed. The green light stayed green. Server-side response times, measured at the origin, stayed perfectly flat at around 35ms. We shipped it on a Wednesday afternoon and forgot about it.
Forty-eight hours later, our Growth Lead pinged the incident channel with a screenshot that made everyone's stomach drop: checkout conversion had quietly dipped by 4%, and paid-search ROAS was cratering. There was no outage. No 5xx spike. No error in any log. Just a slow, expensive bleed that turned out to be costing us roughly $1,300 every single day. This is the story of how a 200ms delay at the edge slipped past our entire observability stack, and how Real User Monitoring finally found the invisible bottleneck.
The Deployment That Didn't "Break" Anything
The change itself was reasonable. We were moving from a simple static header check to a more complex piece of edge logic that needed to inspect a request header before deciding how to route the response. The intent was good. The implementation compiled, deployed, and served traffic without throwing a single exception. By every metric we were actively watching, the system was healthy.
That is exactly the problem with performance regressions: they are not errors. An error trips an alarm because it is binary—the request either succeeded or it threw. A latency regression is a gradient. The request still returns a 200. The page still loads. The user still sees content. Nothing "breaks" in a way that any boolean health check can detect. The damage is purely in the time domain, and time is the one dimension most teams instrument the least.
Why synthetic monitoring missed the signal
We ran synthetic checks every minute from three datacenters. They all came back green. The reason is structural: synthetic monitors hit your site from a fixed set of well-provisioned cloud regions, usually sitting on a fat network pipe a few milliseconds from a CDN edge node. They are great at catching a hard outage and useless at catching a regression that only manifests under the messy conditions of real networks, real devices, and real geographic distribution.
Our regression had a geographic shape. It only added the redundant round-trip for requests that missed a specific edge cache, and cache-miss rates vary wildly by region. Our synthetic probes happened to run in regions with warm caches. They never saw the slow path.
The fallacy of "server-side only" metrics
Our Prometheus dashboard told a beautiful lie. Origin response time: flat. CPU: flat. Database query latency: flat. The dashboard was measuring the time from when a request arrived at our origin to when the origin sent a response. But the regression lived entirely upstream of the origin, in the extra round-trip the edge was now forced to make. Server-side metrics are blind to everything that happens between the user's browser and your application's front door, and for a globally distributed app that gap is where most of the latency actually lives.
Connecting 200ms to $40,000
The instinct when you hear "200ms" is to shrug. Two-tenths of a second is below the threshold most people consciously perceive. But conversion does not care what users consciously perceive. It responds to friction, and Time to First Byte (TTFB) is the very first friction a user encounters.
The psychology of "instant": why TTFB is the trust metric
TTFB is the moment the browser stops staring at a blank tab and starts receiving bytes. Everything downstream—parsing, rendering, Largest Contentful Paint—is gated on it. A regression in TTFB does not just delay the page by 200ms; it pushes back every subsequent milestone by 200ms, and it does so during the exact window when a new visitor is deciding whether your site feels fast and trustworthy or slow and broken. If you want the deeper relationship between paint timing and revenue, we walk through it in how INP, LCP, and CLS map to conversion.
Calculating the latency tax
The industry rule of thumb, repeated in research from Akamai and Google and corroborated by WPO Stats case studies, is roughly "100ms of added latency costs about 1% of conversion." It is not a law of physics, but it is a defensible planning number. Apply it to our numbers and the math stops being abstract:

    Monthly revenue through checkout:  ~$1,000,000
    Daily revenue:                     ~$33,000
    Added TTFB latency:                200ms
    Rule-of-thumb conversion impact:   ~2% on the affected cohort (200ms / 100ms * 1%)
    Affected cohort:                   ~30% of traffic (geo-affected)
    Observed checkout conversion dip:  ~4% (blended)
    Daily loss:                        ~$1,300 (~4% of ~$33,000)
    Monthly loss:                      ~$40,000

The rule of thumb alone would have predicted a smaller blended hit; the dollar figures follow the dip we actually observed. Forty thousand dollars a month, from a one-line change that passed every test we had. The regression was not in the code's correctness. It was in the code's cost.
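The same arithmetic can be sketched as a small helper so you can plug in your own numbers. This is an illustrative sketch rather than a revenue model: the function names are made up for this post, a 30-day month and a linear relationship between conversion and revenue are assumed, and the dollar figures use the roughly 4% blended dip in checkout conversion reported earlier, which is what reproduces the ~$1,300/day number.

```javascript
// Rule of thumb from the text: ~1% of conversion per 100ms of added latency.
// Returns the estimated conversion loss (in percent) for the affected cohort.
function ruleOfThumbLossPct(addedLatencyMs) {
  return (addedLatencyMs / 100) * 1;
}

// Converts a blended conversion loss into dollars, assuming revenue scales
// linearly with conversion and a 30-day month (both are simplifications).
function estimateRevenueLoss(monthlyRevenue, blendedLossPct) {
  const dailyRevenue = monthlyRevenue / 30;
  const dailyLoss = dailyRevenue * (blendedLossPct / 100);
  return {
    dailyRevenue: Math.round(dailyRevenue),
    dailyLoss: Math.round(dailyLoss),
    monthlyLoss: Math.round(dailyLoss * 30),
  };
}

const cohortLossPct = ruleOfThumbLossPct(200); // planning estimate for the affected cohort
const loss = estimateRevenueLoss(1_000_000, 4); // observed ~4% blended dip
console.log(cohortLossPct, loss);
```

Keeping the planning estimate and the observed dip as separate inputs makes it obvious which number a given dollar figure rests on.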
The Anatomy of a CDN Misconfiguration
When we finally pulled apart what happened at the edge, the mechanism was almost embarrassingly simple. Our new header-inspection logic ran inside a Worker, and the way it was written caused a subtle but catastrophic change in caching behavior.
How a header check accidentally forced a cache miss
The original logic relied on edge caching with cacheEverything set, so most requests were served straight from the edge with no trip to origin at all. The new logic, in the course of reading a request header, constructed a fresh Request object and passed it to fetch without preserving the cache directives. The result: every request that hit the new path was treated as uncacheable and revalidated against the origin, adding a full round-trip.

    // The regression, simplified
    export default {
      async fetch(request, env, ctx) {
        const variant = request.headers.get("x-variant") ?? "default";
        // BUG: building a new Request drops the cf cache settings,
        // so cacheEverything no longer applies and every request
        // revalidates against origin -- one extra round-trip each.
        const upstream = new Request(request.url, {
          method: request.method,
          headers: request.headers,
        });
        return fetch(upstream); // cache: bypassed
      }
    };

The fix was to thread the cache options through explicitly:

    return fetch(request, {
      cf: { cacheEverything: true, cacheTtl: 3600 },
    });

One line of intent, an unintended side effect on caching, and a 200ms tax on every cache-miss request. If you want a deeper tour of how edge logic can quietly sabotage performance, our guide to error tracking on Cloudflare Workers covers the isolate model that makes these mistakes so easy to make.
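A unit test on the handler's outbound call is one way a regression like this might be caught before deploy. The sketch below assumes the handler is refactored to accept an injectable fetch so a test can inspect what is sent upstream; `createHandler`, `checkCacheOptionsPreserved`, and the `fetchImpl` parameter are hypothetical names, not part of the Workers API. The `cf.cacheEverything` and `cf.cacheTtl` fields are the same ones used in the fix.

```javascript
// Hypothetical refactor: the handler takes its fetch as a dependency so a
// test can observe exactly what is passed to the upstream call.
function createHandler(fetchImpl) {
  return {
    async fetch(request) {
      // Header inspection would go here; the important part is that the
      // cf cache options are passed explicitly on the outbound fetch.
      return fetchImpl(request, {
        cf: { cacheEverything: true, cacheTtl: 3600 },
      });
    },
  };
}

// Runs a handler against a recording fetch (no network involved) and reports
// whether the cache directive reached the upstream call.
async function checkCacheOptionsPreserved(makeHandler) {
  const calls = [];
  const recordingFetch = async (request, init) => {
    calls.push({ request, init });
    return { status: 200 };
  };
  // Minimal stand-in for a Request: only the parts a handler reads.
  const fakeRequest = {
    url: "https://example.com/",
    method: "GET",
    headers: { get: () => null },
  };
  await makeHandler(recordingFetch).fetch(fakeRequest);
  const cf = calls[0]?.init?.cf;
  return Boolean(cf && cf.cacheEverything === true);
}

checkCacheOptionsPreserved(createHandler).then((ok) => {
  console.log(ok ? "cache options preserved" : "cache options dropped");
});
```

A check like this only guards the one directive it names, so it complements field measurements and does not replace them.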
Visualizing the Damage with RUM
What finally exposed the regression was not a log line. It was Real User Monitoring—TTFB measured in the actual browsers of actual users, tagged with the deploy that was live when each measurement was taken.
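A minimal sketch of that kind of measurement, assuming the browser's Navigation Timing API and a release tag injected at build time. The `/rum` endpoint, the `RELEASE` constant, and the payload shape are placeholders for illustration and do not describe any particular vendor's SDK.

```javascript
// Placeholder release tag; in practice this would be injected at build time.
const RELEASE = "deploy-402";

// TTFB from a navigation timing entry: time from navigation start to the
// first byte of the response.
function ttfbFromNavigationEntry(entry) {
  return Math.max(0, entry.responseStart - entry.startTime);
}

// Builds the beacon payload: the measurement plus the tags we segment on later.
function buildTtfbBeacon(entry, release, country) {
  return {
    metric: "ttfb",
    valueMs: Math.round(ttfbFromNavigationEntry(entry)),
    release,
    country, // assumed to be filled in server-side or from an edge header
  };
}

// Browser-only wiring; skipped when the file runs outside a page.
if (typeof window !== "undefined" && typeof navigator.sendBeacon === "function") {
  window.addEventListener("load", () => {
    const [entry] = performance.getEntriesByType("navigation");
    if (entry) {
      const payload = buildTtfbBeacon(entry, RELEASE, null);
      navigator.sendBeacon("/rum", JSON.stringify(payload));
    }
  });
}
```

Tagging every sample with the release at capture time is what makes the later "P95 per deploy" comparison a simple filter and not a join against deploy logs.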
P95 vs. P50: spotting the long tail
The median (P50) TTFB barely moved, because most users in cache-warm regions were unaffected. That is why averages and medians are dangerous for performance: they hide the cohort that is actually suffering. The P95, however, jumped from around 180ms to nearly 400ms the moment the deploy went live. The long tail is where regressions hide, and the long tail is where your most marginal—and most price-sensitive—conversions live.
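The effect is easy to reproduce with a toy data set. The numbers below are synthetic and chosen only to mirror the shape described here (a warm-cache majority and a cold-cache minority that picks up 200ms); the `percentile` helper uses the nearest-rank method, which is one of several common definitions.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Synthetic TTFB samples (ms): 70 warm-cache sessions, 30 cold-cache sessions.
const warm = Array(70).fill(80);
const before = [...warm, ...Array(30).fill(180)];
const after = [...warm, ...Array(30).fill(380)]; // cold cohort pays +200ms

console.log(percentile(before, 50), percentile(after, 50)); // 80 80
console.log(percentile(before, 95), percentile(after, 95)); // 180 380
```

The median sits inside the unaffected majority in both runs, so only the tail percentile registers the change.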
Geographic outliers: why only 30% of users felt it
Segmenting the RUM data by country made the shape obvious. Users routed through edge locations with cold caches saw the full 200ms penalty; users in our high-traffic core regions saw almost nothing. Roughly 30% of sessions carried the regression. That is precisely the kind of partial, geographically uneven signal that synthetic monitoring and origin metrics will never surface, and it is why the chart that finally named the culprit was a TTFB-over-time line with a vertical marker on "Deploy #402." The instant the marker appeared, the P95 line stepped up. There was nothing left to debate.
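The segmentation itself is a group-by. A rough sketch, assuming each RUM sample carries `country`, `release`, and `ttfbMs` fields (a hypothetical shape, not a specific product's schema), and using synthetic values with a made-up prior release "401" for comparison:

```javascript
// Nearest-rank P95 over an array of numbers.
function p95(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((sorted.length * 95) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Groups samples by country and release, then reports P95 and sample count
// per group so a regional step is visible next to its sample size.
function p95ByCountryAndRelease(samples) {
  const groups = new Map();
  for (const { country, release, ttfbMs } of samples) {
    const key = `${country}|${release}`;
    if (!groups.has(key)) groups.set(key, { country, release, values: [] });
    groups.get(key).values.push(ttfbMs);
  }
  return [...groups.values()].map(({ country, release, values }) => ({
    country,
    release,
    p95: p95(values),
    count: values.length,
  }));
}

// Synthetic samples: one warm-cache country, one cold-cache country.
const samples = [
  ...[80, 90, 100].map((ttfbMs) => ({ country: "US", release: "401", ttfbMs })),
  ...[80, 95, 100].map((ttfbMs) => ({ country: "US", release: "402", ttfbMs })),
  ...[170, 175, 180].map((ttfbMs) => ({ country: "BR", release: "401", ttfbMs })),
  ...[370, 375, 380].map((ttfbMs) => ({ country: "BR", release: "402", ttfbMs })),
];
console.table(p95ByCountryAndRelease(samples));
```

Reporting the count beside each P95 matters: a step in a segment with three samples is a hint, while the same step across hundreds is a finding.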
Building the Performance Gate
Finding the regression after 48 hours is a failure, not a victory. The real fix was making it impossible for a latency regression to live that long again, without grinding the deploy cadence to a halt.
Deploy-specific performance alerts
We added an alert that compares the post-deploy P95 TTFB against the trailing 7-day baseline, scoped to the new release tag. If the new release's P95 exceeds the baseline by more than 15% across a meaningful sample, it fires—automatically, tied to the release, no human required to remember to look. The rule looks roughly like this:

    {
      "metric": "measurements.ttfb",
      "aggregate": "p95",
      "scope": { "release": "current" },
      "comparison": {
        "baseline": "trailing_7d",
        "threshold_pct": 15,
        "min_samples": 500
      },
      "notify": ["#perf-alerts", "pagerduty:web-platform"]
    }

Wiring it into CI and Slack
The alert posts directly into the channel that owns the deploy, with the offending release tag and a link to the segmented RUM chart. Because the comparison is relative to a rolling baseline rather than a static "good" number, it adapts as the site changes and does not nag us about absolute thresholds that no longer apply. Pairing it with an error budget view lets us decide, deliberately, how much performance regression we are willing to spend on velocity—instead of finding out 48 hours and $2,600 later. The free Web Vitals checker is a good way to sanity-check any URL's field data before you even set the alert up.
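How the rule's comparison might be evaluated in a CI step is sketched below. The evaluator, its function names, and the message format are assumptions for illustration, not the alerting backend's actual implementation; only the `threshold_pct` and `min_samples` field names come from the rule shown earlier, and the 180ms and 400ms values echo the P95 figures from this incident.

```javascript
// Decides whether a release's P95 TTFB breaches the rule relative to baseline.
function evaluateDeployAlert(rule, baselineP95, current) {
  const { threshold_pct, min_samples } = rule.comparison;
  if (!(baselineP95 > 0)) {
    return { fire: false, reason: "no_baseline" };
  }
  if (current.samples < min_samples) {
    return { fire: false, reason: "insufficient_samples" };
  }
  const deltaPct = ((current.p95 - baselineP95) / baselineP95) * 100;
  const fire = deltaPct > threshold_pct;
  return {
    fire,
    reason: fire ? "threshold_exceeded" : "within_threshold",
    deltaPct: Math.round(deltaPct * 10) / 10,
  };
}

// Formats a short chat message naming the offending release.
function formatAlertMessage(release, baselineP95, current, result) {
  return (
    `P95 TTFB for ${release} is ${current.p95}ms, ` +
    `${result.deltaPct}% over the 7-day baseline of ${baselineP95}ms ` +
    `(${current.samples} samples)`
  );
}

const rule = {
  comparison: { baseline: "trailing_7d", threshold_pct: 15, min_samples: 500 },
};
const current = { p95: 400, samples: 1200 };
const result = evaluateDeployAlert(rule, 180, current);
if (result.fire) {
  console.log(formatAlertMessage("deploy-402", 180, current, result));
}
```

In a pipeline, the same result could also set a non-zero exit code so the deploy check fails visibly; whether to block or only notify is a policy choice for the team that owns the release.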
Why Traditional Pricing Kills Performance Culture
Here is the uncomfortable part. The reason this regression was invisible is not only technical; it is also economic. Many teams on per-event-priced RUM sample heavily, recording only 1–10% of sessions, because full-fidelity capture would blow their monitoring budget.
The sampling trap
A 30%-of-traffic regression sounds like it would survive sampling, and it would. But the precise signal we needed (the P95 step, segmented by country, scoped to a single release) requires enough samples within each segment to be statistically meaningful. Slice a 5% sample by country, by device, and by release, and the segments that actually matter collapse to a handful of data points each. The regression hides in the noise of an undersampled tail. We had, in fact, cut our sample rate after a previous cost scare, which is exactly the wrong reflex.
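The collapse is easy to see with back-of-the-envelope numbers. The sketch below uses made-up traffic figures (100,000 daily sessions, a country holding 2% of traffic, one device class at 50%, a new release seen by half of that day's sessions), none of which come from our data, to estimate how many samples land in a single segment at a given sample rate.

```javascript
// Expected samples in one segment = sessions x sample rate x each slice's share.
function expectedSegmentSamples(dailySessions, sampleRate, shares) {
  const segmentShare = shares.reduce((acc, share) => acc * share, 1);
  return Math.round(dailySessions * sampleRate * segmentShare);
}

const shares = [0.02, 0.5, 0.5]; // hypothetical: country, device, release

const sampled = expectedSegmentSamples(100_000, 0.05, shares); // 5% sampling
const full = expectedSegmentSamples(100_000, 1, shares); // every session
console.log(sampled, full); // 25 500
```

With about 25 samples in the segment, a P95 rests on the slowest one or two sessions, so a single outlier can create or hide a step; at full capture the same segment has enough data for the percentile to be stable.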
Full-fidelity debugging needs flat-rate economics
This is why flat-rate, full-fidelity monitoring matters beyond the obvious cost savings. When you are not punished per event, you capture every session, and the P95 of a regional cohort is not a statistical guess—it is a measurement. You can segment by anything, after the fact, without having pre-decided what to keep.
That is the model we built GlitchReplay on. It speaks the same Sentry-compatible SDK your app already uses, captures RUM and session replay at a flat rate with no sampling pressure, and lets you tie a TTFB regression to the exact deploy that caused it—and then jump straight into a replay of a slow session to see what the user actually experienced. Stop flying blind on your edge deploys. The next 200ms regression should cost you a five-minute investigation, not $40,000.