Core Web Vitals regressions: catching them before users complain
RUM-based alerting that fires on real users, not synthetic monitors — including the percentile choice that actually correlates with revenue.
The deploy was green. Every unit test passed, the Lighthouse check in CI scored 98, and the PR merged clean. Two weeks later someone notices mobile conversion is down 12% and nobody can explain why. The culprit, eventually: a "hero image optimization" that quietly pushed the LCP element behind a heavy JavaScript task. Invisible to a high-powered build server. Devastating for a real user on a three-year-old Android phone over 4G. This is the gap between lab data and field data, and it's why your synthetic monitoring keeps telling you everything is fine while your revenue says otherwise.
The Synthetic Blind Spot: Why Lab Data Is Lying to You
Synthetic monitoring — Lighthouse, Puppeteer scripts, your CI performance gate — runs your page in a controlled, repeatable environment. That control is exactly the problem. It's a perfect-world simulation, and your users do not live in a perfect world.
The variability of the long tail
Real users span a vast range of devices and conditions: flagship phones and budget Androids three CPU generations behind, fiber connections and congested 4G, fresh page loads and tabs reloaded after the browser discarded them under memory pressure. A lab test pins all of those variables to one comfortable configuration. Your p75 user (the experience that 75% of page loads match or beat) is often running on hardware several times slower than your build server, and that difference is precisely where regressions hide. "It works on my machine" is the performance equivalent of "it compiles."
Why Lighthouse in CI doesn't reflect CrUX
Google doesn't rank your site on your Lighthouse score. It ranks on the Chrome User Experience Report (CrUX) — aggregated field data from real Chrome users. A local M3 Mac might report an LCP of 0.8s for a URL whose real-world p75 LCP is 3.2s. Those are measurements of the same page, and only one of them affects your search ranking and your conversion rate. If your alerting is built on the 0.8s number, you are flying blind to the 3.2s reality.
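If you want the number Google actually sees, you can pull it from the CrUX API instead of trusting a local run. The sketch below assumes you have a CrUX API key and Node 18+ (for the global `fetch`). `fetchFieldP75` and `extractP75` are hypothetical helper names. The endpoint, request fields, and metric keys come from the public CrUX API. URLs without enough Chrome traffic come back as a not-found error instead of data.

```javascript
// Sketch: read field-data p75 values for one URL from the CrUX API.
// Assumes a valid CrUX API key and a runtime with global fetch (Node 18+ or a browser).
const CRUX_ENDPOINT =
  "https://chromeuxreport.googleapis.com/v1/records:queryRecord";

// Pull the p75 for one metric out of a queryRecord response.
// CrUX reports CLS percentiles as strings (e.g. "0.05"), so coerce everything to Number.
function extractP75(response, metricKey) {
  const p75 = response?.record?.metrics?.[metricKey]?.percentiles?.p75;
  return p75 === undefined ? null : Number(p75);
}

async function fetchFieldP75(url, apiKey, formFactor = "PHONE") {
  const res = await fetch(`${CRUX_ENDPOINT}?key=${encodeURIComponent(apiKey)}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url,
      formFactor, // field data for phones, where the long tail usually lives
      metrics: [
        "largest_contentful_paint",
        "cumulative_layout_shift",
        "interaction_to_next_paint",
      ],
    }),
  });
  if (!res.ok) throw new Error(`CrUX request failed: ${res.status}`);
  const data = await res.json();
  return {
    lcpMs: extractP75(data, "largest_contentful_paint"),
    cls: extractP75(data, "cumulative_layout_shift"),
    inpMs: extractP75(data, "interaction_to_next_paint"),
  };
}
```

CrUX is a 28-day rolling aggregate. It confirms a regression weeks after the fact, which is why the rest of this post alerts on your own RUM data.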
The Vital Three: What to Alert On in 2026
Core Web Vitals have stabilized around three metrics, with a clear "good" threshold for each at p75.
LCP: the visual load threshold
Largest Contentful Paint measures when the largest image or text block visible in the viewport renders: usually a hero image, heading, or video poster. Good is 2.5 seconds or less. LCP regressions are sneaky because they're often caused not by the element itself getting slower but by something else getting in front of it: a render-blocking script, a font swap, a lazy-load applied to the wrong element.
CLS: the stability threshold
Cumulative Layout Shift measures unexpected movement: content jumping as images load without dimensions, ads injecting, or a cookie banner shoving the page down. Good is 0.1 or less. CLS is the hardest vital to catch in development because your dev environment loads from a warm cache. Images arrive almost instantly and the page settles before you ever see it move. The shift only happens on a cold load over a slow network, which is to say, only for real users.
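CLS is not a running total for the whole visit. Layout shifts are grouped into "session windows", and the score is the worst window. The sketch below shows that grouping over plain objects shaped like `layout-shift` performance entries (`value`, `startTime`, `hadRecentInput`). `computeCLS` is a hypothetical name, and in practice the web-vitals library does this for you.

```javascript
// Sketch of CLS session windowing:
// - shifts within 500 ms of user input (hadRecentInput) are excluded,
// - a window closes after a gap of 1 s or more between shifts, or once it spans 5 s,
// - CLS is the largest sum of any single window.
function computeCLS(shifts) {
  let maxWindow = 0;
  let windowSum = 0;
  let windowStart = -Infinity;
  let lastShift = -Infinity;

  for (const shift of shifts) {
    if (shift.hadRecentInput) continue; // expected movement after a tap/keypress
    const gapTooLong = shift.startTime - lastShift >= 1000;
    const windowTooLong = shift.startTime - windowStart >= 5000;
    if (gapTooLong || windowTooLong) {
      // Start a new session window at this shift.
      windowSum = 0;
      windowStart = shift.startTime;
    }
    windowSum += shift.value;
    lastShift = shift.startTime;
    maxWindow = Math.max(maxWindow, windowSum);
  }
  return maxWindow;
}
```

In a browser you would feed this from a `PerformanceObserver` observing `{ type: "layout-shift", buffered: true }`. Because the score comes from a single window, a single cookie-banner burst on a cold load can cost you the whole CLS budget.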
INP: the responsiveness threshold
Interaction to Next Paint replaced FID as the responsiveness Core Web Vital in March 2024. FID only measured the delay before the first interaction was processed. INP instead measures the latency of clicks, taps, and key presses across the whole visit and reports roughly the slowest one (on pages with many interactions, one outlier per 50 interactions is ignored). Good is 200 milliseconds or less. INP catches the sluggish dropdown, the laggy form field, the button that takes half a second to respond because a long task is blocking the main thread. You wire all three up with the web-vitals library's attribution build and forward each result to a Sentry-compatible reporting callback:
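```javascript
import { onLCP, onCLS, onINP } from "web-vitals/attribution";

// Each metric exposes its culprit element under a different attribution field
// (names as in web-vitals v4; check the attribution docs for your version).
function culprit(metric) {
  const a = metric.attribution ?? {};
  if (metric.name === "LCP") return a.element;
  if (metric.name === "CLS") return a.largestShiftTarget;
  if (metric.name === "INP") return a.interactionTarget;
  return undefined;
}

function report(metric) {
  // Send the value AND the attribution data that explains it.
  // captureMetric is a placeholder for your Sentry-compatible reporting call.
  captureMetric({
    name: metric.name,
    value: metric.value,
    rating: metric.rating, // "good" | "needs-improvement" | "poor"
    element: culprit(metric), // CSS selector of the element responsible
    url: location.pathname,
  });
}

onLCP(report);
onCLS(report);
onINP(report);
```
The Percentile Paradox: Why p75 Is the Floor, Not the Ceiling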
Once you're measuring real users, the next question is which user. The answer is a percentile, and the choice of percentile is a business decision disguised as a statistics one.
The SEO pass vs. the user-experience fail
Google evaluates CrUX at p75. So if your p75 LCP is 2.4s, you "pass" for ranking purposes. But p75 means a full quarter of your users are having a worse experience than that — and if your p90 LCP is 5 seconds, those users are bouncing. You can hold a green Search Console badge while a quarter of your traffic suffers. p75 is the SEO floor you must clear, not the ceiling you should aspire to.
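The p75/p90 gap is easy to see with a small nearest-rank percentile helper run over twenty hypothetical LCP samples (the numbers are invented for illustration). The same page passes comfortably at p75 and fails badly at p90.

```javascript
// Nearest-rank percentile over raw RUM samples.
// A minimal sketch: production RUM backends usually approximate this with
// histograms or streaming sketches rather than sorting every sample.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical LCP samples in ms: 15 fast loads and a slow quarter.
const lcpMs = [
  900, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900,
  2000, 2100, 2200, 2300, 2400, 3100, 4200, 5100, 5600, 6800,
];

console.log(percentile(lcpMs, 75)); // 2400: "good", the SEO check passes
console.log(percentile(lcpMs, 90)); // 5100: the tail that is bouncing
```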
When to use p90 or p95
For high-value flows (checkout, signup, the steps that directly produce revenue), you should be alerting at p90 or p95. The usual justification is the oft-cited "100ms rule": even small increases in latency measurably depress conversion, and a one-second mobile load delay is frequently cited as raising bounce rates by around 20%. The users in your p90 tail are the ones whose abandoned carts don't show up in your p75 dashboard. You can borrow the "budget" framing from SRE-style error budgets here; our error budget tool applies the same idea to reliability.
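One way to encode "stricter percentile for revenue flows" is a small route-prefix policy. The sketch below assumes hypothetical prefixes and values. It checks path-segment boundaries so that `/checkouts` does not accidentally inherit the checkout policy.

```javascript
// Hypothetical alerting policy: which percentile to watch per route group.
// Revenue-critical flows get a higher (stricter) percentile; everything else
// falls back to p75, the CrUX/SEO floor. Prefixes and values are illustrative.
const PERCENTILE_POLICY = [
  { prefix: "/checkout", percentile: 95 },
  { prefix: "/signup", percentile: 90 },
];
const DEFAULT_PERCENTILE = 75;

function alertPercentileFor(pathname) {
  // Match the prefix itself or anything beneath it, on a "/" boundary.
  const rule = PERCENTILE_POLICY.find(
    (r) => pathname === r.prefix || pathname.startsWith(r.prefix + "/"),
  );
  return rule ? rule.percentile : DEFAULT_PERCENTILE;
}
```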
Setting Up Noise-Free Alert Rules
The fastest way to make a performance alert useless is to make it noisy. An alert that fires constantly gets muted, and a muted alert catches nothing.
Static thresholds vs. relative regressions
There are two complementary styles. A static threshold ("alert when p75 LCP > 2.5s") catches absolute badness. A relative regression ("alert when p75 LCP increases more than 20% week-over-week") catches the deploy that made things worse even if you're still technically in the green. Relative alerts are what catch that 12%-conversion-drop deploy: the one that went from 1.8s to 2.4s. That was a 33% jump that never crossed the static line and would have been invisible without a baseline comparison.
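Both styles fit in one check, sketched below. `evaluateLcpAlert` is a hypothetical helper, and its 2500 ms limit and 20% tolerance mirror the examples in the text. Run on the 1.8s to 2.4s deploy, only the relative rule fires.

```javascript
// Sketch of the two complementary alert styles: a static ceiling plus a
// relative-regression check against a baseline. Values are p75 LCP in ms.
function evaluateLcpAlert(
  { currentP75, baselineP75 },
  { staticLimitMs = 2500, maxIncrease = 0.2 } = {},
) {
  const reasons = [];
  if (currentP75 > staticLimitMs) {
    reasons.push(`static: p75 ${currentP75}ms exceeds ${staticLimitMs}ms`);
  }
  if (baselineP75 > 0) {
    const change = (currentP75 - baselineP75) / baselineP75;
    if (change > maxIncrease) {
      reasons.push(`relative: p75 up ${(change * 100).toFixed(0)}% vs baseline`);
    }
  }
  return { fire: reasons.length > 0, reasons };
}

// The deploy from the intro: still under 2.5s, but a 33% regression.
const result = evaluateLcpAlert({ currentP75: 2400, baselineP75: 1800 });
console.log(result.reasons); // [ 'relative: p75 up 33% vs baseline' ]
```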
Filtering by page group and sample size
```jsonc
{
  "metric": "LCP",
  "percentile": "p75",
  "condition": "increase > 20% vs 7d baseline",
  "page_group": "/checkout/*", // alert where it matters
  "exclude": ["/terms", "/privacy"],
  "min_sample_size": 500, // don't fire on 3 data points
  "window": "1h"
}
```
The two settings that save your sanity: scope alerts to page groups so you're paged about checkout, not the terms-of-service page, and set a minimum sample size so a single user on a terrible connection can't trigger a false alarm. Low-volume noise is one of the most common causes of alert fatigue in RUM.
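The rule above can be applied to raw samples roughly as follows. This is a sketch under stated assumptions, not a product API. Samples are assumed to look like `{ path, value }` with LCP in ms. `"/checkout/*"` is read as "`/checkout` and anything under it". The 7-day baseline is computed elsewhere and passed in. `evaluateRule` and `matchesGroup` are hypothetical names.

```javascript
// Nearest-rank p75 (callers guarantee a non-empty array).
function p75(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1];
}

// "/checkout/*" matches "/checkout" and any path beneath it; otherwise exact match.
function matchesGroup(path, pattern) {
  if (pattern.endsWith("/*")) {
    const base = pattern.slice(0, -2);
    return path === base || path.startsWith(base + "/");
  }
  return path === pattern;
}

function evaluateRule(rule, samples, baselineP75, maxIncrease = 0.2) {
  const values = samples
    .filter((s) => matchesGroup(s.path, rule.page_group))
    .filter((s) => !rule.exclude.includes(s.path))
    .map((s) => s.value);

  // Too few samples: stay silent rather than page someone about noise.
  if (values.length === 0 || values.length < rule.min_sample_size) {
    return { fire: false, reason: `only ${values.length} samples` };
  }
  const current = p75(values);
  const change = (current - baselineP75) / baselineP75;
  return { fire: change > maxIncrease, current, change };
}
```

The sample-size gate runs before any percentile is computed, so a burst of three terrible sessions on a quiet page never reaches the comparison.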
Root Cause Analysis: Connecting Vitals to Session Replay
Knowing your LCP regressed is useful. Knowing why is the part that actually lets you fix it, and that's where a number alone falls short.
Identifying the LCP element
The web-vitals attribution build hands you the actual DOM selector of the element that was the LCP, plus the timing breakdown — how much was time-to-first-byte, how much was render delay, how much was resource load. That tells you whether the regression was a slow server, a render-blocking resource, or the element simply changing. Our web vitals tool helps you inspect these locally before you ship.
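The LCP breakdown can be turned into a one-line diagnosis by picking its largest sub-part. The sketch below assumes the web-vitals v4 `LCPAttribution` field names (`timeToFirstByte`, `resourceLoadDelay`, `resourceLoadDuration`, `elementRenderDelay`), and other major versions may name them differently. `dominantLcpPhase` is a hypothetical helper.

```javascript
// Sketch: name the dominant LCP sub-part from web-vitals attribution data.
// The four sub-parts add up to the LCP value; the largest is usually the fix target.
const LCP_PHASES = {
  timeToFirstByte: "slow server response (TTFB)",
  resourceLoadDelay: "LCP resource discovered late",
  resourceLoadDuration: "LCP resource slow to download",
  elementRenderDelay: "render blocked after the resource arrived",
};

function dominantLcpPhase(attribution) {
  let worst = null;
  for (const [key, label] of Object.entries(LCP_PHASES)) {
    const ms = attribution?.[key] ?? 0;
    if (worst === null || ms > worst.ms) worst = { key, label, ms };
  }
  // No timing data at all: don't pretend TTFB was the culprit.
  return worst && worst.ms > 0 ? worst : null;
}
```

A hero image stuck behind a heavy JavaScript task, like the one in the intro, would typically show up as a large `elementRenderDelay` rather than a slow download.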
Visualizing CLS and INP
For CLS, a session replay is transformative: instead of a 0.18 score, you watch the cookie banner shove the article down 200 pixels as the user is about to tap a link. For INP, the breadcrumb timeline shows the long task that blocked the main thread right when the user clicked. You stop theorizing about which div shifted or which handler was slow, because you can see it happen frame by frame.
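Before opening the replay, INP attribution can already tell you which kind of slowness you're looking for. The sketch below assumes the web-vitals v4 `INPAttribution` fields (`inputDelay`, `processingDuration`, `presentationDelay`, `interactionTarget`); verify them against your version. `explainInp` is a hypothetical helper.

```javascript
// Sketch: classify a slow interaction using web-vitals INP attribution.
// inputDelay:          main thread was busy (e.g. a long task) when the user clicked
// processingDuration:  the event handlers themselves were slow
// presentationDelay:   rendering the next frame after the handlers ran was slow
function explainInp(attribution) {
  const phases = {
    inputDelay: attribution?.inputDelay ?? 0,
    processingDuration: attribution?.processingDuration ?? 0,
    presentationDelay: attribution?.presentationDelay ?? 0,
  };
  // Pick the phase with the largest duration.
  const [phase, ms] = Object.entries(phases).reduce((a, b) => (b[1] > a[1] ? b : a));
  return { phase, ms, target: attribution?.interactionTarget ?? null };
}
```

A dominant `inputDelay` points you to the long task in the breadcrumb timeline. A dominant `processingDuration` points you to the handler on `target`.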
The Performance Budget as Team Culture
Alerts catch regressions after they ship. A performance budget stops them from accumulating in the first place.
Budgets in the on-call rotation and in PRs
Treat a performance regression like a reliability incident: it goes to whoever's on call, and it gets investigated, not silently tolerated. Even better, surface RUM-based budgets in the pull request itself, so the author of the hero-image change sees "this raises p75 LCP on /checkout by 18%" before merging, not two weeks later in a conversion report. Here's a budget table worth pinning somewhere visible:
- LCP: Good at 2.5s or less. Needs improvement above 2.5s up to 4.0s. Poor above 4.0s.
- CLS: Good at 0.1 or less. Needs improvement above 0.1 up to 0.25. Poor above 0.25.
- INP: Good at 200ms or less. Needs improvement above 200ms up to 500ms. Poor above 500ms.
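The budget above can be expressed as data and checked in a PR job, as in the sketch below. `BUDGETS`, `rate`, and `checkBudgets` are hypothetical names. The rating boundaries match the list: inclusive at the "good" bound, strictly greater for "poor". The web-vitals library computes the same rating for you as `metric.rating`.

```javascript
// The budget as data: [good upper bound, needs-improvement upper bound].
// LCP and INP are in milliseconds; CLS is unitless.
const BUDGETS = {
  LCP: [2500, 4000],
  CLS: [0.1, 0.25],
  INP: [200, 500],
};

function rate(name, value) {
  if (!BUDGETS[name]) throw new Error(`no budget for ${name}`);
  const [good, poor] = BUDGETS[name];
  if (value <= good) return "good";
  if (value <= poor) return "needs-improvement";
  return "poor";
}

// Hypothetical PR check: list every p75 that lands outside "good".
function checkBudgets(p75s) {
  return Object.entries(p75s)
    .filter(([name, value]) => rate(name, value) !== "good")
    .map(([name, value]) => `${name} p75 ${value} is ${rate(name, value)}`);
}
```

An empty result means the PR is within budget. Anything else is a message you can post on the pull request before merge.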
If you're moving your monitoring stack, you can keep this whole setup when you switch tools — our guide on migrating from Sentry to GlitchReplay covers doing it without changing your instrumentation code.
The lesson of every quietly-shipped performance regression is the same: synthetic tests measure a world your users don't live in. Move your alerting to real-user data, alert on relative regressions and not just static thresholds, watch the right percentile for the flow that earns the money, and connect the metric to a replay so you can see the cookie banner or the long task with your own eyes. GlitchReplay gives you RUM-based Core Web Vitals alerting and session replay through a single Sentry-compatible SDK, at a flat rate that doesn't punish you for having the high traffic that makes field data meaningful in the first place. Stop guessing why your site feels slow — give GlitchReplay a try.
GlitchReplay is Sentry-SDK compatible, includes session replay and security signals, and never charges per event. Free to start, five minutes to first event.