The GlitchReplay blog

Field notes on error tracking, session replay, source maps, security signals, and the on-call cost of per-event pricing. Written by people who have been paged at 2 AM.

Latest

Coming soon

Our editorial pipeline. Want one of these sooner? Email hello@glitchreplay.com and we'll bump the queue.

  • The 5 Sentry alternatives in 2026 (and which one you actually need)
    An honest tour of GlitchReplay, Highlight, Bugsnag, Rollbar, and Datadog Error Tracking — what each one is good at and where each one falls down.
  • Sentry vs GlitchReplay vs LogRocket: a real comparison
    Feature, pricing, and SDK compatibility breakdown for the three tools teams most often weigh against each other.
  • Why your Sentry SDK isn't capturing source-mapped stack traces
    The five most common reasons frames render as `<anonymous>` even after you've uploaded source maps — and the diagnostic checklist that finds the cause in under five minutes.
  • The right way to fingerprint errors so you don't drown in duplicates
    Default fingerprints over-group library noise and under-group your real bugs. Here's a fingerprinting strategy that mirrors how engineers actually triage.
  • How to capture unhandled promise rejections in modern browsers
    `unhandledrejection` is necessary but not sufficient. The async stacks, polyfill quirks, and framework integrations you also need.
  • Tracking errors in Cloudflare Workers (the right way)
    `waitUntil`, tail workers, and SDK transport limits — the three things that make Workers error tracking different from Node.
  • Session replay vs traditional logging: when each one wins
    Replay is amazing for UI bugs and useless for backend race conditions. A decision matrix for picking the right tool per incident class.
  • How to mask PII in session replays without breaking debugging
    Block-list vs allow-list strategies, the per-attribute exceptions you'll forget about, and the masking tests we run on every release.
  • The 30-second pre-error replay window: why it's the right default
    Why we capture the 30 seconds before the error rather than the entire session — storage cost, signal-to-noise, and what we measured.
  • Replay storage costs: why most vendors price you out
    A back-of-the-envelope calculation on rrweb payload sizes at scale, the egress traps in cross-cloud setups, and how flat-rate replay is even possible.
  • Debugging a phantom checkout bug with session replay
    A real war story: an intermittent Stripe failure that only happened on iOS Safari with autofill. Replay closed the loop in 20 minutes.
  • Why your stack traces show minified function names (and how to fix it)
    Source map upload is one of three preconditions. Here are all three, and how to verify each in production.
  • Source map upload at build time vs runtime: tradeoffs
    Build-time upload is faster but couples deploys to your error tracker; runtime fetch is slower but more resilient. Picking one.
  • Vite + Sentry SDK + source maps: the missing config
    The three Vite plugin options that everyone forgets, and what production stack traces look like before vs after each one.
  • Hiding source maps from end users while keeping them debuggable
    Server-side upload, signed URLs, and the deploy-script hooks that make this a one-time setup instead of a recurring leak risk.
  • The source-map-loader gotcha that broke our prod debugging
    A subtle webpack misconfiguration that silently shipped wrong source maps for two weeks — and the assertion we now run in CI.
  • Spotting credential stuffing attacks in your error stream
    Auth-error rate, geo dispersion, and user-agent patterns that distinguish a real attack from a buggy mobile app release.
  • Auth spike anomalies: what a real attack looks like in error data
    Annotated time series from three real incidents we've seen, and the heuristics we use to alert without paging on Black Friday traffic.
  • Why scanner probes show up in your error tracker (and what to do)
    How to classify the `/.env`, `/wp-admin`, and `/.git/config` noise — and why you should keep them, not filter them.
  • XSS attempts in the wild: 5 patterns we see weekly
    The actual payloads landing on production today: query-string injections, CSP-bypass tricks, and the new wave of mutation-XSS.
  • Debugging Cloudflare Workers errors in production
    Tail workers, `console.log` limits, and getting full stack traces out of edge runtime — what works in 2026.
  • Why Cloudflare Workers throw "Script will never generate a response"
    The hanging-promise patterns that trigger it, and the structured-cloning gotcha most people miss.
  • Tracking errors across Cloudflare Pages, Workers, and Durable Objects
    Three runtimes, three transport stories, one trace ID — how to stitch them together so an incident report doesn't need three dashboards.
  • The OpenNext error you'll hit deploying Next.js to Cloudflare
    A walkthrough of the bundle-too-large, dynamic-import, and `nodejs_compat` flag combinations that bite every Cloudflare Next.js team at least once.
  • Cloudflare D1 timeout errors: causes and fixes
    Connection limits, query plan surprises, and the indexes that turn a 30-second query into a 30ms one.
  • Core Web Vitals regressions: catching them before users complain
    RUM-based alerting that fires on real users, not synthetic monitors — including the percentile choice that actually correlates with revenue.
  • A 200ms TTFB regression that cost us $40k/month
    Postmortem of a quietly rolled-out CDN config change, how RUM caught it, and the alert rule we now run on every deploy.
  • How to alert on Web Vitals without alert fatigue
    Threshold tuning, baseline drift, and the routing rules that send the right alert to the right team — without paging at 3 AM for a country-specific blip.
  • How a 1-line CSS change took down our checkout (and replay caught it)
    Annotated replay frames from a real incident. Five minutes from alert to root cause — without ever reproducing it locally.
  • Postmortem: the React hydration error that survived three deploys
    Why hydration errors are uniquely hard to spot, and the diff between an error tracker that buries them and one that surfaces them.
  • The deploy that tripled our error rate at 2 AM
    What we saw, what we did, and the three guardrails we added so it can't happen the same way again.
  • We migrated 50M events/month off Sentry. Here's what broke.
    The four edge cases the migration script missed, the alerting gap that lasted six hours, and the customer-facing communication that worked.
  • PII scrubbing at ingest: why client-side isn't enough
    Three real ways client-side scrubbing leaks data — and the threat model that makes server-side a hard requirement.
  • HIPAA-friendly error tracking: what the law actually requires
    BAA, masking, audit logs, retention. A non-lawyer's guide to what you have to do, and what your error tracker has to do for you.
  • GDPR and error tracking: the parts your DPO hasn't asked about yet
    URL fragments, request bodies, IP addresses, breadcrumbs — the four places PII enters error data that even careful teams miss.
  • The 7 PII fields your error tracker is leaking right now
    A self-audit checklist with the regexes we use to find leaks in our own ingest pipeline. Run it against your last 1,000 events.
  • Reading your error budget: a guide for engineering managers
    How to convert raw error counts into a number leadership cares about, without hiring an SRE or installing Datadog.
  • How much should you actually spend on observability per developer?
    Benchmarks from 200 teams, broken down by stage and stack. Where the spend goes, and where it's almost always wasted.
  • The "noisy errors" problem and how to triage at scale
    Inbox-zero for error trackers: the five rules that turn a 10,000-issue backlog into a manageable weekly review.
  • Setting up SLOs from your error tracker (without an SRE team)
    A pragmatic SLO definition that uses the data you already have, and the dashboard that makes it visible to the rest of the company.
  • Error tracking for SvelteKit: the complete guide
    `handleError` hooks on both client and server, plus the SSR-vs-client error stream split that matters for real triage.
  • Astro + error tracking: SSR vs island hydration errors
    Why island hydration errors look like client errors but actually originate in the build, and how to source-map them.
  • Remix loaders, actions, and error boundaries: where errors actually surface
    A map of every place a Remix request can fail, the boundary that catches it, and the breadcrumb that explains why.
  • SolidJS error boundaries: catching what useTransition hides
    Suspense boundaries swallow errors by design. Here's how to surface them without sacrificing the UX they enable.