Debugging Cloudflare Workers errors in production
Tail Workers, `console.log` limits, and getting full stack traces out of the edge runtime: what works in 2026.
It's 2:00 AM, and your Cloudflare Worker is throwing subrequest limit errors for about 1% of users. You run wrangler tail, but the traffic volume is high enough that the specific error you're hunting gets sampled out or buried under a wall of 200 OKs. You sit there refreshing, hoping the next failure happens while you're watching. This is the moment most developers realize that debugging at the edge is not the same job as debugging a server, and that "praying the logs catch it" is not an observability strategy.
The Edge Debugging Gap: Why Traditional Methods Fail
On a traditional server you have a persistent process, a filesystem, and a log file you can grep at your leisure. A Cloudflare Worker has none of those. It runs in a V8 isolate that spins up in under five milliseconds, handles your request, and is frozen or discarded almost immediately. There is no log file. There is no process to attach a debugger to. The execution environment that produced your error may not exist by the time you go looking for it.
The ephemeral nature of isolate logging
When your Worker calls console.log, that line doesn't go to disk. It's buffered in memory and streamed out-of-band to whatever is listening — and if nothing is listening, it's gone. This is why a bug that "definitely logged something" leaves no trace: the isolate emitted the log into the void because no tail session or Logpush job was connected at that instant.
Why console.log is not a production strategy
Beyond the ephemerality problem, logging has a real cost at the edge. Every console.log serializes its arguments, and serialization burns CPU time, which is the metered resource in a Worker: the Free plan allows roughly 10 ms of CPU per request, and paid plans enforce a higher but still finite limit. A few stray logs in a hot path can push a request over the wall, and now your logging is causing the failures it was meant to diagnose. Logging is fine for development. In production it is a liability you want minimized and replaced with structured, sampled telemetry.
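One way to picture "structured, sampled telemetry" is a logger that decides whether to emit before it serializes anything. The sketch below is illustrative only: `createSampledLogger`, its parameters, and the field names are hypothetical helpers, not part of any Cloudflare API.

```typescript
// Hypothetical helper: names and shapes are illustrative, not a Cloudflare API.
type LogFields = Record<string, string | number | boolean>;

interface SampledLogger {
  // Returns true when the event was actually emitted.
  log(event: string, fields?: LogFields): boolean;
}

function createSampledLogger(
  sampleRate: number, // 0 = never log, 1 = always log
  random: () => number = Math.random,
  sink: (line: string) => void = (line) => console.log(line),
): SampledLogger {
  return {
    log(event, fields = {}) {
      // Decide before serializing, so unsampled calls skip JSON.stringify entirely.
      if (random() >= sampleRate) return false;
      sink(JSON.stringify({ event, ...fields }));
      return true;
    },
  };
}
```

Injecting `random` and `sink` keeps the helper testable; in a Worker you would typically leave both at their defaults and pick a low `sampleRate` for hot paths.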
Moving Beyond Wrangler Tail
wrangler tail is an excellent development tool and a poor production one. The reasons are baked into how it works.
The limitations of wrangler tail
Tail connects a WebSocket to a sampling pipeline. On a high-traffic Worker, Cloudflare samples the stream to protect the runtime, which means the one-in-a-thousand error you care about may simply never be delivered to your terminal. The connection also drops, times out, and can't be queried after the fact — if the error happened thirty seconds before you ran the command, you missed it. Tail shows you a live window, not a history.
Introducing Tail Workers: programmable observability
The production-grade answer is a Tail Worker: a second Worker that Cloudflare invokes automatically after your main Worker finishes, handing it the execution's logs, exceptions, and outcome. This happens out-of-band, so it adds zero latency to the request your user is waiting on. You wire it up in wrangler.toml with a single tail_consumers entry:
# In the producer Worker's wrangler.toml
name = "my-api"
main = "src/index.ts"
# Point at the Worker that will receive execution traces
tail_consumers = [
{ service = "my-observability-tail" }
]
Using Logpush for high-volume debugging
For enterprise volumes where you want every line shipped to durable storage, Logpush batches Worker logs and delivers them to R2, S3, or a logging vendor. Logpush is the right tool when you need retention and ad-hoc querying across billions of requests. For most teams, a Tail Worker that forwards exceptions to an error tracker covers the "something broke, show me what" case far more cheaply.
Deep Dive: Implementing Tail Workers for Custom Error Piping
The Tail Worker receives an array of trace events, one per producer invocation. Each event carries the outcome, any uncaught exceptions, the emitted logs, and metadata like the script name. Your job is to filter out the noise and forward what matters.
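For orientation, the fields this article relies on can be sketched as TypeScript interfaces. This is a simplified approximation written for illustration; the authoritative definitions are the ones shipped in `@cloudflare/workers-types`, and `summarize` is a hypothetical helper.

```typescript
// Simplified sketch of the trace event fields used in this article.
// The authoritative types live in @cloudflare/workers-types.
interface TraceExceptionSketch {
  name: string;
  message: string;
  timestamp: number;
  stack?: string;
}

interface TraceLogSketch {
  level: string;
  message: unknown; // the arguments that were passed to console.log
  timestamp: number;
}

interface TraceItemSketch {
  scriptName: string | null;
  outcome: string; // e.g. "ok", "exception", "exceededCpu", "canceled"
  eventTimestamp: number | null;
  exceptions: TraceExceptionSketch[];
  logs: TraceLogSketch[];
}

// Hypothetical helper: a one-line summary of a trace event for quick triage.
function summarize(item: TraceItemSketch): string {
  const first = item.exceptions[0];
  const detail = first ? `${first.name}: ${first.message}` : "no exception";
  return `${item.scriptName ?? "unknown"} [${item.outcome}] ${detail}`;
}
```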
Capturing exceptions, logs, and outcome
The most important field is outcome. You almost never want to forward the ok executions — that's the 99% of traffic that succeeded. You want exception, and selectively exceededCpu or canceled. Filtering at this layer is what keeps your event volume — and your bill — sane.
Filtering noise: only sending real failures to your tracker
export default {
  async tail(events) {
    const failures = events.filter(
      (e) => e.outcome === "exception" || e.outcome === "exceededCpu"
    );
    if (failures.length === 0) return;
    await fetch("https://glitchreplay.com/api/0/envelope/", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(
        failures.map((e) => ({
          script: e.scriptName,
          outcome: e.outcome,
          exceptions: e.exceptions, // name, message, stack
          logs: e.logs.slice(-10), // last few breadcrumbs only
          ts: e.eventTimestamp,
        }))
      ),
    });
  },
};
Note the logs.slice(-10). A Tail Worker payload has size limits, and you don't need the entire log history: the last handful of lines before the crash is what tells the story. Trimming here keeps you well under the per-invocation payload ceiling and keeps the Tail Worker itself fast.
Getting Full Stack Traces: The Source Map Challenge
Here's where most teams hit a wall. Your Worker is bundled and minified by esbuild before it ever reaches Cloudflare, so the stack trace you capture points at worker.js:1:24503. That number is useless for debugging — it's a column offset into a single-line megabyte of generated code.
The "anonymous:1:1234" problem
The relationship between that offset and your actual source — the function name, the file, the line you wrote — lives entirely in the source map produced at build time. Without it, every Worker error is a riddle. With it, the same error reads TypeError at handlePayment (src/payments.ts:42).
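To make the lookup concrete: the file, line, and column in a minified frame are the key a tracker uses to query the source map. The sketch below only extracts that key from a V8-style frame string; `parseV8Frame` is a hypothetical helper, and the actual source map resolution (which needs a source map consumer library) is not shown.

```typescript
interface MinifiedFrame {
  functionName: string | null;
  file: string;
  line: number;
  column: number;
}

// Hypothetical helper: pull the source map lookup key out of one V8 stack frame,
// e.g. "    at h (worker.js:1:24503)" or "    at worker.js:1:24503".
function parseV8Frame(frame: string): MinifiedFrame | null {
  const match = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?$/.exec(frame);
  if (!match) return null;
  const [, fn, file, line, column] = match;
  if (file === undefined || line === undefined || column === undefined) return null;
  return {
    functionName: fn ?? null, // absent for anonymous top-level frames
    file,
    line: Number(line),
    column: Number(column),
  };
}
```

Given the parsed frame, a tracker would look up `(line, column)` in the map uploaded for `file` to recover the original function name and source position.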
Automating source map uploads during wrangler deploy
The fix is to upload your source maps to your error tracker as part of the deploy, then delete them before they reach any public surface. Enable map generation in wrangler.toml and add an upload step to your deploy script:
# wrangler.toml
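upload_source_maps = true
With this setting, Wrangler generates source maps at deploy time and uploads them to Cloudflare alongside the Worker; sending them to your error tracker is the separate upload step in your deploy script. The tracker then matches the bundle hash in an incoming error to the uploaded map and deminifies on ingest. The critical detail is that the filename in the captured stack must match the uploaded map exactly, so confirm that the name in your stack frames (worker.js in the example above) is the name the map was uploaded under. Our source maps documentation covers the upload mechanics, and the deeper trade-offs around keeping maps private live in our post on hiding source maps in production.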
The waitUntil Trap
Errors that happen after you've returned the response are the ones that vanish completely. The moment your fetch handler returns, Cloudflare is free to freeze the isolate. Any in-flight work — including the network call your error SDK is making to report a problem — gets killed mid-flight.
Why standard try/catch misses orphan errors
A try/catch only protects synchronous and awaited code inside its block. If you kick off a background task and don't await it, or if your SDK's captureException returns a promise you ignore, that work races against the isolate's lifecycle and usually loses. The error is real, the SDK tried to send it, and it died on the wire.
Wrapping asynchronous tasks to ensure capture
The fix is ctx.waitUntil, which tells the runtime to keep the isolate alive until the promise you hand it settles:
export default {
  async fetch(request, env, ctx) {
    try {
      return await handle(request, env);
    } catch (e) {
      // Keep the isolate alive until the report is actually sent.
      ctx.waitUntil(captureException(e, { request }));
      return new Response("Internal Server Error", { status: 500 });
    }
  },
};
This single line is the difference between a Worker that reports its failures and one that silently drops a third of them.
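The same idea extends to background work that you deliberately do not await. The following is a minimal sketch, assuming a hypothetical `runInBackground` helper and a `report` callback that stands in for whatever your SDK's capture function is; only `waitUntil` itself comes from the runtime.

```typescript
// Minimal shape of the execution context; the real object is supplied by the runtime.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

type Reporter = (error: unknown) => Promise<void>;

// Hypothetical helper: run a background task so that a failure is reported and
// the isolate stays alive until both the task and the report have settled.
function runInBackground(
  ctx: WaitUntilContext,
  task: () => Promise<unknown>,
  report: Reporter,
): void {
  const guarded = (async () => {
    try {
      await task();
    } catch (error) {
      // If reporting itself fails there is nowhere left to send it, so swallow.
      await report(error).catch(() => undefined);
    }
  })();
  ctx.waitUntil(guarded);
}
```

In a fetch handler this would be called as `runInBackground(ctx, () => writeAuditLog(request), captureException)`, where `writeAuditLog` is a placeholder for your own background task.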
Real-Time Debugging with Session Replay at the Edge
A deminified stack trace tells you what threw. It rarely tells you why. The why usually lives in the browser: what the user clicked, what the previous API call returned, what state the page was in. Connecting the edge error to the front-end session closes that loop.
Reconstructing the user journey leading to an edge error
When a Worker 500 is linked by trace ID to a session replay, you can watch the seconds leading up to the failure — the form submission with the malformed field, the double-click that fired two mutating requests, the expired token that should have triggered a refresh. The edge error and the user action stop being separate investigations.
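Linking by trace ID means the Worker has to read the ID the browser sent, or mint one when it is missing. A minimal sketch follows, assuming a hypothetical `x-trace-id` header; substitute whatever header your front-end SDK actually sends.

```typescript
// Hypothetical header name: use whatever your front-end SDK actually sends.
const TRACE_HEADER = "x-trace-id";

// Matches the `get` method of the standard Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

// Reuse the caller's trace ID when it looks plausible, otherwise create a new one.
function resolveTraceId(headers: HeaderReader, generate: () => string): string {
  const incoming = headers.get(TRACE_HEADER);
  // Accept only short alphanumeric IDs so a client cannot inject arbitrary text into reports.
  const plausible = incoming !== null && /^[A-Za-z0-9-]{8,64}$/.test(incoming);
  return plausible ? incoming : generate();
}
```

Inside a Worker, `request.headers` satisfies `HeaderReader`, and `() => crypto.randomUUID()` is one reasonable choice for `generate`. Attach the resolved ID to every error report so the tracker can join it to the replay.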
Handling PII at the edge
Scrub sensitive data before it leaves the Cloudflare network using a beforeSend hook in your capture path. Cleaning payloads at the edge means tokens and personal data never hit the open internet, which keeps your compliance story simple.
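As a rough illustration of what such a hook might do, the sketch below redacts a few sensitive headers and masks email addresses in the message. The `CapturedEvent` shape, the header list, and the email pattern are all assumptions made for this example; a real SDK event has more fields, and a real scrubber needs patterns tuned to your data.

```typescript
// Hypothetical event shape; real SDK events carry many more fields.
interface CapturedEvent {
  message: string;
  headers: Record<string, string>;
}

const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "x-api-key"]);
// Deliberately coarse pattern: good enough to mask obvious addresses, not a validator.
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/g;

// Returns a scrubbed copy; the original event is left untouched.
function beforeSend(event: CapturedEvent): CapturedEvent {
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(event.headers)) {
    headers[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[redacted]" : value;
  }
  return {
    message: event.message.replace(EMAIL_PATTERN, "[email]"),
    headers,
  };
}
```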
The Cost of Observability: Flat-Rate vs. Per-Event
Here's the part nobody mentions in the getting-started guides. Cloudflare Workers are cheap precisely because they run at enormous volume. A 0.1% error rate — genuinely good — on a Worker serving 100 million requests a month is still 100,000 error events. On a per-event pricing model, that turns observability into a tax that scales with your success.
Calculating the error tax
The perverse outcome is that teams start sampling out errors to control cost, which means the one error that mattered is the one they didn't capture. You end up paying for visibility and getting blind spots anyway. We dug into the math in flat-rate vs. per-event pricing.
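The arithmetic behind the "error tax" is simple enough to sketch. The event count below uses the figures from this section; `pricePerEventUsd` is a placeholder input for you to fill in, not any vendor's actual rate.

```typescript
// Number of error events produced in a month at a given error rate.
function monthlyErrorEvents(requestsPerMonth: number, errorRate: number): number {
  return Math.round(requestsPerMonth * errorRate);
}

// pricePerEventUsd is a made-up input, not any vendor's actual rate.
function perEventCostUsd(events: number, pricePerEventUsd: number): number {
  return events * pricePerEventUsd;
}

// 0.1% of 100 million requests, as in the example above.
const exampleEvents = monthlyErrorEvents(100_000_000, 0.001); // 100000
```

Plugging your own traffic and a quoted per-event price into `perEventCostUsd` shows how the bill scales linearly with volume, while a flat rate stays constant.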
Why flat-rate pricing enables always-on debugging
When the price doesn't move with volume, you stop rationing. You capture every exception, every CPU-limit termination, every orphaned-promise failure, because there's no marginal cost to doing so. That's the posture you want at the edge, where the failures are rare, distributed, and impossible to reproduce locally.
The 2026 Edge Observability Checklist
- Wire up a Tail Worker via `tail_consumers` and filter to `exception` and `exceededCpu` outcomes only.
- Automate source map uploads in your deploy and confirm the filename matches the captured stack exactly.
- Use `ctx.waitUntil` for every error report and background task so nothing dies when the response returns.
- Scrub PII at the edge with a `beforeSend` hook before telemetry leaves Cloudflare.
- Choose flat-rate tracking so you never have to sample away the error that matters.
Debugging at the edge is a different discipline, but it's a learnable one. Stop treating Workers like tiny servers and start treating them like the ephemeral isolates they are, and your blind spots disappear. GlitchReplay gives you Sentry-compatible error tracking and session replay built on the same Cloudflare infrastructure you're monitoring — automatic source-map resolution, a drop-in Tail Worker pattern, and flat-rate pricing that never punishes you for traffic. If you're tired of refreshing wrangler tail at 2 AM, give GlitchReplay a try.