Tracking errors across Cloudflare Pages, Workers, and Durable Objects
Three runtimes, three transport stories, one trace ID — how to stitch them together so an incident report doesn't need three dashboards.
A user reports a "Payment Failed" error. You check your Cloudflare Pages logs — nothing. You check the API Worker — it shows a generic 500 with no detail. You finally dig into your Durable Object logs and find a race condition, but you can't prove it's the same request that hit the frontend. Forty-five minutes later you've got three terminal windows and two dashboard tabs open, and you're still guessing. In the distributed world of Cloudflare, a "single" user request is actually a relay race across three runtimes — and if you're not passing the baton (a trace ID) correctly, every incident is a scavenger hunt.
The Fragmentation of the Cloudflare Compute Stack
The reason this is harder than debugging a monolith is structural. A traditional Node.js app handles the whole request in one process with one stack trace. On Cloudflare, that same logical request is split across three independent runtimes, each with its own lifecycle, its own logging, and no shared memory.
Three runtimes, one user request
A typical request path looks like this: the browser loads a page served by Cloudflare Pages, which runs a Pages Function for SSR or middleware; that function calls an API Worker for business logic; and the Worker calls a Durable Object to read or mutate stateful data. Browser to Pages Function to Worker to Durable Object: three hops, four places an error can originate, and four separate log streams.
Why standard logging fails
Each runtime is ephemeral and isolated. A Worker's request context ends as soon as it responds, and the isolate behind it can be evicted at any time. A Durable Object persists state, but its method invocations are still discrete, hard-to-correlate events. There is no shared request ID threading through all of them by default, so when you find an error in the DO, you have no native way to tie it back to the Pages Function call that triggered it. The result is "dashboard fatigue": you know something broke, you just can't assemble the timeline.
Implementing the Unified Trace ID
The foundation of the fix is making every runtime speak the same language. That language is a trace ID, generated once at the edge of the request and propagated through every hop.
Adopting W3C Trace Context
The W3C traceparent header is the standard for this. It packs four dash-separated fields into a fixed format (a version, a 32-hex-character trace ID, a 16-hex-character span ID, and flags), and it's the same convention OpenTelemetry and most modern tracing systems use. The rule is simple: if an incoming request already has a valid traceparent, reuse its trace ID; if not, generate one. Then forward it on every downstream call.
    function getOrCreateTrace(request) {
      const incoming = request.headers.get("traceparent");
      // Layout: version-traceId-spanId-flags (2, 32, 16, and 2 hex characters).
      const match = incoming?.match(/^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$/);
      if (match) {
        // Valid upstream context: reuse its trace ID and pass the header along.
        return { traceId: match[1], traceparent: incoming };
      }
      // Missing or malformed header: start a new trace.
      const traceId = crypto.randomUUID().replace(/-/g, "");
      const spanId = crypto.randomUUID().replace(/-/g, "").slice(0, 16);
      return { traceId, traceparent: `00-${traceId}-${spanId}-01` };
    }
Extracting CF-Ray as a secondary fallback
Cloudflare stamps every request with a CF-Ray header — a unique ID for that request as it traversed Cloudflare's network. It's not a substitute for traceparent (it doesn't propagate across your own subrequests the way you need), but capturing it on every error gives you a Cloudflare-native correlation key you can cross-reference against Logpush and the Cloudflare dashboard. Attach both: traceparent for your own correlation, CF-Ray for Cloudflare's.
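As a minimal sketch of attaching both keys, the helper below reads the two headers and returns them as tags for an error report. The name correlationTags and the shape of the returned object are illustrative assumptions, not part of any SDK.

```javascript
// Hypothetical helper: collects both correlation keys from a request's headers.
// `headers` is any object with a get(name) method, such as a Fetch API Headers instance.
function correlationTags(headers) {
  const traceparent = headers.get("traceparent");
  // traceparent layout: version-traceId-spanId-flags
  const traceId = traceparent ? (traceparent.split("-")[1] ?? null) : null;
  return {
    traceId,                      // your own cross-runtime key
    cfRay: headers.get("cf-ray"), // Cloudflare's per-request key
  };
}
```

Passing request.headers from any of the three runtimes should work, since each exposes the standard Headers interface.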
Error Tracking in Cloudflare Pages (The Frontend)
The chain starts where the user feels the pain, so instrument both the client and the Pages Function layer.
Client-side vs. server-side functions
On the client you initialize the SDK in the browser to catch React errors, unhandled rejections, and to drive session replay. On the server side, Pages Functions run in the same isolate environment as Workers, so you set up capture in _middleware.ts where it can wrap every function invocation and inject the trace ID before anything else runs.
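One way to start the chain in the browser, sketched here under assumptions, is to attach a traceparent to every API call the page makes. The helper name withTrace is hypothetical, and the traceparent value is assumed to have been minted once per page load or user action.

```javascript
// Hypothetical browser-side helper: returns fetch options that carry a traceparent.
// An existing traceparent header in `init` is left untouched.
function withTrace(init = {}, traceparent) {
  const headers = new Headers(init.headers || {});
  if (!headers.has("traceparent")) headers.set("traceparent", traceparent);
  return { ...init, headers };
}

// Usage in the browser (tp is a traceparent string you generated earlier):
// fetch("/api/checkout", withTrace({ method: "POST" }, tp));
```

If the API lives on a different origin, traceparent is a custom header, so the server's CORS configuration would need to allow it.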
Capturing request.cf metadata
Every request into a Pages Function carries a request.cf object full of edge metadata — country, ASN, and the colo (data center) code. Attaching these to your error context is how you discover that a bug only happens for users routed through a specific region or colo, which is invaluable for diagnosing partial outages:
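    export async function onRequest(context) {
      const { request } = context;
      const trace = getOrCreateTrace(request);
      // setErrorContext stands in for your error SDK's tagging call.
      setErrorContext({
        traceId: trace.traceId,
        ray: request.headers.get("CF-Ray"),
        colo: request.cf?.colo,
        country: request.cf?.country,
        asn: request.cf?.asn,
      });
      // Share the trace with downstream functions so their API calls can forward it.
      context.data.trace = trace;
      try {
        return await context.next();
      } catch (e) {
        // Report without blocking, then let the platform's error handling proceed.
        context.waitUntil(captureException(e, { traceId: trace.traceId }));
        throw e;
      }
    }
The Middleware Layer: Error Tracking in Cloudflare Workers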
The API Worker does the heavy lifting, and it's where the two classic edge-tracking mistakes show up.
Avoiding the zombie worker
Once a Worker returns its response, the runtime is free to cancel whatever work is still in flight. If your error report is a pending network call at that moment, it dies with the request. The fix, as always at the edge, is ctx.waitUntil: pass it the report's promise and the runtime keeps the request context alive until that promise settles, within the platform's time limit. Skip it and some of your error reports will be silently dropped, with timing deciding which ones.
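In isolation, the pattern might look like the sketch below. The function name respondAndReport and the send callback are assumptions standing in for whatever SDK call or fetch() actually delivers the report.

```javascript
// Hypothetical reporter: `send` must return a promise for the report's delivery.
function respondAndReport(ctx, error, traceId, send) {
  const message = String((error && error.message) || error);
  // Hand the in-flight report to the runtime so it outlives the response.
  ctx.waitUntil(
    send({ message, traceId }).catch(() => {
      // A failed report should never surface as a second error.
    })
  );
  // Return immediately; the user does not wait for the report.
  return new Response("Internal Server Error", { status: 500 });
}
```

Swallowing the reporter's own failure is a design choice: a telemetry outage then degrades visibility but not the user-facing response.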
Propagating context to the Durable Object
The critical step that ties the whole trace together is forwarding the traceparent on the call to the Durable Object. The DO has no idea which request triggered it unless you tell it, and with the fetch-style interface used here, the place to tell it is the headers of the stub.fetch() call:
    export default {
      async fetch(request, env, ctx) {
        const trace = getOrCreateTrace(request);
        try {
          // Assumption: the caller identifies the user in a header; substitute your own auth lookup.
          const userId = request.headers.get("x-user-id");
          if (!userId) return new Response("Missing user", { status: 400 });
          const id = env.CART.idFromName(userId);
          const stub = env.CART.get(id);
          // Forward the trace into the Durable Object.
          return await stub.fetch("https://do/checkout", {
            method: "POST",
            headers: { traceparent: trace.traceparent },
            body: await request.text(),
          });
        } catch (e) {
          // waitUntil keeps the report alive after the 500 is returned.
          ctx.waitUntil(captureException(e, { traceId: trace.traceId }));
          return new Response("Internal Server Error", { status: 500 });
        }
      },
    };
The Final Frontier: Debugging Durable Objects
Durable Objects are the hardest piece to monitor, because they blend persistent state with ephemeral, hard-to-observe method invocations.
Persistence of state vs. persistence of errors
A DO keeps its state across requests, but an error that fires during an alarm() callback or a storage transaction has no inbound HTTP request to attach itself to — the alarm fired on a timer, not a user action. These are the errors most likely to vanish entirely. You have to instrument the DO's own methods, including alarm(), and you have to stash the trace ID into the DO's state when the request arrives so an alarm firing later can still report which trace originally scheduled it.
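A minimal sketch of the stashing step, assuming storage is the Durable Object's state.storage and reusing the lastTraceId key that alarm() reads. The helper name scheduleWithTrace is hypothetical.

```javascript
// Hypothetical helper: records which trace scheduled the alarm, then arms it.
async function scheduleWithTrace(storage, traceId, delayMs) {
  // Persist first, so an alarm that fires can always find its originating trace.
  await storage.put("lastTraceId", traceId);
  // setAlarm takes an absolute time in milliseconds since the epoch.
  await storage.setAlarm(Date.now() + delayMs);
}
```

A Durable Object holds one pending alarm at a time, so a single key is enough here. If several requests can arrive before the alarm fires, only the most recent trace ID survives, which may or may not be the one you want.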
Implementing an error boundary inside the DO
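    export class Cart {
      constructor(state, env) {
        this.state = state;
        this.env = env;
      }
      async fetch(request) {
        const traceId = request.headers.get("traceparent")?.split("-")[1];
        try {
          // Remember the trace so a later alarm() can report what scheduled it.
          if (traceId) await this.state.storage.put("lastTraceId", traceId);
          return await this.handle(request); // handle() is your own request logic
        } catch (e) {
          // Report with the trace forwarded from the Worker.
          await captureException(e, { traceId, doId: this.state.id.toString() });
          return new Response("DO error", { status: 500 });
        }
      }
      async alarm() {
        // No inbound request here, so recover the trace stored by fetch().
        const traceId = await this.state.storage.get("lastTraceId");
        try {
          await this.runScheduledWork(); // your own scheduled logic
        } catch (e) {
          await captureException(e, { traceId, source: "alarm" });
          // Rethrow here if you want the runtime to retry the alarm.
        }
      }
    }
Note that inside a DO you can simply await the capture. The report completes before the response is returned, and a DO stays alive while it has pending work, so you don't face the same freeze-on-response race as a stateless Worker. Awaiting does delay the error response, though, so keep the report lightweight.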
Stitching It Together with Session Replay
With a single trace ID flowing from browser to DO, every error report from every runtime carries the same key. That's what lets you reconstruct one incident from one place instead of three.
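To make the reconstruction concrete, here is a small sketch that filters a mixed stream of events down to one trace and orders it by time. The event shape (traceId, runtime, timestamp, message) is an assumption for illustration, not a documented schema.

```javascript
// Hypothetical event shape: { traceId, runtime, timestamp, message }.
function buildTimeline(events, traceId) {
  return events
    .filter((e) => e.traceId === traceId)       // keep one incident
    .sort((a, b) => a.timestamp - b.timestamp)  // oldest first
    .map((e) => `${e.runtime}: ${e.message}`);
}
```

Timestamps from different runtimes come from different clocks, so the ordering is best treated as approximate when events are close together.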
Connecting the edge to the DOM
Because the front-end session replay is tagged with the same trace ID as the Worker and DO errors, a single click in your dashboard takes you from "Durable Object threw during checkout" to the exact session replay of the user who triggered it. You see the edge error and the user's screen side by side.
Seeing the why
A slow Durable Object doesn't look like an error to the user — it looks like a button that did nothing, a spinner that hung, a double-click out of frustration. The stack trace can't show you that. The replay can, which is why the moments leading up to a stateful timeout are the most valuable thing you can record.
The Performance Impact of Observability
The natural worry is that all this instrumentation taxes the edge. It doesn't have to.
Staying within the limits
Workers run under a 128 MB memory limit per isolate and a per-request CPU-time limit that depends on your plan: 10 ms on the Free plan and considerably more on paid plans, so check Cloudflare's current limits for exact figures. Durable Objects face similar constraints. A lightweight, low-allocation SDK that does no heavy work on the edge (no source-map resolution, no minification, just a compact serialize-and-send) keeps overhead negligible. The processing belongs on the backend, not in the isolate.
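A compact serialize step could look roughly like the sketch below. The function name compactReport and the truncation limits are illustrative assumptions, and the stack is sent raw on the assumption that source maps are applied server-side.

```javascript
// Hypothetical serializer: bounded output, no heavy processing in the isolate.
function compactReport(error, context, maxStackLines = 20) {
  const stack = String(error.stack || "")
    .split("\n")
    .slice(0, maxStackLines) // cap the number of frames sent
    .join("\n");
  return JSON.stringify({
    ...context,                                   // trace ID, colo, and similar tags
    name: error.name,
    message: String(error.message).slice(0, 500), // cap very long messages
    stack,                                        // raw and unsymbolicated
  });
}
```

Spreading context first means the core fields always win if a tag happens to share a name with them.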
Non-blocking reporting
Reporting via ctx.waitUntil means the error report happens after the response is already on its way to the user. The user's latency is unaffected; the telemetry simply rides along in the isolate's extended lifetime.
Cost Strategy: Flat-Rate vs. Per-Event
There's a hidden economic trap in distributed tracing that nobody warns you about.
The volume explosion
When one user request fans out across three runtimes, a single failure can generate three error events: one from Pages, one from the Worker, one from the DO. On per-event pricing, full-stack tracing can triple your bill for the same incident, which pushes teams to under-instrument exactly the layers they most need to see.
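The arithmetic is simple enough to sketch. Every number below is an illustrative input rather than real pricing, and the function name perEventCost is hypothetical.

```javascript
// Cost under per-event pricing: incidents x events per incident x unit price.
function perEventCost(incidents, eventsPerIncident, pricePerEvent) {
  return incidents * eventsPerIncident * pricePerEvent;
}

// Made-up inputs: 1000 incidents at a unit price of 1.
const oneLayer = perEventCost(1000, 1, 1);  // only the Worker instrumented
const allLayers = perEventCost(1000, 3, 1); // Pages, Worker, and DO instrumented
console.log(allLayers / oneLayer);          // ratio between the two setups
```

The ratio depends only on how many layers report each failure, not on the unit price.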
The flat-rate advantage
When the price doesn't scale with event count, you instrument everything without flinching. You capture the Pages error, the Worker error, and the DO error for the same trace, and you assemble the complete picture. For the SDK setup details that make this practical, see our deep dive on error tracking on Cloudflare Workers, and for production debugging strategy more broadly, debugging Workers in production. The pricing math is laid out in flat-rate vs. per-event pricing.
Distributed debugging on Cloudflare doesn't have to mean three terminals and a stopwatch. The whole problem collapses once a single trace ID flows from the browser, through the Pages Function, into the Worker, and down to the Durable Object — and once every error report and session replay carries that same ID. GlitchReplay is built for exactly this: Sentry-compatible capture across all three Cloudflare runtimes, replay linked by trace ID, lightweight enough to stay inside the edge limits, and flat-rate so tracing your whole stack never costs you three times as much. Stop debugging Cloudflare in the dark — give GlitchReplay a try.