Running an AI-generated failing test locally
Drop the test from a GlitchReplay issue into vitest, jest, playwright, or pytest, watch it run red by design, then unmark it once your fix lands.
You opened a runtime error in GlitchReplay, clicked Generate failing test, and got a chunk of code labelled vitest or playwright or pytest. Now what? This guide walks through what that test actually is, where to drop it in your repo, how to run it red, and how to turn it green once you ship the fix. The same pattern applies whether the issue came from a Next.js app, a Cloudflare Worker, a Django backend, or a long-lived Express service — the framework label on the chip tells you which path to follow.
The short version: the generated test is a reproducer. It is intentionally marked as expected-to-fail so it doesn't turn your CI red the moment you commit it. You ship it next to your other tests, watch it stay quietly red while you write the fix, then remove the failure marker — and that single-line change is your verification that the bug is gone.
What you're actually getting
The test body is generated from three pieces of data: the issue's most-recent stack frames, the trailing breadcrumbs (clicks, navigations, fetches before the throw), and the captured exception type and message. The model picks the framework from the platform tag on the project — a Next.js or Vite project gets vitest, a Node service gets jest, a Python stack gets pytest, browser end-to-end traces get playwright. If the breadcrumbs include a Cypress URL, it will pick Cypress instead. The output is one self-contained file with imports, a single failing case, and a leading comment that pins the GlitchReplay issue id so reviewers can jump back to the source.
It is not pretending to be a perfect unit test. The confidence chip you see on the result tells you how sure the model was that the stack contained enough signal to write a tight reproducer. Anything below 60% is a starting point, not a finished test — read the next section about minified stacks for what to do then.
Where to put it in your repo
Pick a path that matches your project's convention for new tests, but include the GlitchReplay id in the filename so the test is searchable later. A few patterns we see work well:
- tests/regression/gr_uuj73ne.test.ts — a dedicated "regression" folder collects reproducer tests separately from your unit tests, which keeps them out of the way during normal development but easy to find when you're cleaning up backlog.
- src/checkout/__tests__/gr_uuj73ne.spec.ts — colocated with the module the bug lives in. Best when the issue is firmly scoped to one area.
- e2e/gr_uuj73ne.spec.ts — Playwright tests almost always live in a top-level e2e/ folder, separate from unit tests, because they need a running server.
Whichever you pick, the issue id in the filename matters. When the test goes green and you remove the xfail / it.fails marker, that filename is what shows up in git log, and it ties the verification commit to the original error.
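For example, once the fix and the unmark have landed, a git log scoped to that path (using the first layout above) pulls up the verification commit:
git log --oneline -- tests/regression/gr_uuj73ne.test.ts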
Vitest: the most common path
If you saw the vitest chip, the test uses it.fails (Vitest's expected-to-fail marker, also available as test.fails). Vitest treats it as the inverse of normal: a passing assertion inside an it.fails turns the test red, a thrown error keeps it green. So as long as your bug is still present, your CI is happy.
// gr_uuj73ne — paste this at tests/regression/gr_uuj73ne.test.tsx
import { describe, it, expect } from "vitest";
import { render } from "@testing-library/react";
import { EmbedCodeGeneratorContent } from "../../src/EmbedCodeGenerator";
describe("gr_uuj73ne — invalid dangerouslySetInnerHTML", () => {
  // Expected to fail while the bug exists: rendering currently throws "Minified React error #418".
  it.fails("renders without throwing when value is null", () => {
    expect(() => render(<EmbedCodeGeneratorContent />)).not.toThrow();
  });
});
Run it once to confirm it's failing for the reason you expect:
pnpm vitest run tests/regression/gr_uuj73ne.test.tsx
You should see a green tick next to the test name, with a small annotation like (expected fail) or a yellow chevron — exact wording depends on your reporter. If the test can't run for the "wrong reason" (e.g. a missing import, a typo the model produced, an internal-only module the test can't see), you'll know immediately: those failures happen while the file loads, before the it.fails wrapper applies, so Vitest prints the actual stack instead of the silent "ok". Fix those plumbing issues before you start on the real bug.
Jest: same idea, slightly different shape
Jest doesn't have an it.fails primitive. The model either emits describe.skip with a // TODO: unskip when fixed comment, or writes a plain assertion that the error is still being thrown. The second pattern is the easiest to reason about:
// gr_uuj73ne — paste this at __tests__/gr_uuj73ne.test.ts
import { triggerCheckoutFlow } from "../src/checkout";
it("gr_uuj73ne — checkout flow throws on missing tax id", async () => {
await expect(triggerCheckoutFlow({ taxId: undefined }))
.rejects.toThrow(/tax_id is required/);
});
This is a normal Jest assertion: it will pass as long as the error happens. The day your fix lands and the function stops throwing, the test starts failing — and that's your signal that something either fixed the bug or changed the contract. Either way you read the diff, decide which it is, and update the test.
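The other shape, the describe.skip variant, typically asserts the fixed behaviour and stays skipped until the fix lands. A minimal sketch, where the post-fix expectation is a placeholder rather than your real contract:
// gr_uuj73ne skip-based variant
// TODO: unskip when fixed (gr_uuj73ne)
import { triggerCheckoutFlow } from "../src/checkout";
describe.skip("gr_uuj73ne: checkout flow handles a missing tax id", () => {
  it("resolves instead of throwing when taxId is undefined", async () => {
    // Placeholder expectation: swap in whatever your fixed contract actually returns.
    await expect(triggerCheckoutFlow({ taxId: undefined })).resolves.toBeDefined();
  });
});
Because the whole block is skipped it never runs in CI, so it works more like a tracked to-do than a tripwire until you unskip it.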
Playwright: when the breadcrumbs say "browser"
If the issue's breadcrumbs included page navigations and the platform is browser-based, you'll get a Playwright spec. Playwright has test.fail as a first-class marker:
// gr_uuj73ne — paste this at e2e/gr_uuj73ne.spec.ts
import { test, expect } from "@playwright/test";
test.fail("gr_uuj73ne — checkout submits with a 500", async ({ page }) => {
await page.goto("/checkout");
await page.getByRole("button", { name: /pay now/i }).click();
await expect(page.getByText(/payment confirmed/i)).toBeVisible();
});
The test.fail wrapper says "this test is expected to fail right now." When the assertion fails (it does, because the page actually shows an error), Playwright reports it as a pass. Once you ship the fix and the assertion starts passing, Playwright reports it as a fail with the message "Expected to fail but passed." That's your nudge to remove the .fail marker.
One catch: Playwright tests need a running server. If your project doesn't already have a webServer entry in playwright.config.ts, add one or run pnpm dev in a side terminal before running the test.
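A minimal sketch, assuming a dev server that comes up with pnpm dev on port 3000; the command and URLs are placeholders to match to your app:
// playwright.config.ts
import { defineConfig } from "@playwright/test";
export default defineConfig({
  use: { baseURL: "http://localhost:3000" },
  webServer: {
    command: "pnpm dev",
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
  },
});
With that in place, the page.goto("/checkout") in the spec resolves against baseURL, and Playwright starts the server before the run and shuts it down afterwards.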
Pytest: xfail does the heavy lifting
For Python services, the model uses @pytest.mark.xfail:
# gr_uuj73ne — paste this at tests/regression/test_gr_uuj73ne.py
import pytest
from app.checkout import calculate_tax
@pytest.mark.xfail(reason="gr_uuj73ne — known regression, fix in progress")
def test_calculate_tax_handles_zero_rate():
    assert calculate_tax(amount=100, rate=0) == 0
Pytest will report the test as XFAIL — yellow, not red — for as long as the bug exists. When the fix lands and the assertion starts passing, you get XPASS, which most CI configs treat as a soft failure that nudges you to delete the marker.
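If you'd rather have CI force the cleanup, pytest can treat an unexpected pass as a hard failure. A sketch of the same marker with strict mode enabled:
import pytest
from app.checkout import calculate_tax
# strict=True turns XPASS into a real failure, so the marker can't be forgotten once the fix lands.
@pytest.mark.xfail(reason="gr_uuj73ne — known regression, fix in progress", strict=True)
def test_calculate_tax_handles_zero_rate():
    assert calculate_tax(amount=100, rate=0) == 0
The same behaviour can be enabled globally with xfail_strict = true in your pytest configuration.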
The handoff: red, fix, green
Now that the test is in your repo and red-by-design, the workflow is the boring part:
- Open the file the test points at and read the stack frames in the GlitchReplay issue alongside it. The AI Analysis section on the same page will often name the file and the suspected root cause — start there.
- Write the fix. Run the regression test in watch mode (vitest --watch, jest --watch, or pytest --looponfail via the pytest-xdist plugin). It stays red until your fix flips it.
- When the underlying assertion starts passing (with it.fails, test.fail, or xfail the runner now reports the test itself as failed or XPASS, which means the bug is gone), remove the fails / xfail / test.fail marker. That's a one-line diff, sketched after this list.
- Commit both changes — the fix and the unmark — in the same commit so a future bisect can land on a single SHA.
- Mark the issue as resolved in GlitchReplay. If you've got the Resolution commit URL field wired up, paste the commit SHA there so the next person who looks at this issue can jump straight to the patch.
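The unmark really is one line. Taking the Vitest reproducer from earlier, the file you commit alongside the fix is identical except that .fails is gone:
// gr_uuj73ne: after the fix, the marker comes off and this runs as a normal regression test
import { describe, it, expect } from "vitest";
import { render } from "@testing-library/react";
import { EmbedCodeGeneratorContent } from "../../src/EmbedCodeGenerator";
describe("gr_uuj73ne — invalid dangerouslySetInnerHTML", () => {
  it("renders without throwing when value is null", () => {
    expect(() => render(<EmbedCodeGeneratorContent />)).not.toThrow();
  });
});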
What if the test was generated from a minified stack?
Confidence under 50% almost always means the source maps weren't available when the event was captured, so the stack is full of aF / iX / oW instead of real function names. The test will still be syntactically valid, but it will likely call something the model invented based on the breadcrumbs rather than your real API. Two options:
- Upload source maps for the affected release and click Regenerate on the issue. The next pass will use deminified frames and the test will reference real symbols. This is the right answer almost always — once your maps are flowing, every future error and every future test gets it for free.
- Treat the generated test as a skeleton. Keep the imports, keep the it.fails wrapper, and rewrite the body against the real function names (a short sketch of that rewrite follows below). The model is bad at guessing module paths from minified output, but it's usually right about the shape of the failure ("X is null when Y is unset"), and that's the part that's hardest to figure out cold.
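To make that concrete, here is an illustration in which every identifier is hypothetical: the commented-out lines are the kind of guess a low-confidence generation makes, and the live code below them is the same skeleton pointed at a real module.
// What a low-confidence generation tends to guess: an invented module and a minified-ish symbol.
// import { oW } from "../dist/bundle.min";
// it.fails("gr_1234567: oW throws when config is unset", () => { /* ... */ });
// The rewrite keeps the it.fails wrapper and the shape of the failure, but targets real symbols.
import { it, expect } from "vitest";
import { buildWidgetConfig } from "../../src/widgets/config"; // hypothetical real module
it.fails("gr_1234567: returns a config even when options are unset", () => {
  expect(buildWidgetConfig(undefined)).toBeDefined();
});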
Why this workflow is worth the friction
Writing a regression test for every production bug is the kind of advice that sounds great in a blog post and never happens in practice — most bugs are fixed in five minutes by someone who doesn't want to spend another fifteen reproducing the path that caused it. The cost of writing the test is the limiting factor.
The reproducer test that ships next to the fix changes that math. You aren't writing the test — you're reviewing one that already exists, dropping it in the right folder, and watching it flip from red to green when your patch lands. That's a five-minute add to your fix-the-bug loop instead of a fifteen-minute one. Multiply by every error the team triages this quarter and the regression suite grows by itself.
And the next time the same code path breaks — different parameter, different release, different cause — the existing test catches it before deploy.
GlitchReplay is Sentry-SDK compatible, includes session replay and security signals, and never charges per event. Free to start, five minutes to first event.