Auth spike anomalies: what a real attack looks like in error data
Annotated time-series from three real incidents we've seen, and the heuristics we use to alert without paging on Black Friday traffic.
It's 3:00 AM on Black Friday. Your "Auth Failures" alert fires: 15,000 401 Unauthorized errors in the last 60 seconds. On a per-event pricing plan, you've just spent a couple hundred dollars on error data before you've even connected to the VPN. But is this a botnet draining your user table, or thousands of tired shoppers fat-fingering their passwords for a limited-edition drop? Most auth alerts are the second thing. Real attacks are the first, and they leave fingerprints in your metadata that a raw count alert will never see.
This post is about reading those fingerprints. We'll walk three annotated incident shapes—a lazy brute force, a distributed stuffing campaign, and a broken deploy masquerading as an attack—then build heuristics that alert on the real ones without paging you every time traffic surges.
The Geometry of Error Data: Noise vs. Signal
Raw volume is a dumb metric. "15,000 errors" means nothing without a denominator. The number you actually care about is error density—failures as a fraction of total auth attempts—and the shape of the spike over time. High-traffic events make simple thresholds useless precisely because volume rises for entirely benign reasons.
The flash-mob effect
A marketing push or a viral moment produces a bell curve: failures climb as traffic climbs, peak, and decline, while the ratio of failures to attempts stays roughly constant. More people means more typos means more 401s. That's normal. A threshold set on raw count will scream through every successful campaign you ever run.
The hammer vs. the scalpel
Attackers vary cadence deliberately. The hammer is high-frequency and obvious. The scalpel is low-and-slow, tuned to stay under per-IP rate limits. The error shape distinguishes them: a botnet initiation is a vertical wall (zero to thousands in one step), while organic traffic ramps. When you see a step function instead of a curve, pay attention.
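One rough way to tell a step function from a ramp is to look at the largest minute-over-minute growth factor in the failure series. The sketch below assumes you already have per-minute 401 counts; the function name, the `floor` parameter, and the sample series are illustrative, not taken from any real incident.

```python
def largest_step_ratio(counts_per_minute, floor=10):
    """Return the largest minute-over-minute growth factor in a series.

    `floor` (an assumed tuning knob) stops near-zero minutes from
    producing enormous ratios on quiet endpoints.
    """
    ratios = [
        cur / max(prev, floor)
        for prev, cur in zip(counts_per_minute, counts_per_minute[1:])
    ]
    return max(ratios, default=0.0)


# Made-up series for illustration only.
organic = [40, 80, 160, 310, 600, 900, 1100]   # ramps over several minutes
botnet = [40, 38, 45, 4000, 4100, 3950, 4020]  # vertical wall

print(largest_step_ratio(organic))        # 2.0
print(largest_step_ratio(botnet) > 50)    # True
```

An organic surge rarely more than doubles from one minute to the next, while a botnet start jumps by one or two orders of magnitude in a single step. Where exactly to draw the line is something to tune against your own traffic.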
Case Study 1: The "Lazy" Brute Force
High frequency, low entropy. One IP, one user-agent, thousands of attempts against a handful of accounts. This is the most common pattern and the easiest to catch, provided you're looking at the right dimensions.
The vertical fingerprint
The tell is the collapse of cardinality. Thousands of failures, but count_distinct(ip) is 1 (or a tiny handful), and the user-agent is a single static string. Real traffic spreads across many IPs and many UAs; this concentrates into one source hammering relentlessly.
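The cardinality collapse can be checked with a few set operations over a window of failed-login events. This is a minimal sketch that assumes each event carries `ip` and `user_agent` under `tags`, as in the sample event in this post; the function names and thresholds are hypothetical and would need tuning.

```python
def cardinality_profile(events):
    """Summarise how concentrated a window of failed-login events is."""
    ips = {e["tags"]["ip"] for e in events}
    user_agents = {e["tags"]["user_agent"] for e in events}
    return {
        "failures": len(events),
        "distinct_ips": len(ips),
        "distinct_user_agents": len(user_agents),
    }


def looks_like_lazy_brute_force(profile, min_failures=1000, max_sources=3):
    """Many failures concentrated in a tiny handful of IPs and user-agents.

    Both thresholds are assumptions chosen for illustration.
    """
    return (
        profile["failures"] >= min_failures
        and profile["distinct_ips"] <= max_sources
        and profile["distinct_user_agents"] <= max_sources
    )


# 4,000 failures from one source, mirroring the sample event shape.
window = [
    {"tags": {"ip": "203.0.113.5", "user_agent": "python-requests/2.31.0"}}
    for _ in range(4000)
]
print(looks_like_lazy_brute_force(cardinality_profile(window)))  # True
```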
Why standard 5xx monitoring misses it
From the load balancer's perspective, a rejected login is a successful 4xx response—the system worked, it correctly denied a bad credential. So 5xx-based health monitoring shows green while you're under attack. You have to watch the 4xx auth stream specifically, which is exactly the gap your error tracker fills (see spotting credential stuffing in your error stream).
// Lazy brute force: one source, one UA, many tries
{
  "message": "401 Unauthorized",
  "tags": {
    "ip": "203.0.113.5", // the SAME ip, over and over
    "user_agent": "python-requests/2.31.0",
    "attempted_user": "admin@example.com"
  }
}
// 4,000 of these in 60s from a single ip.
Case Study 2: Distributed Credential Stuffing
The hard one. Ten thousand IPs, each trying one password every ten minutes. Per-IP rate limiting is useless here, because no single IP exceeds any reasonable threshold. The attack is invisible at the IP level and only appears in aggregate.
Entropy analysis
You catch this by inverting the lens. Per-IP, everything looks fine—attempts-per-IP hovers near 1. But the total failure count is enormous, and the IP set is freshly seen and geographically scattered. High IP cardinality combined with near-1 attempts-per-IP and a low-entropy user-agent distribution is the distributed-stuffing signature. The fleet shares tooling even when it spreads source addresses.
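The three numbers in that signature can be computed from one window of events. The sketch below uses Shannon entropy (in bits) as one possible measure of how varied the user-agent distribution is; the event shape follows the sample event earlier in this post, and the function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits; 0.0 means every value is identical."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def stuffing_signals(events):
    """Aggregate view of a window of failed logins."""
    ips = [e["tags"]["ip"] for e in events]
    user_agents = [e["tags"]["user_agent"] for e in events]
    distinct_ips = len(set(ips))
    return {
        "distinct_ips": distinct_ips,
        "attempts_per_ip": len(ips) / distinct_ips if distinct_ips else 0.0,
        "ua_entropy_bits": shannon_entropy(user_agents),
    }


# Synthetic fleet: 1,000 addresses, one attempt each, shared tooling.
fleet = [
    {"tags": {"ip": f"10.0.{i // 250}.{i % 250}", "user_agent": "Go-http-client/1.1"}}
    for i in range(1000)
]
print(stuffing_signals(fleet))
```

High `distinct_ips`, `attempts_per_ip` close to 1, and `ua_entropy_bits` close to 0 together match the pattern described above. Organic traffic usually shows several bits of user-agent entropy, but the cutoffs depend on your own client mix.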
Correlating unique-userID misses across the fleet
Another angle: the campaign walks a credential list, so you see a long tail of distinct attempted usernames, each failing once or twice across different IPs. That "many users, each barely tried, all failing" pattern doesn't occur naturally—real users retry their own account, not one stranger's account apiece.
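One way to quantify "many users, each barely tried" is to compare failures per username with failures per (IP, username) pair. This is a sketch under the assumption that events carry `ip` and `attempted_user` tags as in the earlier sample event; the metric names are made up for illustration.

```python
from collections import Counter


def username_spread(events):
    """Describe how failures are distributed over usernames and sources."""
    pairs = Counter(
        (e["tags"]["ip"], e["tags"]["attempted_user"]) for e in events
    )
    users = {user for _, user in pairs}
    return {
        "distinct_usernames": len(users),
        "failures_per_username": len(events) / len(users) if users else 0.0,
        # Near 1.0 means almost nobody retried the same account from the same IP.
        "retries_per_ip_user_pair": len(events) / len(pairs) if pairs else 0.0,
    }


def event(ip, user):
    return {"tags": {"ip": ip, "attempted_user": user}}


# Organic: three people each mistype their own password three times.
organic = [event(f"192.0.2.{u}", f"user{u}@example.com") for u in range(3) for _ in range(3)]
# Stuffing: four usernames, each tried twice, every attempt from a new IP.
stuffing = [event(f"198.51.100.{i}", f"victim{i % 4}@example.com") for i in range(8)]

print(username_spread(organic)["retries_per_ip_user_pair"])   # 3.0
print(username_spread(stuffing)["retries_per_ip_user_pair"])  # 1.0
```

Real users push the pair ratio above 1 because they retry their own account from the same address. A credential list walked across a fleet keeps it pinned near 1 while the distinct-username count climbs.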
Per-IP view (looks harmless):    Aggregate view (alarming):
  ip A -> 1 attempt                distinct IPs:        ~10,000
  ip B -> 1 attempt                total 401s:          ~18,000
  ip C -> 2 attempts               attempts per IP:     ~1.8
  ...                              distinct usernames:  ~9,500
Case Study 3: The "Broken Deploy"
The false positive that wakes the wrong team. A misconfigured OIDC provider or a client-side SDK bug can perfectly mimic a brute-force wall.
The same-version correlation
The giveaway is the release tag. If 100% of the auth failures carry one release, an external attacker is an unlikely cause: they don't know or care about your version, and real attacks either span every client version in the wild or bypass your client entirely. A spike that's monochromatic by release is almost always your own bug, and its start time usually lines up suspiciously well with a recent deploy.
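The release check reduces to finding the most common release among the failures and its share of the total. The sketch below assumes each event exposes a top-level `release` field, which is a common convention but may live elsewhere in your payloads; the function name and sample version strings are hypothetical.

```python
from collections import Counter


def dominant_release_share(events):
    """Return (release, share) for the most common release among the failures."""
    releases = Counter(e.get("release") or "unknown" for e in events)
    if not releases:
        return None, 0.0
    release, count = releases.most_common(1)[0]
    return release, count / len(events)


# Made-up windows: one monochromatic by release, one spread across versions.
bad_deploy = [{"release": "web@4.12.0"} for _ in range(500)]
mixed = [{"release": f"web@4.{i % 5}.0"} for i in range(500)]

print(dominant_release_share(bad_deploy))  # ('web@4.12.0', 1.0)
print(dominant_release_share(mixed))       # ('web@4.0.0', 0.2)
```

A share near 1.0 for a real release points at your own deploy. A share near 1.0 for "unknown" is a different story, since traffic that never ran your client carries no release at all.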
Checking the Referer for internal looping
A refresh-token interceptor stuck in a retry loop will show a Referer pointing back into your own app and a tight, machine-regular cadence per user. That's internal sabotage by way of a bad merge, not an outside fleet. Distinguishing this fast is the same discipline we apply in diagnosing a bad-deploy error spike.
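"Machine-regular cadence" can be measured as the coefficient of variation of the gaps between one user's failures: a retry loop on a fixed timer sits near zero, while a human is erratic. This is a sketch with an invented function name and invented timestamps (in seconds), not a calibrated detector.

```python
import statistics


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive events.

    Returns None when there are too few events to judge.
    """
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    mean_gap = statistics.fmean(gaps)
    if mean_gap == 0:
        return 0.0
    return statistics.pstdev(gaps) / mean_gap


retry_loop = [0.0, 0.5, 1.0, 1.5, 2.0]   # interceptor firing every 500 ms
human = [0.0, 4.0, 5.0, 19.0, 21.0]      # someone retyping a password

print(interval_cv(retry_loop))     # 0.0
print(interval_cv(human) > 0.5)    # True
```

Retry loops with exponential backoff or jitter will not look this clean, so treat a low value as supporting evidence alongside the Referer and release checks rather than as proof.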
Advanced Heuristics: Alerting Without the Fatigue
The goal is alerts that fire on attacks and stay silent on Black Friday. Three rules get you most of the way.
Relative error rates
Alert on failures as a percentage of total auth attempts, not a raw count. A 5% failure rate is suspicious whether you're doing 100 logins a minute or 100,000. The ratio holds steady through legitimate surges and breaks during attacks, which is exactly the property you want.
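A ratio-based rule is a few lines once you have both the failure count and the total attempt count for the same window. In this sketch the 5% threshold comes from the text above, while `min_attempts` is an added assumption to keep tiny samples from tripping the alert; the sample numbers are invented.

```python
def failure_rate(failures, attempts):
    """Failures as a fraction of total auth attempts."""
    return failures / attempts if attempts else 0.0


def rate_alert(failures, attempts, threshold=0.05, min_attempts=200):
    """Fire on the ratio, ignoring windows too small to be meaningful."""
    if attempts < min_attempts:
        return False
    return failure_rate(failures, attempts) >= threshold


print(rate_alert(30, 1_000))        # quiet hour, 3% -> False
print(rate_alert(3_000, 100_000))   # sale surge, still 3% -> False
print(rate_alert(4_000, 5_000))     # 80% of attempts failing -> True
```

The surge case has a hundred times the failures of the quiet hour and still stays silent, which is the behavior a raw count threshold cannot give you.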
High-cardinality grouping
Alert on sudden jumps in count_distinct(ip) against your auth endpoint, and on collapses in user-agent entropy. A 10x jump in distinct source IPs against /login within a minute is a far better attack signal than total request volume, and it catches the distributed case that rate limits miss.
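The distinct-IP jump can be checked by comparing two adjacent windows. This sketch takes the 10x factor from the text above; the `min_ips` guard and the function name are assumptions added so that going from 1 to 10 addresses on a quiet endpoint does not page anyone.

```python
def distinct_ip_jump(previous_ips, current_ips, factor=10, min_ips=50):
    """True when distinct source IPs grew by `factor` between two windows."""
    previous_n = len(set(previous_ips))
    current_n = len(set(current_ips))
    return current_n >= min_ips and current_n >= factor * max(previous_n, 1)


# Synthetic windows of source addresses seen on /login.
last_minute = [f"192.0.2.{i}" for i in range(20)]
this_minute = [f"10.1.{i // 200}.{i % 200}" for i in range(400)]

print(distinct_ip_jump(last_minute, this_minute))        # True: 20 -> 400
print(distinct_ip_jump(last_minute, this_minute[:60]))   # False: 20 -> 60
```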
The Black Friday buffer
Compare against a 7-day seasonal baseline rather than a flat number. "Auth failures are 8x the same hour last week" survives predictable peaks, because last week's same hour already encodes your normal rhythm. A static threshold can't tell a sale from a siege; a seasonal one can.
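The same-hour-last-week comparison is a lookup seven days back. The sketch assumes failure counts are already bucketed by hour in a dictionary keyed by hour-truncated `datetime` values; that storage shape, the `floor` parameter, and the sample counts are illustrative.

```python
from datetime import datetime, timedelta


def seasonal_ratio(hourly_failures, now, floor=1):
    """Current hour's failures divided by the same hour one week earlier.

    `floor` avoids dividing by zero when last week's bucket is empty.
    """
    hour = now.replace(minute=0, second=0, microsecond=0)
    current = hourly_failures.get(hour, 0)
    baseline = hourly_failures.get(hour - timedelta(days=7), 0)
    return current / max(baseline, floor)


# Invented counts for the 3 AM hour on two consecutive Fridays.
hourly = {
    datetime(2024, 11, 22, 3): 300,
    datetime(2024, 11, 29, 3): 2400,
}
print(seasonal_ratio(hourly, datetime(2024, 11, 29, 3, 17)))  # 8.0
```

A single week is a thin baseline for a once-a-year event like Black Friday, so in practice you may want to blend several prior weeks or last year's same day.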
Using Session Replay as the Smoking Gun
Metadata builds the case; replay closes it. When the heuristics flag a suspect session, watching the actual interaction confirms intent in seconds.
Spotting machine-like interaction
A human fills a login form with mouse movement, tab presses, and human-speed typing. A bot fills a 12-character password field in 2 milliseconds with no pointer movement and no focus events. Seeing that in replay is unambiguous—no statistical argument required. (Capture it without recording the credential itself: mask PII in session replay.)
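If your replay tooling exposes the interaction timeline, the same judgment can be pre-screened in code before anyone watches the recording. The event shape here (`type` and `t_ms` keys) is entirely hypothetical and stands in for whatever your replay format provides, and the per-character speed cutoff is a guess.

```python
def looks_scripted(events, field_length, min_ms_per_char=20):
    """Flag a form fill that is implausibly fast with no pointer or focus activity."""
    kinds = {e["type"] for e in events}
    input_times = [e["t_ms"] for e in events if e["type"] == "input"]
    if not input_times:
        return False  # nothing was typed into the form in this session
    fill_ms = max(input_times) - min(input_times)
    too_fast = fill_ms < field_length * min_ms_per_char
    no_human_signals = "mousemove" not in kinds and "focus" not in kinds
    return too_fast and no_human_signals


# Bot: 12 characters land within 2 ms, nothing else happens.
bot = [{"type": "input", "t_ms": 1000 + (i * 2) // 11} for i in range(12)]
# Human: focus, pointer movement, then typing spread over about 3 seconds.
human = [{"type": "focus", "t_ms": 0}, {"type": "mousemove", "t_ms": 40}] + [
    {"type": "input", "t_ms": 500 + i * 270} for i in range(12)
]

print(looks_scripted(bot, field_length=12))    # True
print(looks_scripted(human, field_length=12))  # False
```

Password managers also fill fields almost instantly, so a check like this should only shortlist sessions for a human to watch.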
Identifying the target
Replay also tells you where they're hitting: the rendered login form, or the API directly with no UI involved. Direct-to-API with no DOM interaction means a script that skipped your frontend entirely, which informs whether a Turnstile challenge on the form will even help.
From Signal to Action: The Remediation Loop
Confirmed anomalies should feed back into your edge. Push the offending IP ranges and user-agent patterns into WAF rules, and trigger "challenge mode" (CAPTCHA or Turnstile) only during confirmed spikes, so normal users never see friction. The security headers checker and security documentation cover hardening the rest of the path.
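Before pushing offenders to the edge, it helps to collapse individual addresses into the fewest CIDR blocks so the rule list stays short. This sketch uses Python's standard `ipaddress.collapse_addresses`; it assumes IPv4 input only, and the function name and sample addresses are illustrative. How the resulting blocks get loaded into your WAF depends on the vendor and is not shown.

```python
import ipaddress


def to_block_list(offending_ips):
    """Collapse confirmed attacker IPv4 addresses into minimal CIDR blocks.

    Only exactly contiguous addresses are merged; ranges are never widened
    to cover addresses that were not observed.
    """
    addresses = {ipaddress.ip_address(ip) for ip in offending_ips}
    return [str(net) for net in ipaddress.collapse_addresses(addresses)]


confirmed = [
    "203.0.113.4", "203.0.113.5", "203.0.113.6", "203.0.113.7",
    "198.51.100.9",
]
print(to_block_list(confirmed))
```

Merging only what was observed keeps the block list conservative, which matters when attackers share address space with legitimate users behind carrier NAT.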
And keep flat-rate pricing in mind as a security requirement, not a billing preference. If observing an attack threatens to blow your event budget, you'll be tempted to drop the very data that defines the attack. A million-event spike that costs the same as a quiet Sunday means you can watch the whole incident unfold at full fidelity. Stop paying attack taxes on your error data—GlitchReplay is Sentry-compatible, flat-rate, and built so a 3 AM botnet costs exactly what an idle weekend does.