Claude Code Degradation: A postmortem of three recent issues (www.anthropic.com)

🤖 AI Summary
Anthropic traced a wave of intermittent Claude quality regressions between early August and mid-September to three separate infrastructure bugs, not intentional quality throttling. User reports grew from subtle to widespread after a routine load-balancer change on Aug 29, prompting an investigation that found:

1. A context-window routing bug (introduced Aug 5) that misdirected short-context requests to servers configured for the 1M-token context window, producing sticky misrouting and affecting up to 16% of Sonnet 4 traffic at peak.
2. An output-corruption misconfiguration (Aug 25) in TPU runtime optimizations that sometimes assigned high probability to tokens that should rarely be produced, yielding stray foreign characters or syntax errors.
3. A latent XLA:TPU compiler miscompilation exposed by an "approximate top-k" optimization (Aug 25-26) that could return completely wrong token candidates for certain batch sizes and configurations, notably affecting Haiku 3.5 and possibly other models.

Fixes included corrected routing logic (rolled out Sept 4-16), rolling back the misconfigured TPU change (Sept 2), switching from approximate to exact top-k with standardized fp32 operations, and working with the XLA team on a compiler fix. The incident highlights fragile interactions between mixed precision (bf16 vs fp32), distributed top-p sampling across chips, and platform heterogeneity (AWS Trainium, NVIDIA GPUs, Google TPUs, Bedrock, Vertex). Diagnosis was slowed by cross-platform variability, privacy constraints limiting access to problematic queries, and noisy canary evaluations. Anthropic is strengthening continuous in-production evaluations, adding detection tests (e.g., for unexpected characters), improving debugging tooling, and urging users to report bugs (/bug or thumbs-down) to accelerate future detection and remediation.
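To make the top-k change concrete, here is a minimal JAX sketch (not Anthropic's code) of the kind of fix described: moving token-candidate selection off the approximate top-k path and onto an exact top-k computed in float32. The function names, k value, and vocabulary size are illustrative assumptions.

```python
# Sketch only: illustrates approximate vs. exact top-k candidate selection.
# Names, k, and shapes are assumptions for illustration, not Anthropic's code.
import jax
import jax.numpy as jnp

def select_candidates_approx(logits, k=40):
    # Approximate top-k: faster on TPU, but it is the optimized code path the
    # postmortem says exposed a latent XLA:TPU miscompilation for certain
    # batch sizes and configurations.
    values, indices = jax.lax.approx_max_k(logits, k=k)
    return values, indices

def select_candidates_exact(logits, k=40):
    # Exact top-k with the math standardized on float32: somewhat slower,
    # but deterministic and independent of the approximate-top-k path.
    logits32 = logits.astype(jnp.float32)
    values, indices = jax.lax.top_k(logits32, k=k)
    return values, indices

key = jax.random.PRNGKey(0)
# Toy bf16 logits standing in for a model's next-token scores.
logits = jax.random.normal(key, (8, 32_000), dtype=jnp.bfloat16)
exact_vals, exact_idx = select_candidates_exact(logits, k=40)
```

The trade-off sketched here is the one the summary implies: a small amount of extra compute for candidate selection that no longer depends on the optimization that triggered the compiler bug.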
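The "detection tests (e.g., for unexpected characters)" could look something like the following sketch, which flags non-ASCII characters from scripts that an English-language coding session would not normally produce. The helper name and script list are assumptions for illustration, not Anthropic's actual checks.

```python
# Sketch only: flag responses containing characters from unexpected scripts
# (e.g., stray Thai or CJK text in an English coding answer).
import unicodedata

UNEXPECTED_SCRIPT_PREFIXES = ("CJK", "THAI", "HANGUL", "HIRAGANA", "KATAKANA")

def has_unexpected_script(text: str) -> bool:
    for ch in text:
        if ch.isascii():
            continue
        # Unicode character names encode the script, e.g. "THAI CHARACTER ...".
        name = unicodedata.name(ch, "")
        if name.startswith(UNEXPECTED_SCRIPT_PREFIXES):
            return True
    return False

assert not has_unexpected_script("def add(a, b): return a + b")
assert has_unexpected_script("def add(a, b): return a + สวัสดี")
```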