Quantified evidence: Sonnet 4.6 quality regression (github.com)

0 points 8 hours ago

🤖 AI Summary

A user has reported a significant quality regression in the AI model Sonnet 4.6, beginning the week of March 9, 2026. Data collected over 60 days from 50 sessions shows a sharp rise in "frustration events", such as having to repeat an instruction. These climbed from a baseline of roughly 25 per week to a peak of nearly 500. The decline forced the user to switch to Opus as their primary model, because Sonnet's output had fallen to a level they compare to Haiku, a smaller and less capable model. The onset coincides with a known outage, which the user suggests could point to compute constraints or a problem with the deployed checkpoint.

The report highlights failures on basic behaviors, such as recognizing when a task calls for reasoning and consistently following explicit guidance. The user asks the model's developers to acknowledge the degradation and to clarify its cause, whether compute limits or a regression introduced during reinforcement learning from human feedback (RLHF). They also call for a model versioning mechanism that would let users fall back to a known-stable version, so that AI-assisted projects stay reliable.
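The summary does not say how the reporter counted or bucketed frustration events. The following is a minimal sketch of one way to get a "per week" figure and a peak-to-baseline ratio, assuming each event is logged with a date. The function names `weekly_counts` and `peak_to_baseline` are hypothetical, and the event list uses round numbers echoing the summary (about 25 per week before, about 500 at the peak), not the reporter's actual data.

```python
from collections import Counter
from datetime import date


def weekly_counts(event_dates):
    """Bucket events by ISO (year, week) so each week runs Monday to Sunday."""
    return Counter(tuple(d.isocalendar())[:2] for d in event_dates)


def peak_to_baseline(counts, baseline_weeks):
    """Largest weekly count divided by the mean count over the baseline weeks.

    Baseline weeks with no recorded events count as zero.
    """
    baseline = sum(counts.get(w, 0) for w in baseline_weeks) / len(baseline_weeks)
    peak = max(counts.values())
    return peak / baseline


# Hypothetical, illustrative log: round numbers only, not the reporter's data.
week_before = date(2026, 3, 2)  # Monday of the week before the reported onset
week_of_onset = date(2026, 3, 9)  # Monday, week of March 9, 2026
events = [week_before] * 25 + [week_of_onset] * 500

counts = weekly_counts(events)
baseline_weeks = [tuple(week_before.isocalendar())[:2]]
print(peak_to_baseline(counts, baseline_weeks))  # 20.0
```

Bucketing by ISO week rather than by "days since the first session" keeps the weekly totals aligned with calendar weeks, which matches how the summary phrases the onset ("the week of March 9").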
