🤖 AI Summary
A recent analysis reports that Claude Opus 4.6, an AI model developed by Anthropic, shows a marked increase in hallucination, fabricating information during code analysis tasks. Since launch, its hallucination rate has climbed to 33%, making it one of the less reliable models in an evaluation spanning 30 tasks and 175 questions, in which it achieved 68.3% overall accuracy. By comparison, Grok 4.20 hallucinated on just 10% of questions, solidifying its position as a leading performer in the field.
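The article does not describe how the evaluation was scored, but metrics like the cited accuracy and hallucination rate are typically simple ratios over graded per-question results. The sketch below is a hypothetical illustration of that computation; the `summarize` function and the toy grading log are invented for this example and do not come from the evaluation itself.

```python
# Hypothetical sketch: computing accuracy and hallucination rate
# from graded evaluation results. All counts below are made up
# for illustration and are not from the cited evaluation.

def summarize(results):
    """results: one dict per question, with 'correct' and
    'hallucinated' boolean flags assigned by a grader."""
    n = len(results)
    accuracy = sum(r["correct"] for r in results) / n
    hallucination_rate = sum(r["hallucinated"] for r in results) / n
    return accuracy, hallucination_rate

# Toy grading log: 7 questions, 2 answers containing fabricated claims.
toy = [
    {"correct": True,  "hallucinated": False},
    {"correct": True,  "hallucinated": False},
    {"correct": False, "hallucinated": True},
    {"correct": True,  "hallucinated": False},
    {"correct": False, "hallucinated": True},
    {"correct": True,  "hallucinated": False},
    {"correct": True,  "hallucinated": False},
]
acc, hall = summarize(toy)
print(f"accuracy={acc:.1%}, hallucination rate={hall:.1%}")
# → accuracy=71.4%, hallucination rate=28.6%
```

Note that per-question and per-task averaging can give different overall numbers; a real harness would specify which it uses.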
This uptick in hallucination frequency matters to the AI/ML community because it raises concerns about model reliability, particularly in code-execution contexts where accuracy is vital. The findings underscore the need for developers to prioritize reducing fabrication rates when training advanced models, since misinformation erodes trust in AI-generated output. The assessment may also spur further research and innovation aimed at improving model fidelity, so that AI systems can serve their intended purposes without misleading users.