I set 10 honesty traps for Claude Opus 4.8 - and a legal test broke it (www.zdnet.com)

🤖 AI Summary
Anthropic recently launched Claude Opus 4.8, its latest large language model (LLM), claiming improved honesty and judgment over its predecessor, Opus 4.7. A series of ten rigorous tests assessed Claude's ability to handle coding problems, factual misunderstandings, and potential legal traps. Although Opus 4.8 exhibited a marked improvement in self-critique, avoiding fabrications and demonstrating better calibration in its responses, it still faltered in a complex legal scenario where it overstepped by inferring jurisdiction without adequate evidence. This testing highlights the ongoing challenges in achieving true reliability and honesty in AI systems. While Opus 4.8 outperformed its predecessor in many respects, the incident underscores that even the most advanced AIs can succumb to overconfidence and rationalization of errors. The results provoke critical discussions in the AI/ML community about the balance between providing accurate responses and identifying uncertainty, ultimately signaling progress but also a reminder of the limitations still inherent in LLMs.
Loading comments...
loading comments...