Our evaluation of Claude Mythos Preview’s cyber capabilities (www.aisi.gov.uk)

0 points 74 days ago

🤖 AI Summary

The AI Security Institute (AISI) has evaluated the cybersecurity capabilities of Anthropic's Claude Mythos Preview, which was revealed on April 7. The assessment finds that Mythos Preview is a step up from previous models in conducting multi-stage cyber attacks, completing complex tasks that take human experts days. It achieved a 73% success rate on expert-level capture-the-flag (CTF) challenges, a feat no other model had managed before April 2025. It also autonomously solved a 32-step corporate network attack simulation in 3 out of 10 attempts, while other models lagged behind. These results underscore the need for stronger cybersecurity measures, since Mythos Preview can exploit systems with weak defenses. The tests ran in controlled environments with no active defenses, so they do not show how the model would perform against defended systems, but they do signal that real-world cybersecurity challenges will grow as AI capabilities increase. AISI advises organizations to prioritize fundamental security practices, such as consistent software updates and robust access controls, to mitigate risks from increasingly capable AI models. Future evaluations will focus on defended environments to better understand the models' potential impact on cybersecurity.
