Our evaluation of OpenAI's GPT-5.5 cyber capabilities (www.aisi.gov.uk)

🤖 AI Summary
An evaluation of OpenAI's GPT-5.5 found significant advances in its cyber capabilities, in line with a prior assessment of Anthropic's Claude Mythos Preview. Both models completed a complex corporate network attack simulation known as "The Last Ones" (TLO), with GPT-5.5 succeeding in 2 of 10 attempts. The result points to a possibly broader trend in which models show stronger cyber skills as their autonomy, reasoning, and coding abilities improve. Across a suite of 95 cyber tasks, GPT-5.5 achieved a 71.4% pass rate on the advanced-level challenges, outperforming its predecessor, GPT-5.4. The findings show how quickly AI cyber capabilities are advancing, and they coincide with a rise in cyber attacks against businesses: 43% of UK organizations reported breaches in the past year. The UK government is acting to bolster cyber resilience through funding and legislation, and the growing accessibility of sophisticated models like GPT-5.5 presents both challenges and opportunities. Defenders can use similar AI capabilities in their own systems, which makes proactive measures against evolving cyber threats all the more important.