ExploitGym: Can AI agents turn bugs into exploits?

arxiv.org

✨ AI Summary

ExploitGym is a new benchmark for evaluating how well AI agents can exploit security vulnerabilities, a crucial question for cybersecurity. It contains 898 real-world vulnerability instances drawn from diverse targets, including userspace programs, Google's V8 JavaScript engine, and the Linux kernel. Each task asks a model to turn a vulnerability into a working exploit by progressively building on an initial input that triggers the bug. The testbed also isolates individual security protections so that their effect on agent performance can be measured, giving a comprehensive framework for assessing exploitation capabilities.

The findings indicate that exploitation remains a difficult task, yet leading models can already perform it: Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 converted vulnerabilities into working exploits in 157 and 120 cases, respectively, even with common defenses in place. This highlights the dual-use nature of exploitation capabilities. They can strengthen defenses, but they also pose a growing risk as AI models become more proficient. ExploitGym is therefore a valuable resource for understanding these risks and informing the discussion of AI's role in cybersecurity, and the results suggest that the AI/ML community needs to take proactive measures to mitigate potential threats.