Comparing AI Agents to Cybersecurity Professionals in Real-World Pen Testing (arxiv.org)

0 points | 208 days ago

🤖 AI Summary

A recent study evaluates AI agents against human cybersecurity professionals in real-world penetration testing. Conducted on a large university network of around 8,000 hosts, the research tested ten cybersecurity professionals and six existing AI agents, alongside a new framework called ARTEMIS. ARTEMIS ranked second overall, identifying nine valid vulnerabilities with an 82% valid submission rate and outperforming nine of the ten human participants.

The findings matter to the AI/ML community because they point to AI agents, ARTEMIS in particular, as a cost-effective option for automated security testing. Certain ARTEMIS variants cost approximately $18/hour, compared with $60/hour for human professionals, or about 70% less. ARTEMIS's submissions were comparable in quality to those of the top human testers, but the study also highlights limitations, including high false-positive rates and difficulty with GUI-based tasks. The research suggests that AI can be integrated into security work in a way that balances automation with human expertise.
