Phishing Arena – multi-agent LLM tournament to study adversarial email security (github.com)

0 points 53 days ago | visit original

🤖 AI Summary

Phishing Arena is a benchmark for adversarial email security research. It runs a multi-agent tournament in which four commercial large language models (LLMs) compete in the roles of Phisher, Filter, and Target. The tournament consists of 48 matches conducted in Italian, and the Phisher agent uses a CampaignMemory feedback loop to adapt its tactics based on the outcomes of previous rounds. The results show how phishing messages get past filters: 79% of successful phishing attempts used no identifiable evasion technique and succeeded through contextual plausibility instead.

For the AI/ML community, and security applications in particular, the benchmark provides a structured environment for assessing how robust email filters are against adaptive phishing attacks. The models in the competition, which include gpt-5.4-mini and claude-sonnet-4-6, showed varied performance. Researchers and developers are encouraged to reproduce the tournament's findings to improve defenses against email-based threats.
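The summary does not describe how CampaignMemory is implemented, so the following is only a minimal sketch of what an outcome-based feedback loop could look like. It assumes the memory logs whether each round succeeded under an abstract tactic label and turns that log into a short text summary for the next round. The class body, the method names `record`, `success_rate`, and `summary`, and the labels `tactic_a` and `tactic_b` are all hypothetical and not taken from the repository.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CampaignMemory:
    """Hypothetical per-tactic outcome log; not the project's actual class."""

    # Maps an abstract tactic label to a list of round outcomes (True = succeeded).
    history: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, tactic: str, succeeded: bool) -> None:
        """Store the outcome of one round for the given tactic label."""
        self.history[tactic].append(succeeded)

    def success_rate(self, tactic: str) -> float:
        """Fraction of recorded rounds that succeeded; 0.0 if none recorded."""
        outcomes = self.history.get(tactic, [])
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def summary(self) -> str:
        """Text summary that an agent could read before its next round."""
        lines = [
            f"{tactic}: {self.success_rate(tactic):.0%} over {len(outcomes)} rounds"
            for tactic, outcomes in sorted(self.history.items())
        ]
        return "\n".join(lines)


# Example: three rounds with placeholder tactic labels.
memory = CampaignMemory()
memory.record("tactic_a", False)
memory.record("tactic_a", True)
memory.record("tactic_b", False)
print(memory.summary())
```

In a design like this, the summary string would be fed back into the agent's context so that later rounds can favor whatever worked earlier. Whether the actual project tracks outcomes this way is not stated in the summary.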
