Building AI for cyber defenders (www.anthropic.com)

🤖 AI Summary
Anthropic reports that it has deliberately tuned its Claude family to help cyber defenders, and that the newest Claude Sonnet 4.5 matches or outperforms its recent Opus 4.1 frontier model on practical security tasks. Rather than chasing offensive capabilities, the team focused its research on vulnerability discovery, remediation, and simulated infrastructure testing. They argue this work marks an inflection point: large language models now provide operationally useful cyber skills that defenders should adopt before attackers do. Anthropic says it has already used the model to find and fix real bugs and to disrupt criminal misuse, including large-scale data extortion and APT-style espionage.

The claims are backed by industry benchmarks and experiments. On Cybench (CTF-style challenges), Sonnet 4.5 solves 76.5% of tasks given 10 attempts, halving the failure rate of models from six months earlier; it solved a complex malware-decompilation challenge in 38 minutes. On CyberGym, Sonnet 4.5 set a new state of the art of 28.9% under cost constraints, reproduces 66.7% of known vulnerabilities within 30 trials (≈$45 per task), and discovers novel vulnerabilities in over 33% of projects with repeated attempts. Preliminary patching work shows 15% of auto-generated patches are semantically equivalent to human fixes, with additional valid but non-matching fixes noted.

The takeaway: LLMs exhibit emergent cyber capabilities that scale with repeated probing at low cost, so defenders should integrate and rigorously evaluate AI tooling while maintaining safeguards against dual-use abuse.
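The attempt-based success rates cited above (e.g., 76.5% at 10 attempts, 66.7% over 30 trials) are the kind of numbers usually reported as pass@k metrics. As an illustration only — the source does not specify Anthropic's exact methodology, and the figures below are made up — the standard unbiased pass@k estimator can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k: the probability that at least one of k sampled
    attempts succeeds, given c observed successes in n total attempts.
    Uses the unbiased estimator 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failures exist, so any k-attempt sample
        # must contain at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 30 trials per task, 12 of them successful;
# estimate the chance that a budget of 10 attempts would succeed.
print(round(pass_at_k(30, 12, 10), 3))
```

This is why repeated probing matters: even a modest per-attempt success rate compounds quickly as the attempt budget k grows.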