The Backbone Breaker Benchmark: Testing the Real Security of AI Agents (www.lakera.ai)

🤖 AI Summary
Lakera and the UK AI Security Institute today introduced the Backbone Breaker Benchmark (b3), a human-grounded framework that measures the security of the backbone LLMs that power agentic systems. Rather than testing whole-agent workflows, b3 isolates the single moments—“threat snapshots”—where the core model makes a decision under attack. Built from Gandalf’s sandbox of agentic scenarios and nearly 200,000 human red-team attempts (distilled to the highest-signal 0.1%), b3 includes ten representative threat snapshots (e.g., phishing link insertion, malicious code injection, data exfiltration), three defense levels (baseline, hardened, self-judging), and a reproducible vulnerability score computed by replaying attacks across models. The benchmark yields actionable technical insights: step-by-step (chain-of-thought) reasoning measurably improves resistance to injection attacks (~15% less vulnerable), model size doesn’t guarantee security (mid-sized models can outperform larger ones), and closed-weight commercial models currently lead but open models are rapidly closing the gap. Crucially, b3 separates safety (refusing harmful outputs) from security (resisting manipulation to perform unintended actions). For developers, providers, researchers and CISOs, b3 provides a repeatable, comparable metric and a taxonomy for threat modeling—shifting AI security from subjective safety claims to empirical, backbone-level instrumentation.
Loading comments...
loading comments...