Introducing Claude Sonnet 4.5 (www.anthropic.com)

🤖 AI Summary
Anthropic today released Claude Sonnet 4.5, a frontier coding and agent model that the company positions as its strongest at software development, long-horizon reasoning, and real-world computer use. Sonnet 4.5 is available now via the Claude API (claude-sonnet-4-5) at the same $3/$15 per-million-token pricing as Sonnet 4. It ships alongside product upgrades: Claude Code checkpoints, a refreshed terminal, a native VS Code extension, in-chat code execution and file creation, a memory and context-editing API, and a public Claude Agent SDK that lets developers build agentic systems on the same infrastructure behind Claude Code. On benchmarks and in customer workflows, Sonnet 4.5 shows large gains: SWE-bench Verified scores reach ~77.2% with a 200K thinking budget, 78.2% at 1M context, and up to 82% with high-compute selection; OSWorld computer-task accuracy rose to 61.4% (from 42.2% four months prior); and internal code-edit error rates dropped from 9% to 0%. The model sustains multi-step agentic work for 30+ hours, improves vulnerability triage (44% faster, 25% more accurate), and boosts multi-step reasoning in tools like GitHub Copilot, Figma, and Canva. Sonnet 4.5 is released under ASL-3 safety controls with improved classifiers (fewer false positives) and alignment training; its system card includes new safety evaluations that draw on mechanistic interpretability. For the AI/ML community, the release marks a step forward in agentic tool use, long-context programmatic workflows, and deployable agent infrastructure, while also highlighting ongoing trade-offs around safety filters, evaluation compute regimes, and prompt-injection defenses.
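To make the quoted pricing concrete, here is a rough cost sketch in Python. It assumes the "$3/$15" figure means $3 per million input tokens and $15 per million output tokens, which the summary implies but does not spell out. The function name `estimate_cost_usd` and the example token counts are hypothetical, and the sketch ignores any discounts or surcharges that may apply.

```python
# Assumed rates, read from the summary's "$3/$15 per-million-token" figure:
# $3 per million input tokens, $15 per million output tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative numbers only: a 200,000-token prompt with an 8,000-token reply.
    print(f"${estimate_cost_usd(200_000, 8_000):.2f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long generated responses dominate the bill faster than long prompts do.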