Claude Sonnet 4.5 is probably the "best coding model in the world" (at least for now) (simonwillison.net)

🤖 AI Summary
Anthropic today released Claude Sonnet 4.5, which the company bills as the current best coding model and its strongest option for building complex agents and "using computers." In hands-on testing, Sonnet 4.5 executed real development workflows inside Claude.ai's code interpreter sandbox (cloning GitHub repos, installing PyPI and NPM packages, running pytest) and completed 466 tests in roughly 168 seconds. In a more ambitious experiment it implemented a tree-structured conversation model (adding parent_response_id), produced a utility module (tree_utils.py) with navigation, analysis, and visualization helpers, a 16-test suite, migration updates, and documentation, and reported 22/22 tests passing. Those results underline Sonnet 4.5's improved tool use, reasoning, and math capabilities relative to recent coding models such as GPT-5-Codex.

Technically and commercially the release matters because Sonnet 4.5 couples stronger code generation and reliable tool execution with an agent SDK and broader integrations: it is already live on OpenRouter, Cursor, and GitHub Copilot, and it ships alongside a new Claude Code VS Code extension, terminal app upgrades, and the rebranded Claude Agent SDK (TypeScript and Python). Pricing stays at $3 per million input tokens and $15 per million output tokens (cheaper than Claude Opus but costlier than GPT-5), so teams will weigh the performance gains against budget. The model's strong end-to-end execution makes it compelling for developer tooling, CI automation, and agent workflows, though competition (for example the upcoming Gemini 3) could shift the landscape quickly.
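The summary does not reproduce the actual schema or helper code the model wrote, so the following is a minimal, hypothetical Python sketch of what a parent_response_id-based conversation tree and a few tree_utils-style navigation/analysis/visualization helpers could look like; the class and function names here are illustrative, not taken from the post or from any Anthropic API.

```python
# Hypothetical sketch of a conversation tree keyed by parent_response_id.
# All names here are illustrative assumptions, not the code described in the post.
from dataclasses import dataclass


@dataclass
class Response:
    response_id: str
    text: str
    parent_response_id: str | None = None  # None marks a conversation root


def children_of(responses: list[Response], parent_id: str | None) -> list[Response]:
    """Navigation helper: direct replies to a response (or the roots when parent_id is None)."""
    return [r for r in responses if r.parent_response_id == parent_id]


def path_to_root(responses: list[Response], response_id: str) -> list[Response]:
    """Analysis helper: walk parent links from a response back up to its root."""
    by_id = {r.response_id: r for r in responses}
    path: list[Response] = []
    current = by_id.get(response_id)
    while current is not None:
        path.append(current)
        current = by_id.get(current.parent_response_id) if current.parent_response_id else None
    return list(reversed(path))


def render_tree(responses: list[Response], parent_id: str | None = None, depth: int = 0) -> str:
    """Visualization helper: indent-per-depth text rendering of the whole tree."""
    lines = []
    for node in children_of(responses, parent_id):
        lines.append("  " * depth + f"- {node.response_id}: {node.text}")
        lines.append(render_tree(responses, node.response_id, depth + 1))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    convo = [
        Response("r1", "How do I run the test suite?"),
        Response("r2", "Use pytest from the repo root.", parent_response_id="r1"),
        Response("r3", "Can you also add coverage?", parent_response_id="r1"),
    ]
    print(render_tree(convo))
```

Keeping only a parent pointer on each response is the simplest way to support branching conversations: any node can be forked without copying history, and a full thread is recovered by walking parent links back to the root.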
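At the quoted rates, cost scales linearly with token volume. Only the per-million prices come from the post; the token counts in this sketch are made-up figures purely to illustrate the arithmetic.

```python
# Illustrative cost arithmetic at the quoted Sonnet 4.5 prices.
# Token counts below are hypothetical; only the per-million rates are from the post.
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + (
        output_tokens / 1_000_000
    ) * OUTPUT_PRICE_PER_M


# e.g. a coding-agent session that reads 250k tokens of context and writes 40k tokens:
print(f"${cost_usd(250_000, 40_000):.2f}")  # -> $1.35
```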