Cognition Releases SWE-1.5: Near-SOTA Coding Performance at 950 tok/s (cognition.ai)

🤖 AI Summary
Cognition today released SWE-1.5, a frontier-size software-engineering model (hundreds of billions of parameters) that delivers near state-of-the-art coding capability while prioritizing latency. Deployed in Windsurf and served on Cerebras inference hardware, SWE-1.5 runs at up to 950 tokens/sec, roughly 6× faster than Haiku 4.5 and 13× faster than Sonnet 4.5, enabling interactive coding flows (for example, Kubernetes edits cut from ~20s to under 5s). Cognition emphasizes end-to-end agent design rather than model scale alone: SWE-1.5 co-optimizes the model, inference stack, and agent harness to avoid the usual tradeoff between speed and capability.

Technically, SWE-1.5 was post-trained from a strong open-source base with reinforcement learning (a variant of unbiased policy gradient suited to long multi-turn traces) run on the Cascade agent harness. Training took place on a GB200 NVL72 cluster, possibly the first public production model trained on GB200 silicon, and relied on a high-fidelity training stack: the otterlink hypervisor for large-scale, code-executing VMs, speculative decoding, and a custom request-priority system.

Cognition also built curated coding environments and multi-mechanism graders with "reward hardening" to reduce false-positive rewards, and reworked Windsurf pipelines (linting, command execution) to shave per-step overhead by up to 2s. The result is a practical agent that materially improves developer UX and signals a shift toward co-designing models, inference, and orchestration for real-world AI assistants.
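The summary mentions RL with "a variant of unbiased policy gradient" over long multi-turn traces only in passing. As a rough illustration of the underlying idea (plain REINFORCE, not Cognition's actual method), the toy Python sketch below updates a policy from episodes where the reward is computed only once, after the full trace, the way a grader scores a finished agent run. Every name, horizon, and reward rule here is illustrative.

```python
# Toy sketch (not Cognition's code): an unbiased REINFORCE-style policy-gradient
# update over a multi-turn trace, where the reward is computed only at the end
# of the episode, as when a grader scores a long tool-using rollout.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 4                 # stand-in for token/tool choices
theta = np.zeros(N_ACTIONS)   # logits of a trivial softmax "policy"

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def rollout(theta, horizon=8):
    """Sample a multi-turn trace; reward is assigned once, at the end."""
    actions = [rng.choice(N_ACTIONS, p=softmax(theta)) for _ in range(horizon)]
    # Hypothetical end-of-episode grader: score by how often the "correct"
    # action 0 was chosen anywhere in the trace.
    reward = actions.count(0) / len(actions)
    return actions, reward

def reinforce_step(theta, lr=0.5, batch=64):
    """One unbiased policy-gradient step averaged over a batch of rollouts."""
    grad = np.zeros_like(theta)
    pi = softmax(theta)
    for _ in range(batch):
        actions, R = rollout(theta)
        for a in actions:
            # grad of log pi(a) for a softmax policy: one_hot(a) - pi
            grad += R * (np.eye(N_ACTIONS)[a] - pi)
    return theta + lr * grad / batch

for _ in range(20):
    theta = reinforce_step(theta)
print("final policy:", softmax(theta).round(3))  # probability mass shifts to action 0
```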
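"Reward hardening" with multi-mechanism graders is likewise described only at a high level. One plausible reading is that several independent checks must agree before any reward is granted, so a single lenient grader cannot leak false-positive reward into training. The sketch below shows that pattern; the grader names and pass conditions are hypothetical stand-ins, not Cognition's implementation.

```python
# Illustrative sketch only: combine several independent graders and grant
# reward only when enough of them agree, reducing false-positive rewards.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Grader:
    name: str
    check: Callable[[str], bool]   # takes the agent's final patch/output

def hardened_reward(output: str, graders: List[Grader], required: int) -> float:
    """Grant reward only when at least `required` graders independently pass."""
    passed = [g.name for g in graders if g.check(output)]
    return 1.0 if len(passed) >= required else 0.0

# Hypothetical graders standing in for unit tests, a linter, and an LLM judge.
graders = [
    Grader("unit_tests", lambda out: "def add" in out),
    Grader("linter",     lambda out: "\t" not in out),
    Grader("llm_judge",  lambda out: len(out) > 20),
]

patch = "def add(a, b):\n    return a + b\n"
print(hardened_reward(patch, graders, required=3))  # 1.0 only if all three agree
```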