The last six months in LLMs in five minutes (simonwillison.net)

🤖 AI Summary
A recent lightning talk at PyCon US 2026 surveyed six months of rapid progress in large language models (LLMs). The title of "best" model changed hands several times among the top providers, with Claude Sonnet 4.5, Opus 4.5, and Gemini 3 each recognized for its strengths, including on a humorous benchmark chosen for its absurdity: drawing a pelican riding a bicycle. A key takeaway was the improvement in coding agents, which went from producing inconsistent output to being reliable tools for real-world coding tasks after extensive reinforcement learning.

For the AI/ML community, the implications are substantial. Coding agents can now serve as day-to-day assistants for developers, significantly reducing the need for human intervention in debugging and error correction. High-performing models that run on standard laptops, such as Qwen 3.6-35B-A3B, point to a growing democratization of AI technology, letting more practitioners experiment with powerful tools without extensive resources. That accessibility will likely spur further exploration in AI applications.