AI Is Acing Math Exams Faster Than Scientists Write Them (spectrum.ieee.org)

🤖 AI Summary
Epoch AI's FrontierMath benchmark highlights how rapidly AI has advanced in mathematics: models such as ChatGPT 5.2 Pro and Claude Opus 4.6 now solve over 40% of its problems, up from just 2% when the benchmark was introduced. This leap shows AI capabilities outpacing existing benchmarks and has prompted the creation of harder assessments. Notably, Google DeepMind's Aletheia autonomously achieved publishable results in PhD-level mathematics, a milestone for AI-driven mathematical discovery. Two new efforts respond to this shift. The First Proof challenge presents 10 difficult problems that even top AI models, working with limited human supervision, have yet to solve. Alongside it, Epoch AI launched an "Open Problems" initiative featuring unsolved mathematical problems, designed to test AI proficiency in ways that remain relevant to the mathematical community. Both endeavors aim to redefine how AI's mathematical capabilities are assessed, emphasizing the need for rigorous, relevant challenges as AI approaches the level of professional mathematicians.