MathNet: 30k competition math problems for AI mathematical reasoning benchmarking (mathnet.mit.edu)

0 points 56 days ago | visit original

🤖 AI Summary

MathNet is a dataset of 30,676 Olympiad-level math problems drawn from 47 countries and 17 languages over two decades. Designed to address the limitations of existing benchmarks, it provides a multilingual, multimodal framework for evaluating generative models and mathematical retrieval systems. The benchmark covers three tasks: problem solving, math-aware retrieval, and retrieval-augmented problem solving. State-of-the-art models reached 78.4% accuracy (Gemini-3.1-Pro) on the problem-solving task, but on the retrieval tasks models failed to exceed a Recall@1 of 5%. The data were prepared by running OCR on scanned competition booklets, normalizing the formatting, and having experts verify the resulting problem-solution pairs. The results point to a gap in retrieval performance: how much the supplied context helps during problem solving depends heavily on the quality of the retrieved information. MathNet offers a resource for training and evaluating models in a previously underserved domain of AI/ML mathematical reasoning.
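To make the Recall@1 figure in the summary concrete, here is a minimal sketch of how Recall@k is commonly computed for a retrieval task. It is not MathNet's evaluation code. The function names, the problem ids, and the toy data are all hypothetical, and the definition used here (the fraction of relevant items that appear in the top k results, averaged over queries) is an assumption, since the summary does not specify the benchmark's exact formula.

```python
from typing import Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 1) -> float:
    """Fraction of the relevant items that appear among the top-k retrieved ids."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant item")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int = 1) -> float:
    """Average Recall@k over a list of (ranked results, relevant set) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical toy data: four queries, each with one relevant problem id.
toy_runs = [
    (["p7", "p2", "p9"], {"p2"}),  # relevant item ranked 2nd
    (["p4", "p1", "p3"], {"p4"}),  # relevant item ranked 1st
    (["p5", "p6", "p8"], {"p0"}),  # relevant item not retrieved
    (["p3", "p1", "p2"], {"p1"}),  # relevant item ranked 2nd
]

print(mean_recall_at_k(toy_runs, k=1))
print(mean_recall_at_k(toy_runs, k=3))
```

With one relevant item per query, Recall@1 reduces to the share of queries whose top-ranked result is the correct one, so a Recall@1 below 5% would mean the first hit is right for fewer than one query in twenty.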
