DeepSeek releases open-weights math model with IMO gold medal performance (huggingface.co)

🤖 AI Summary
DeepSeekMath-V2 is a new LLM-based system that shifts mathematical AI from optimizing final-answer accuracy to verifying rigorous step-by-step reasoning. The DeepSeek team trained a dedicated verifier model to judge the correctness and completeness of proofs, then used that verifier as a reward model to train a proof generator, encouraging the generator to find and fix errors in its own drafts before finalizing them. To keep the verifier from falling behind as the generator improves, they scale verification compute to automatically label hard-to-verify proofs and use those labels to further strengthen the verifier, creating a generation–verification training loop aimed at self-verifiability. The approach addresses a core limitation of prior RL-from-self-consistency methods, which boosted contest scores without guaranteeing valid derivations, and is tailored to tasks like theorem proving and open problems where final answers alone are insufficient. With scaled test-time verification, DeepSeekMath-V2 achieves gold-level performance on IMO 2025 and CMO 2024 benchmarks, scores 118/120 on Putnam 2024, and was also evaluated on IMO-ProofBench. Built on DeepSeek-V3.2-Exp-Base and released under Apache 2.0, the work suggests self-verifiable reasoning is a feasible path toward more trustworthy, rigorous mathematical AI, though the authors note substantial remaining challenges.
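The generation–verification loop described above can be sketched in miniature. This is a toy simulation, not the paper's actual training code: the generator, verifier, error counts, and the compute-scaling rule are all illustrative stand-ins, meant only to show how a verifier's score can act as a reward signal while verification compute scales alongside generator quality.

```python
import random

random.seed(0)

def generate_proof(problem, quality):
    """Stand-in generator: drafts carry a hidden error count that
    shrinks as the generator's (hypothetical) skill level rises."""
    errors = max(0, random.randint(0, 4) - quality)
    return {"problem": problem, "errors": errors}

def verify(proof, compute_budget=1):
    """Stand-in verifier: more verification compute catches more errors.
    Returns a reward in [0, 1]; 1.0 means no errors were found."""
    if proof["errors"] == 0:
        return 1.0
    caught = min(proof["errors"], compute_budget)
    return 1.0 - caught / 4

def training_loop(problems, rounds=5):
    gen_quality = 0   # proxy for generator skill (stands in for RL updates)
    budget = 1        # verification compute, scaled up over training
    for _ in range(rounds):
        rewards = []
        for p in problems:
            draft = generate_proof(p, gen_quality)
            # Generator revises its draft until the verifier is
            # satisfied or a small revision budget runs out.
            for _ in range(3):
                if verify(draft, budget) == 1.0:
                    break
                draft["errors"] = max(0, draft["errors"] - 1)
            rewards.append(verify(draft, budget))
        # Verifier reward drives generator improvement.
        if sum(rewards) / len(rewards) > 0.5:
            gen_quality += 1
        # Scale verification compute so harder proofs stay checkable,
        # keeping the verifier from lagging behind the generator.
        budget += 1
    return gen_quality

final_quality = training_loop(["p1", "p2", "p3"])
```

The key design point mirrored here is the feedback cycle: the verifier's judgment is the only reward the generator sees, and the verifier itself is strengthened (here, crudely, by raising its compute budget) as generation improves.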