VibeThinker-1.5B (github.com)

🤖 AI Summary
VibeThinker-1.5B is a newly open-sourced 1.5B-parameter dense model that challenges the assumption that small models cannot achieve strong reasoning. Using a post-training pipeline built around the authors’ “Spectrum-to-Signal Principle (SSP),” the team applies a Two-Stage Diversity-Exploring Distillation during SFT to produce a broad spectrum of candidate solutions, then a MaxEnt-Guided Policy Optimization (MGPO) RL phase to amplify the correct ones.

Empirically, VibeThinker matches or exceeds much larger models on hard math benchmarks: AIME24 (80.3 vs 79.8 for DeepSeek R1), AIME25 (74.4 vs 70.0), and HMMT25 (50.4 vs 41.7), while performing competitively with GPT-OSS-20B Medium and better than the closed-source Magistral Medium and Claude Opus 4. The model is 100×–600× smaller than mega-models such as Kimi K2 and DeepSeek R1 and reportedly cost only $7.8K to post-train, versus $294K–$535K for comparable large models.

Significance: VibeThinker suggests that diversity-driven post-training can elicit large-model-like reasoning at a fraction of the parameters and cost, potentially lowering the barrier to research on and deployment of high-performance reasoning models. The key technical takeaway is the effectiveness of first generating diverse candidate solutions and then optimizing for signal clarity (MGPO). The authors also recommend specific inference settings: temperature 0.6–1.0, top_p 0.95, top_k = -1 (top-k filtering disabled), max_new_tokens 40960. Weights, code, and a technical report are released under the MIT license, enabling community verification, though broader replication across tasks and robustness checks will be important to establish generality.
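For readers who want to try those settings, here is a minimal inference sketch using Hugging Face transformers. The repo id `WeiboAI/VibeThinker-1.5B` is an assumption (check the project README for the actual released model id), and `top_k=0` is the transformers equivalent of the vLLM-style `top_k=-1` (no top-k filtering); the other sampling values mirror those quoted in the summary.

```python
# Minimal sketch: sampling from VibeThinker-1.5B with the reported inference settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumption -- verify against the project's release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Find all real solutions of x^2 - 5x + 6 = 0. Think step by step."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Reported settings: temperature 0.6-1.0, top_p 0.95, top_k=-1 (disabled; top_k=0 in
# transformers), max_new_tokens 40960 to leave room for long reasoning traces.
outputs = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=0,
    max_new_tokens=40960,
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The long `max_new_tokens` budget matters because reasoning models of this kind tend to emit extended chains of thought before the final answer; truncating generation early can clip the solution.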