🤖 AI Summary
Weibo has released VibeThinker-1.5B, an open-source 1.5-billion-parameter dense language model that claims large-model reasoning performance at extreme parameter efficiency, trained for just $7,800. On competitive math benchmarks it posts AIME24/AIME25/HMMT25 scores of 80.3 / 74.4 / 50.4, outperforming DeepSeek R1 (a roughly 400× larger model) and pushing out the Pareto frontier of reasoning accuracy versus model size. It also scores strongly on code generation (LiveCodeBench v5: 55.9, v6: 51.1), slightly ahead of comparable medium-sized models. The team recommends the model for competition-style math and coding problems and provides an MIT-licensed repo, an arXiv citation, and a public evaluation scheme so the community can verify the results quickly.
The core technical contribution is the Spectrum-to-Signal Principle (SSP) training framework: an SFT stage first encourages diverse solution exploration, and a subsequent RL stage then optimizes the policy to reinforce correct signals, explicitly treating diversity as a design principle for eliciting robust reasoning. Practical notes: the repo requires transformers>=4.54.0 and suggests vLLM==0.10.1 or SGLang>=0.4.9.post6 for inference. Example inference loads the model with AutoModelForCausalLM using bfloat16 and device_map="auto", with recommended generation settings of temperature 0.6 or 1.0, max_new_tokens 40960, top_p 0.95, and top_k=-1 (the latter applying to vLLM/SGLang).
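Below is a minimal sketch of that inference setup in plain transformers, assuming the Hugging Face model identifier `WeiboAI/VibeThinker-1.5B` and a simple chat-template prompt (the exact model id and prompt are assumptions here, not taken from the repo's README):

```python
# Minimal inference sketch for VibeThinker-1.5B (requires transformers>=4.54.0).
# The model id below is an assumption; check the official repo for the canonical name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bfloat16 as recommended
    device_map="auto",
)

# Hypothetical competition-style math prompt.
messages = [{"role": "user", "content": "Find all integer solutions of x^2 - 5y^2 = 1 with |x| < 100."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended sampling settings: temperature 0.6 (or 1.0), top_p 0.95.
# max_new_tokens=40960 leaves room for long reasoning traces; top_k=-1 is a
# vLLM/SGLang convention (disable top-k), so top_k is simply omitted here.
outputs = model.generate(
    inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=40960,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

When serving with vLLM or SGLang instead, the same temperature/top_p/max-token settings map onto their sampling-parameter options, where top_k=-1 explicitly disables top-k filtering.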