Chinese AI Model DeepSeek Revealed in Landmark Paper (www.yahoo.com)

🤖 AI Summary
Chinese firm DeepSeek published a peer‑reviewed Nature paper revealing technical and cost details of R1, its influential open‑weight reasoning model that surged in popularity after a January release. R1, designed for hard reasoning tasks like mathematics and coding, has been downloaded 10.9 million times and, DeepSeek says, was not trained on the outputs of rival models. The paper discloses a lean training bill: about $294,000 for the reasoning‑focused R1 stage on top of roughly $6 million for the base LLM, with training run mainly on Nvidia H800 accelerators (hardware restricted from sale to China in 2023). The peer review and accompanying clarifications are a rarity in large‑model transparency and helped counter speculative claims that R1 relied on copying other models' reasoning traces.

Technically, DeepSeek's key innovation was a largely automated, pure reinforcement‑learning pipeline that rewards correct answers rather than imitating human reasoning examples. Instead of a separate learned evaluator, the model scores each sampled answer relative to the other answers in its own group, a technique called group relative policy optimization (GRPO); see the sketch below. That recipe appears reproducible, with other labs reporting similar gains, and offers a much cheaper path to high reasoning performance, which explains R1's outsized influence on 2025 work applying RL to LLMs.

The implications are broad: democratization of advanced LLMs; new safety and provenance concerns, since base training included web data (and any AI‑generated content therein); and a push for more peer review and transparency in evaluating model risks.
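To make the group‑relative idea concrete, here is a minimal Python sketch of how GRPO‑style advantages can be computed: each sampled answer is scored against the mean and standard deviation of the rewards in its own group, so no separate value network (critic) is needed. The function name, the 0/1 reward scheme, and the epsilon are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled answer's
    reward against its own group's mean and std, replacing a
    separately trained critic as the baseline."""
    rewards = np.asarray(rewards, dtype=np.float64)
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + 1e-8)  # epsilon avoids divide-by-zero

# Example: four sampled answers to one math prompt, reward 1 if the
# final answer verifies, 0 otherwise (a simple verifiable reward).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
# -> approximately [ 1, -1, -1,  1 ]
```

Because the baseline comes from the group itself, such a pipeline only needs a verifiable reward signal (e.g., whether a math answer checks out), which is part of what makes the recipe cheap to run at scale.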