Secrets of DeepSeek AI model revealed in landmark paper (www.nature.com)

🤖 AI Summary
DeepSeek’s R1, an open-weight LLM optimized for reasoning tasks like mathematics and coding, is described in a landmark, peer-reviewed Nature paper that discloses previously secret details about its design, training cost and influence. The model has been downloaded 10.9 million times on Hugging Face. DeepSeek reports spending about $6 million to create the base LLM and roughly $294,000 more to train the R1 augmentation, far below the tens of millions typically reported for rival models, and the team trained mainly on Nvidia H800 GPUs, hardware banned from sale to China in 2023. In response to peer reviewers, the authors toned down anthropomorphic language and clarified their training data and safety steps.

Technically, R1’s big innovation is “pure” reinforcement learning: the model was rewarded for reaching correct answers rather than taught from human-curated chain-of-thought examples. It also used a self-scoring scheme, group relative policy optimization (GRPO), in which the model scores a group of its own sampled attempts against one another instead of relying on a separate evaluator model, improving efficiency. The paper’s peer review and public disclosure set a transparency precedent that aids external risk assessment, and R1’s methods have strongly influenced 2025 work on RL for LLMs. The combination of open weights, low-cost training and novel RL techniques makes R1 both a practical alternative to big-budget models and a catalyst for community scrutiny and innovation.
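The group-relative scoring idea is simple enough to sketch. Below is a minimal, illustrative Python snippet, not DeepSeek’s actual code: the `verifiable_reward` check and the exact normalization are assumptions, but it shows how GRPO-style advantages can be computed from a group of sampled answers without a separate critic model.

```python
import statistics

def verifiable_reward(answer: str, reference: str) -> float:
    # Hypothetical reward: 1.0 if the model's final answer matches the
    # reference, else 0.0 -- the "rewarded for correct answers" idea,
    # not DeepSeek's actual answer checker.
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    # GRPO's core trick: score each sampled attempt relative to the
    # mean of its own group rather than via a learned critic model.
    mean_r = statistics.mean(rewards)
    std_r = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean_r) / std_r for r in rewards]

# Example: four sampled attempts at one prompt, two of them correct.
attempts = ["42", "41", "42", "forty-two"]
rewards = [verifiable_reward(a, "42") for a in attempts]
print(group_relative_advantages(rewards))  # correct attempts get positive advantage
```

In a full training loop, these per-attempt advantages would weight the policy-gradient update, so the model is pushed toward the attempts that scored above its own group average.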