Sorry, but DeepSeek didn't train its flagship model for $294,000 (www.theregister.com)

🤖 AI Summary
DeepSeek’s recent Nature R1 report sparked headlines claiming its flagship model cost just $294,000 to train — but that figure only covers a narrow reinforcement‑learning (RL) fine‑tuning phase, not the heavy lifting of pretraining. The paper documents using 64 eight‑way H800 boxes (512 GPUs) for roughly 278 hours (198 + 80), plus ~5,000 GPU‑hours to generate supervised fine‑tuning data — which yields the sub‑$300k headline if you assume $2/hr GPU leases. Crucially, DeepSeek V3 — the base model R1 builds on — was pretrained on 2,048 H800s for roughly two months (about 2.79 million GPU‑hours), an estimated $5.58M in compute at the same rate. Combining pretraining and RL gives a real compute bill near $5.87M, not $294k. Technically, the R1 work focuses on Group Relative Policy Optimization (GRPO) to instill stepwise reasoning via RL — a post‑training process that refines behavior rather than substituting for base pretraining. The cost estimates also omit hardware purchase (over $51M for the servers, by some calculations), R&D, data curation, and sunk development effort. Compared with Meta’s Llama 4 (2.38–5M GPU‑hours, 22–40T tokens), DeepSeek V3 used similar compute (2.79M GPU‑hours) on fewer tokens (14.8T), suggesting comparable scale rather than a dramatic efficiency breakthrough. The episode underscores how selective disclosures and lease‑based math can create misleading narratives about model cost and efficiency.
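The cost arithmetic above can be reproduced in a few lines. This is a back-of-envelope sketch using only the figures quoted in the summary; the $2/hr H800 lease rate is the article's stated assumption, not a quoted market price.

```python
# Back-of-envelope reproduction of the article's cost math.
# All figures are from the article; $2/hr is the assumed lease rate.

GPU_HOURLY_RATE = 2.0  # USD per H800 GPU-hour (assumption from the article)

# RL fine-tuning phase reported in the Nature R1 paper:
# 64 eight-way H800 boxes (512 GPUs) for ~278 hours (198 + 80),
# plus ~5,000 GPU-hours for supervised fine-tuning data generation.
rl_gpu_hours = 512 * (198 + 80)
sft_data_gpu_hours = 5_000
post_training_cost = (rl_gpu_hours + sft_data_gpu_hours) * GPU_HOURLY_RATE

# DeepSeek V3 pretraining (the base model R1 builds on):
# 2,048 H800s for roughly two months, ~2.79M GPU-hours.
pretraining_gpu_hours = 2_790_000
pretraining_cost = pretraining_gpu_hours * GPU_HOURLY_RATE

print(f"Post-training (headline) cost: ${post_training_cost:,.0f}")   # ~$294k
print(f"Pretraining cost:              ${pretraining_cost:,.0f}")     # $5.58M
print(f"Combined compute bill:         ${post_training_cost + pretraining_cost:,.0f}")
```

Running this recovers the ~$294k headline figure from the post-training phase alone, and a combined bill near $5.87M once pretraining is included — the gap the article is pointing at.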