DeepSeek's new models are so efficient they'll run on a toaster, by which we mean Huawei silicon (www.theregister.com)

🤖 AI Summary
DeepSeek has unveiled its latest large language model, DeepSeek V4, which promises to cut inference costs substantially and is optimized to run on Huawei's AI accelerators. Available in preview, the model comes in two versions: a compact 284-billion-parameter Flash MoE model and a larger 1.6-trillion-parameter variant. Both claim to rival top proprietary models, with V4-Pro trained on 33 trillion tokens. DeepSeek says V4 performs strongly on benchmarks and is designed to be far more efficient in real-world deployments.

Key technical advances include a hybrid attention mechanism combining Compressed Sparse Attention and Heavy Compressed Attention, which significantly reduces the computational and memory footprint. The model supports a million-token context window while using up to 13.7 times less memory than its predecessor, and it mixes FP8 and FP4 precision to further reduce memory usage.

Training stability is addressed with the Muon optimizer, and the model's compatibility with both Nvidia and Huawei hardware marks a notable shift in DeepSeek's deployment strategy. Positioned as both cost-effective and competitive, DeepSeek V4 presents itself as a serious contender in the evolving AI landscape.
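DeepSeek has not published the exact algorithms behind the attention variants named above, so the following is only a generic illustration of the underlying idea: sparse attention lets each query attend to a small top-k subset of keys rather than all of them, shrinking the softmax computation that dominates long-context memory use. This NumPy sketch (function names and shapes are our own, not DeepSeek's) shows a minimal top-k sparse attention next to dense attention:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(Q, K, V):
    """Standard scaled dot-product attention over all keys."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def topk_sparse_attention(Q, K, V, k):
    """Each query attends only to its k highest-scoring keys.

    Illustrative sketch only: this is generic top-k sparsification,
    not DeepSeek's (unpublished) Compressed Sparse Attention.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # (n_queries, n_keys)
    if k < scores.shape[-1]:
        # Per-row k-th largest score; mask everything below it to -inf
        # so those keys get zero attention weight after the softmax.
        kth = np.partition(scores, -k, axis=-1)[:, -k][:, None]
        scores = np.where(scores >= kth, scores, -np.inf)
    return softmax(scores) @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    out = topk_sparse_attention(Q, K, V, k=2)
    print(out.shape)  # each query mixes only its 2 best-matching values
```

In a real kernel the masked entries are never materialized at all, which is where the memory savings come from; this sketch only mimics the math. With k equal to the number of keys, the sparse version reduces to ordinary dense attention.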