Huawei post-trained DeepSeek's 1.6T model on 1k Ascend 910C chips (www.tomshardware.com)

0 points | 1 day ago

🤖 AI Summary

Huawei Technologies has announced that it completed full-parameter post-training of DeepSeek's V4-Pro, a 1.6-trillion-parameter model, on a cluster of 1,000 Ascend 910C chips. The result matters to the AI/ML community because it suggests that Chinese chips can handle training-class workloads, an important capability given U.S. export restrictions. Until now, Chinese firms have struggled to move away from Nvidia hardware, particularly for training, which demands far more compute than inference. Full-parameter post-training updates all of the model's weights rather than just adding an adapter layer, and it is the phase that tunes a pre-trained model's behavior for specific tasks. The accomplishment is a notable step for Huawei's Ascend platform, but it does not show that the chips can pre-train a model from scratch, which is a larger and more resource-intensive job. The announcement also omits technical benchmarks and any comparison against Nvidia hardware, which invites skepticism about the claims, especially given DeepSeek's past difficulties with these chips.
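
To make the distinction between full-parameter training and an adapter concrete, here is a minimal sketch that counts trainable parameters for a single weight matrix under each approach. The summary does not say which adapter method it has in mind; a LoRA-style low-rank adapter is assumed here, and the matrix sizes and rank are hypothetical values chosen for illustration, not dimensions of DeepSeek's model.

```python
def full_finetune_trainable(d_out: int, d_in: int) -> int:
    """Full-parameter training: every entry of the d_out x d_in matrix is updated."""
    return d_out * d_in


def adapter_trainable(d_out: int, d_in: int, rank: int) -> int:
    """LoRA-style adapter (assumed method): the base matrix stays frozen and only
    two small factors, A (rank x d_in) and B (d_out x rank), are trained."""
    return rank * d_in + d_out * rank


# Hypothetical layer shape and adapter rank, for illustration only.
d_out, d_in, rank = 4096, 4096, 8

full = full_finetune_trainable(d_out, d_in)
lora = adapter_trainable(d_out, d_in, rank)

print(f"full-parameter trainable weights: {full}")
print(f"adapter trainable weights:        {lora}")
print(f"ratio (full / adapter):           {full // lora}")
```

Under these assumed sizes the adapter trains a few hundred times fewer weights per layer, which is why updating every weight of a very large model is the heavier workload, both in compute and in the memory needed for gradients and optimizer state.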
