The Shifting Global Compute Landscape (huggingface.co)

🤖 AI Summary
China’s once-U.S.-centric AI compute stack is rapidly diversifying: open-weight models from labs like DeepSeek, Qwen (Alibaba), GLM (Zhipu AI) and Kimi (Moonshot AI) are now running inference, and in some cases training, on domestic accelerators such as Huawei’s Ascend, Cambricon chips and Baidu’s Kunlun. U.S. export controls on high-end GPUs appear to have accelerated China’s push for self-sufficiency, provoking a full-stack response that pairs chip design, software alternatives to CUDA, and open-source model releases. The result is lower inference costs, a burst of compute-efficient research, and growing interoperability between models and local hardware (e.g., DeepSeek-V3.2’s day-zero support for Ascend and Cambricon, Baidu’s 5,000+-chip Kunlun pretraining cluster, and Ant Group’s heterogeneous training runs).

Technically, scarcity drove architectural and systems innovation: compute-saving methods such as Multi-head Latent Attention (MLA), DeepSeek Sparse Attention (DSA), Group Relative Policy Optimization (GRPO), and renewed interest in linear-attention RWKV variants reduce memory and runtime for large models. Open-infrastructure engineering advances (Mooncake serving, Attention-FFN disaggregation, Volcengine’s verl) are making production-scale training and inference more hardware-agnostic.

For researchers and policymakers this means a shifting competitive landscape: more global openness and cheaper deployment, but also hardware fragmentation and geopolitical decoupling that will reshape where and how large models are built, optimized, and governed.
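The summary only name-checks these methods. As a rough illustration of the memory saving behind Multi-head Latent Attention, the sketch below caches a single low-rank latent vector per token and reconstructs per-head keys and values from it at attention time. All dimensions, weight names, and the omission of RoPE handling and the matrix-absorption trick are simplifications for illustration, not DeepSeek’s actual configuration.

```python
import numpy as np

# Illustrative-only sketch of MLA-style KV-cache compression.
# Dimensions and weight names are made up; real MLA also splits out a
# RoPE branch and absorbs the up-projections into other matrices.
rng = np.random.default_rng(0)

d_model, d_latent, n_heads, d_head = 1024, 64, 16, 64
seq_len = 8

W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)           # down-projection
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)  # up-projection to K
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)  # up-projection to V
W_q = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(d_model)

h = rng.standard_normal((seq_len, d_model))  # hidden states of already-cached tokens
x = rng.standard_normal((1, d_model))        # current query token

# Only the low-rank latent is cached: d_latent floats per token
# instead of 2 * n_heads * d_head for a standard KV cache.
kv_cache = h @ W_dkv

q = (x @ W_q).reshape(n_heads, d_head)
k = (kv_cache @ W_uk).reshape(seq_len, n_heads, d_head)
v = (kv_cache @ W_uv).reshape(seq_len, n_heads, d_head)

scores = np.einsum("hd,shd->hs", q, k) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
out = np.einsum("hs,shd->hd", weights, v)

print("cached floats per token:", d_latent, "vs", 2 * n_heads * d_head)
print("attention output shape:", out.shape)
```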
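Similarly, a minimal sketch of the group-relative advantage that gives GRPO its name: rewards for a group of sampled responses to the same prompt are z-scored within the group and used as advantages in place of a learned critic. The reward values below are made up, and the clipped policy-ratio objective and KL penalty of the full method are omitted.

```python
import numpy as np

# Group-relative advantage only; the surrounding PPO-style clipped
# objective and KL regularization of full GRPO are not shown.
def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: shape (group_size,), one scalar reward per sampled response."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# e.g. 8 sampled completions for one prompt, scored by a verifier or reward model
rewards = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
advantages = group_relative_advantages(rewards)
print(advantages)  # positive for above-average responses, negative otherwise
```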