🤖 AI Summary
China’s once-U.S.-centric AI compute stack is rapidly diversifying: open-weight models from labs like DeepSeek, Qwen (Alibaba), GLM (Zhipu AI) and Kimi (Moonshot AI) are now running inference, and in some cases training, on domestic accelerators such as Huawei’s Ascend, Cambricon chips and Baidu’s Kunlun. U.S. export controls on high-end GPUs appear to have accelerated China’s push for self-sufficiency, provoking a full-stack response that pairs chip design, software alternatives to CUDA, and open-source model releases. The result is lower inference costs, a burst of compute-efficient research, and growing interoperability between models and local hardware (e.g., DeepSeek-V3.2 day-zero support for Ascend and Cambricon, Baidu’s pretraining cluster of 5,000+ Kunlun chips, and Ant Group’s heterogeneous training runs).
Technically, scarcity drove architectural and systems innovation: compute-saving methods such as Multi-head Latent Attention (MLA), DeepSeek Sparse Attention (DSA), and Group Relative Policy Optimization (GRPO) cut memory and runtime for large models, alongside renewed interest in linear-attention RWKV variants. Open-infrastructure engineering advances (Mooncake serving, Attention-FFN disaggregation, Volcengine’s verl) are making production-scale training and inference more hardware-agnostic. For researchers and policymakers this means a shifting competitive landscape: more global openness and cheaper deployment, but also hardware fragmentation and geopolitical decoupling that will reshape where and how large models are built, optimized, and governed.
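To make one of these compute-saving methods concrete: GRPO avoids the separate learned value model that PPO-style RLHF requires by scoring each sampled completion against the mean and standard deviation of rewards within its own sampling group. Below is a minimal sketch of that group-relative advantage step; the function name and tensor shapes are illustrative assumptions, not taken from any particular codebase.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages (illustrative sketch, not DeepSeek's code).

    rewards: (num_prompts, group_size) scalar rewards, one row per prompt,
             one column per sampled completion for that prompt.
    Each reward is normalized against its own group's mean and std, so no
    learned value network is needed to estimate a baseline.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.5, 1.0],
                        [0.2, 0.8, 0.8, 0.2]])
print(grpo_advantages(rewards))
```

Because the baseline comes from the group itself, the memory and compute that PPO spends training and querying a value model are simply not needed, which is one reason the method appeals under hardware constraints.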