China's DeepSeek launches next-gen AI model. Here's what makes it different (www.cnbc.com)

🤖 AI Summary
Chinese startup DeepSeek has released an experimental build of its model, DeepSeek‑V3.2‑Exp, claiming major efficiency gains from a new mechanism called DeepSeek Sparse Attention (DSA), which it says halves inference cost and improves handling of long documents and conversations compared with its predecessor, V3.1‑Terminus. The company open‑sourced the code and says the model runs "out of the box" on domestic AI chips such as Huawei Ascend and Cambricon, continuing a pattern, set by last year's surprise R1 release, of training and running capable LLMs on less powerful hardware with fewer resources.

For the AI community this matters because the emphasis is shifting from raw scale to practical deployability: cheaper, faster models broaden access for developers, researchers, and smaller firms, and accelerate real‑world applications that require long‑context reasoning.

However, DSA's selective attention brings tradeoffs. Sparse attention reduces computation by ignoring many tokens, which boosts speed but risks dropping nuance or excluding crucial signals, raising safety and inclusivity concerns. Observers also note that sparse attention isn't novel and that open‑sourcing limits defensibility, so DeepSeek's advantage will hinge on how well its selection mechanism preserves important information. DeepSeek frames V3.2‑Exp as an intermediate step toward a next‑generation architecture amid ongoing US‑China competition in AI.
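The article doesn't describe DSA's internals, but the general sparse-attention idea it alludes to, where each query attends only to a selected subset of keys instead of the whole context, can be sketched generically. Below is a minimal NumPy illustration using a simple top‑k selection rule; the selection heuristic, shapes, and function names are assumptions for illustration only, not DeepSeek's actual mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, top_k):
    """Illustrative (not DeepSeek's) sparse attention: each query row
    attends only to its top_k highest-scoring keys; the rest are masked.
    Shapes: q (n, d), k (m, d), v (m, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n, m) attention logits
    # threshold = k-th largest score per query row (ties may admit extras)
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)  # drop non-selected keys
    return softmax(masked, axis=-1) @ v                # (n, d) outputs

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
out = sparse_attention(q, k, v, top_k=4)  # each query uses only 4 of 16 keys
```

With `top_k` equal to the full key count, this reduces to ordinary dense attention; shrinking `top_k` is what trades computation (and potentially dropped signal) for speed, the tradeoff the summary flags.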