DeepSeek Sparse Attention (github.com)

0 points 1 hour ago

🤖 AI Summary

DeepSeek has introduced DeepSeek Sparse Attention (DSA) in its latest release, DeepSeek V3.2, to make causal self-attention more efficient. Standard self-attention has O(L²) complexity, where L is the sequence length, so compute and memory costs grow quickly with long inputs. DSA instead uses a learned selection mechanism that lets each query token attend only to the most relevant subset of past tokens, reducing the cost of the attention computation to O(L·k), where k is the number of tokens each query attends to. The savings matter most for long-context applications, where k is much smaller than L. The architecture has two key components: the Lightning Indexer, which scores past tokens for relevance using lightweight multi-head attention, and the Token Selector, which retains only the top-k highest-scoring tokens. Because the selection is explicit, the set of tokens each query attends to can be inspected and tuned. In practice, DSA can lower inference costs for long-context processing, which makes it relevant to work on optimization and scalability in transformer-based models.
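The two-stage flow described in the summary (score past tokens, keep the top-k, attend only to those) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DeepSeek's implementation: the function names `indexer_scores`, `select_top_k`, and `sparse_attention` are hypothetical, and the scoring form (a weighted sum of ReLU-ed dot products across a few indexer heads) is an assumption about what "lightweight multi-head attention" means here.

```python
import math


def indexer_scores(q_heads, past_keys, head_weights):
    """Score each past token for one query (hypothetical indexer).

    Assumed form: score_s = sum_h w_h * relu(q_h . k_s), where q_heads holds
    one small query vector per indexer head and past_keys holds one small
    key vector per past token.
    """
    scores = []
    for key in past_keys:
        score = 0.0
        for weight, q in zip(head_weights, q_heads):
            dot = sum(a * b for a, b in zip(q, key))
            score += weight * max(dot, 0.0)  # ReLU keeps only positive matches
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Return the positions of the k highest scores, in positional order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Softmax attention for one query over the selected positions only."""
    scale = math.sqrt(len(query))
    logits = [sum(a * b for a, b in zip(query, keys[i])) / scale for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    out = [0.0] * len(values[0])
    for e, i in zip(exps, selected):
        for j, v in enumerate(values[i]):
            out[j] += (e / total) * v
    return out
```

For a query at position t, causality would be enforced by passing only the keys and values for positions 0..t. The main attention step then touches k tokens per query instead of all t + 1, which is where the O(L·k) cost in the summary comes from.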
