Raven: Memory as a Set of Slots (goombalab.github.io)

🤖 AI Summary
Raven, a new recurrent architecture introduced in the AI/ML community, addresses the challenge of memory organization in recurrent models with a mechanism called a learned sparse router. The router lets Raven store and retrieve information efficiently, much as ravens cache food and later recall its location with remarkable precision. On the Needle-In-A-Haystack (NIAH) benchmark, Raven achieves recall with $O(1)$ memory, improving on traditional recurrent models and Transformers, which suffer from memory interference on multiple-choice tasks.

Raven's significance lies in its ability to both update and preserve memory contents without interference. By treating its hidden state as a collection of independent slots, each managed by the router, it can selectively modify only the relevant parts of its memory while leaving the rest intact. This improves performance on recall tasks well beyond training lengths, with preliminary evaluations suggesting up to 16 times greater recall capacity. As Raven moves from theory to practical implementation, its architecture, which merges the strengths of state space models and sliding window attention, promises to reshape how memory is used in sequence models.
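The slot-plus-router idea described above can be sketched in a few lines. This is a minimal illustration, not Raven's actual implementation: the class name `SlotMemory`, the top-1 routing rule, and all weight shapes are assumptions chosen to show how a sparse router can update one slot while leaving the others untouched.

```python
# Illustrative sketch of slot-based memory with a sparse router.
# All names, shapes, and the top-1 rule are assumptions for illustration,
# not the actual Raven architecture.
import numpy as np

class SlotMemory:
    def __init__(self, num_slots, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.slots = np.zeros((num_slots, dim))   # hidden state as independent slots
        # These weights would be learned in a real model; here they are random.
        self.W_route = rng.standard_normal((dim, num_slots)) * 0.1
        self.W_write = rng.standard_normal((dim, dim)) * 0.1

    def route(self, x):
        # Sparse routing: pick the single best-matching slot (top-1).
        scores = x @ self.W_route
        return int(np.argmax(scores))

    def write(self, x):
        # Update only the routed slot; every other slot is left intact,
        # so previously stored contents elsewhere cannot be overwritten.
        i = self.route(x)
        self.slots[i] = self.slots[i] + x @ self.W_write
        return i

    def read(self, x):
        # Read back from whichever slot the router selects for this query.
        return self.slots[self.route(x)]
```

A write touches exactly one slot, which is the interference-free property the summary attributes to Raven: unrelated memories live in slots the router never selects for this input.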