M^2RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling (arxiv.org)

🤖 AI Summary
Researchers have introduced the Matrix-to-Matrix RNN (M²RNN), an architecture that uses matrix-valued hidden states and non-linear state transitions to scale recurrent language models. The motivation is a known limitation of Transformers: they are constrained to the TC⁰ complexity class and therefore cannot efficiently handle certain state-tracking computations, such as entity tracking and code execution. M²RNN demonstrates that enlarging the hidden state pays off: the matrix-valued state enables efficient tensor-core utilization, and the non-linear recurrence achieves perfect generalization on state-tracking tasks at sequence lengths unseen during training.

For the AI/ML community, the significance lies in M²RNN outperforming traditional models while maintaining a smaller state size. In hybrid configurations that combine recurrent layers with attention, M²RNN improves perplexity by up to 0.5 points over comparable models, and swapping M²RNN in for even a single recurrent layer yields gains. It also generalizes better to long contexts, outperforming state-of-the-art linear-attention architectures on benchmarks such as LongBench. These findings position non-linear RNNs as powerful components for future scalable language models, paving the way for advances in natural language processing.
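The summary does not give M²RNN's actual recurrence, so as a rough illustration of the general idea, here is a minimal sketch of a matrix-valued RNN cell with a non-linear transition. All names, shapes, and the specific update rule below are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def matrix_rnn_step(H_prev, x_t, W_h, U_h, W_x):
    """One non-linear update of a matrix-valued hidden state (hypothetical).

    H_prev : (d, d) matrix-valued hidden state
    x_t    : (d,)  input token embedding
    W_h, U_h, W_x : (d, d) learned parameter matrices (assumed here)
    """
    # Left- and right-multiplying keeps the state a d x d matrix while
    # mixing both rows and columns; the tanh makes the transition
    # non-linear, which is what distinguishes this family from linear
    # state-space / linear-attention recurrences.
    input_term = np.outer(W_x @ x_t, x_t)  # rank-1 input injection (assumption)
    return np.tanh(W_h @ H_prev @ U_h + input_term)

d = 8
rng = np.random.default_rng(0)
H = np.zeros((d, d))  # matrix-valued state, d*d elements vs. d for a vector RNN
W_h, U_h, W_x = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
for t in range(5):  # run over a toy sequence of random embeddings
    H = matrix_rnn_step(H, rng.standard_normal(d), W_h, U_h, W_x)
print(H.shape)  # (8, 8): the state stays matrix-valued across the sequence
```

The matrix shape is what makes the update a matrix-matrix multiply, which maps naturally onto GPU tensor cores; the actual M²RNN transition and parameterization should be taken from the paper itself.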