The Post-Transformer Edge: LFM 2.5 vs. MT-LNN (AwareLiquid) (github.com)

🤖 AI Summary
MT-LNN is a newly announced, brain-inspired large language model (LLM) architecture that modifies the standard Transformer block. It replaces the feed-forward network (FFN) with a recurrent Microtubule Liquid Neural Network layer and adds two further components: a workspace bottleneck inspired by Global Workspace Theory and a Global Coherence gate that applies a sparse top-$k$ collapse. These changes target three limitations of current LLMs: memory efficiency, compute utilization, and world-model representation. They allow dynamic neuron activation and optional predictive state tracking.

In the reported benchmarks, models using MT-LNN showed perplexity (PPL) decreases across different base models with minimal added parameters, while keeping the original architecture's forward pass intact, which eases integration. For example, TinyLlama-1.1B and Qwen-2.5 showed PPL drops of up to 34.4% with only 0.1-0.2% of parameters trainable. The approach is also reported to improve long-context handling and inference efficiency.
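To make the "sparse top-$k$" idea concrete, here is a minimal sketch of a generic top-k gate in plain Python: it keeps the k largest-magnitude activations and zeroes the rest. This is an illustration of the general technique only. The function name `top_k_gate`, the magnitude-based ranking, and the hard zeroing are assumptions; the summary does not say how MT-LNN's Global Coherence gate is actually implemented.

```python
from typing import List


def top_k_gate(activations: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude activations; set all others to 0.0.

    Hypothetical helper for illustration, not the MT-LNN implementation.
    """
    if k <= 0:
        return [0.0] * len(activations)
    if k >= len(activations):
        return list(activations)
    # Rank indices by descending magnitude; sorted() is stable, so ties
    # are resolved in favour of the lower index.
    ranked = sorted(range(len(activations)), key=lambda i: -abs(activations[i]))
    keep = set(ranked[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


if __name__ == "__main__":
    print(top_k_gate([0.1, -2.0, 0.5, 1.5], k=2))
```

With k much smaller than the layer width, most units contribute nothing on a given step, which is the usual route by which a gate of this kind yields input-dependent ("dynamic") activation.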