Xiaomi MiMo-V2-Flash Model (github.com)

🤖 AI Summary
Xiaomi has announced MiMo-V2-Flash, a Mixture-of-Experts (MoE) language model with 309 billion total parameters, of which 15 billion are active at any point. The model uses a hybrid attention architecture that interleaves Sliding Window Attention and Global Attention layers in a 5:1 ratio, trading a small amount of global mixing for a large reduction in inference cost, and it supports context lengths of up to 256,000 tokens for long, complex reasoning tasks. Multi-Token Prediction (MTP) speeds up decoding and makes training in reinforcement learning environments more efficient. Post-training relies on methods including Multi-Teacher On-Policy Distillation (MOPD), a feedback-driven distillation scheme that improves learning robustness, lifts performance on standard benchmarks, and helps generalization across domains such as coding. Xiaomi has open-sourced the MTP weights to support further research, positioning the release as a step toward state-of-the-art performance in language modeling.
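To make the 5:1 hybrid attention claim concrete, here is a minimal sketch of how such a layer layout could be constructed. The layer count, window size, placement of the global layer within each block, and all names (`LayerSpec`, `build_attention_layout`) are illustrative assumptions, not details confirmed by the announcement.

```python
# Hypothetical sketch of a 5:1 sliding-window / global attention layer layout.
# num_layers, window, and the position of the global layer are assumptions.
from dataclasses import dataclass
from typing import Optional, List


@dataclass
class LayerSpec:
    index: int
    kind: str               # "sliding_window" or "global"
    window: Optional[int]   # local attention span in tokens; None => full context


def build_attention_layout(num_layers: int = 48, ratio: int = 5,
                           window: int = 4096) -> List[LayerSpec]:
    """Interleave sliding-window and global attention layers in a ratio:1 pattern.

    Every (ratio + 1)-th layer attends over the full context; the rest use a
    fixed local window, which bounds KV-cache size and attention cost for most
    layers while keeping some long-range information flow.
    """
    layout = []
    for i in range(num_layers):
        if (i + 1) % (ratio + 1) == 0:
            layout.append(LayerSpec(i, "global", None))
        else:
            layout.append(LayerSpec(i, "sliding_window", window))
    return layout


if __name__ == "__main__":
    layout = build_attention_layout()
    n_global = sum(1 for layer in layout if layer.kind == "global")
    print(f"{len(layout)} layers: {n_global} global, "
          f"{len(layout) - n_global} sliding-window")
```

Under these assumptions, only one layer in six pays the full quadratic attention cost over the 256K-token context; the remaining layers stay cheap regardless of sequence length.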