MiniMax-M3: A native multimodal model with 1M context (huggingface.co)

0 points 1 hour ago

🤖 AI Summary

MiniMax-M3 is a native multimodal model with a context window of 1 million tokens. It has approximately 428 billion parameters, of which 23 billion are activated. M3 is designed to integrate text, image, and video inputs from the outset rather than adding modalities later, which is intended to deepen semantic fusion across them. This matters because AI systems are increasingly expected to process and understand several kinds of data at once.

The model introduces MiniMax Sparse Attention (MSA) to make long contexts cheaper. Compared with its predecessor, M2, M3 achieves up to 9× faster prefill and 15× faster decoding. By the same comparison, it cuts per-token compute to 1/20 and lowers the memory footprint while maintaining model quality.

M3 supports two reasoning modes. A "thinking" mode targets complex reasoning and collaborative tasks, and a "non-thinking" mode is optimized for low-latency uses such as chat and code completion. Together, the two modes suit a wide range of applications.
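The following Python sketch is a quick back-of-envelope check of the quoted figures. It makes three assumptions, none of which come from MiniMax. First, "activated parameters" means the weights used for each token, as in a mixture-of-experts model. Second, per-token forward compute follows the common rule of thumb of about 2 FLOPs per active weight, which ignores attention cost. Third, the M2 baseline latencies are made-up placeholders chosen only to show what the 9× and 15× ratios mean in practice.

```python
# Back-of-envelope arithmetic for the figures quoted in the summary.
# Assumption: "activated parameters" = weights used per token (mixture-of-experts style).
# Assumption: forward compute ~ 2 FLOPs (one multiply + one add) per active weight.

TOTAL_PARAMS = 428e9   # approximate total parameters, from the summary
ACTIVE_PARAMS = 23e9   # activated parameters, from the summary


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that take part in one token's forward pass."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rough weight-only forward FLOPs per token (excludes attention over the context)."""
    return 2.0 * active


def scaled_latency(baseline: float, speedup: float) -> float:
    """Latency after applying a reported speedup factor to a baseline latency."""
    return baseline / speedup


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.1%}")
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"~{flops / 1e9:.0f} GFLOPs per token (weights only)")
    # Hypothetical M2 baselines, used only to illustrate the reported ratios.
    print(f"prefill: 90 s -> {scaled_latency(90.0, 9.0):.0f} s at 9x")
    print(f"decode: 30 ms/token -> {scaled_latency(30.0, 15.0):.0f} ms/token at 15x")
```

Run as a script, this prints an active fraction of about 5.4% and roughly 46 GFLOPs per token. In other words, each token touches only a small slice of the 428B weights, which is why per-token cost tracks the 23B figure rather than the total.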
