MiniMax M2.5 released: 80.2% in SWE-bench Verified (www.minimax.io)

🤖 AI Summary
MiniMax has released M2.5, targeting coding, office work, and agent-based tasks, and reporting state-of-the-art performance on the SWE-bench Verified benchmark at 80.2%. The company says M2.5 completes tasks 37% faster than its predecessor, M2.1, matches the speed of Claude Opus 4.6, and runs at a much lower operating cost of about $1 per hour at 100 tokens per second. That combination of benchmark results, speed, and price positions M2.5 as a practical option for developers and enterprises that want to apply AI to complex programming tasks across multiple languages and domains.

Technically, M2.5 builds on reinforcement learning and a hybrid training framework called Forge, which MiniMax credits with improving generalization across agentic tasks. The model is described as decomposing and planning tasks in the manner of an experienced software architect, and as performing well in multi-agent collaboration and office-productivity work, where it applies domain-specific skills to produce finished deliverables. Its training reportedly spanned hundreds of thousands of real-world environments, which MiniMax presents as the foundation for handling complex, economically valuable tasks and for changing how companies approach automation and task execution.
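For context on the pricing claim: $1 per hour at a sustained 100 tokens per second works out to 360,000 tokens per dollar, or roughly $2.78 per million tokens, assuming the quoted throughput is sustained; the announcement does not break this down into input versus output token pricing.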