🤖 AI Summary
MLCommons has released the MLPerf Inference v5.1 benchmark results, highlighting significant advances in AI inference speed and efficiency across a record 27 participants. The update introduces three new benchmarks: a reasoning task based on the DeepSeek-R1 mixture-of-experts model, speech-to-text using Whisper Large v3, and a small-LLM benchmark built around Llama 3.1 8B, reflecting the community's push toward more diverse and realistic AI workloads. The suite now comprises 90,000 results, reinforcing its role as a key industry standard for evaluating inference performance.
Nvidia led the pack with its new Blackwell Ultra architecture powering the GB300 NVL72 system, achieving record throughput on the reasoning and LLM benchmarks with up to 5x improvements over the prior generation. Nvidia also leaned on disaggregated serving, which splits the compute-heavy prefill phase and the latency-sensitive token-generation phase across separate GPU pools, combined with its Dynamo inference framework to boost efficiency on latency-sensitive tasks such as the interactive Llama 3.1 405B workload (a toy sketch of the pattern follows below). The company additionally announced Rubin CPX, a next-generation inference chip planned for 2026 that targets massive token contexts for video and AI-assisted development.

Meanwhile, AMD made notable strides with its freshly launched Instinct MI355X GPU, delivering strong scalability and record gains by pairing FP4 precision with structured pruning to raise throughput on large models (also illustrated below). AMD's expanded submissions covered LLMs, mixture-of-experts models, and generative image tasks, demonstrating versatility across AI domains.
MLPerf v5.1 also marks an important expansion of the contributor base, with the University of Florida submitting HPC-based results and an individual researcher demonstrating competitive edge-class inference on a MacBook Pro, underscoring how access to rigorous AI benchmarking is broadening. The new reasoning benchmark sets a precedent for evaluating multi-step problem-solving LLMs, while the small-LLM benchmark addresses real-world scenarios requiring low-latency, cost-effective deployments. Together, these results underline rapid progress in AI inference technology, driving both hardware innovation and benchmark relevance to meet evolving AI workloads.