GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell (www.wafer.ai)

0 points 3 hours ago | visit original

🤖 AI Summary

AMD's MI355X GPUs delivered an aggregate throughput of 2626 tokens per second (tok/s) per node in recent tests, reportedly outperforming NVIDIA's Blackwell offerings at less than half the cost. With demand for inference capacity rising, the result positions AMD as a viable competitor despite its known weaknesses in software support and optimization.

The result comes from serving the GLM5.2 model with AMD's quantization tooling and the SGLang inference framework, with no custom kernel development required. The benchmarks indicate that, after tuning with MXFP4 quantization and speculative decoding, the MI355X delivers compelling results for specific workloads.

The summary's broader claim is that the traditional advantage of proprietary software ecosystems is being challenged by increasingly capable open-source alternatives. If software support continues to improve, AMD could become a mainstream choice for AI deployments.
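The cost claim in the summary comes down to simple arithmetic: dollars per token is the node's hourly price divided by the tokens it produces in an hour. The sketch below shows that calculation in Python. Only the 2626 tok/s/node figure comes from the post; the hourly prices and the second node's throughput are hypothetical placeholders, not quoted numbers, and the function names are made up for illustration.

```python
def cost_per_million_tokens(node_price_per_hour: float, tokens_per_second: float) -> float:
    """Return the USD cost of generating one million tokens on a single node."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return node_price_per_hour / tokens_per_hour * 1_000_000


def cost_ratio(price_a: float, tps_a: float, price_b: float, tps_b: float) -> float:
    """Return how many times more expensive node B is than node A per token."""
    return cost_per_million_tokens(price_b, tps_b) / cost_per_million_tokens(price_a, tps_a)


# Throughput reported in the post for one MI355X node.
MI355X_TOK_S = 2626.0

# ASSUMPTION: placeholder prices and throughput, not figures from the post.
HYPOTHETICAL_MI355X_PRICE = 20.0      # USD per node-hour
HYPOTHETICAL_OTHER_PRICE = 45.0       # USD per node-hour
HYPOTHETICAL_OTHER_TOK_S = 2800.0     # tok/s per node

if __name__ == "__main__":
    amd = cost_per_million_tokens(HYPOTHETICAL_MI355X_PRICE, MI355X_TOK_S)
    ratio = cost_ratio(
        HYPOTHETICAL_MI355X_PRICE, MI355X_TOK_S,
        HYPOTHETICAL_OTHER_PRICE, HYPOTHETICAL_OTHER_TOK_S,
    )
    print(f"MI355X node: {amd:.2f} USD per million tokens")
    print(f"Other node costs {ratio:.2f}x as much per token")
```

Note that a node can be slower in raw tok/s and still win on cost: the ratio depends on price and throughput together, so a "2x lower cost" claim only holds for the specific prices and workload measured.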
