The AI Performance Benefit with AMX on Intel Xeon 6 "Granite Rapids" (www.phoronix.com)

🤖 AI Summary
Phoronix ran fresh AI inference benchmarks on dual Intel Xeon 6980P "Granite Rapids" CPUs to quantify the real-world benefit of Advanced Matrix Extensions (AMX). Using a Giga Computing R284-A92-AAL barebones server with 24×64 GB MRDIMM-8800 modules, Ubuntu 25.10, and Linux 6.17, the tests compared identical AI workloads (OpenVINO and llama.cpp via oneDNN) with AMX enabled versus disabled. The results show a substantial performance uplift for inference when AMX is active, underscoring why AMX remains a key accelerator for matrix-heavy ML workloads on Xeon 6 even two years after its debut on Sapphire Rapids.

The article also highlights practical details important to ML engineers: AMX isn't toggleable via a simple BIOS switch, but it can be suppressed for testing by setting ONEDNN_MAX_CPU_ISA="AVX512_CORE_FP16" (forcing oneDNN to dispatch non-AMX kernels) or by hiding the AMX CPU feature bits with clearcpuid=598,600,601 on the kernel command line. Power draw and CPU temperatures were also monitored to check for AVX-512-style thermal/power impacts, an important consideration for datacenter deployments.

Bottom line: AMX delivers significant inference gains on Granite Rapids, and as software support (OpenVINO, llama.cpp, oneDNN) expands, AMX will be increasingly important for cost- and power-efficient CPU-based AI inference.
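The two AMX-suppression methods described above can be sketched roughly as follows. This is a minimal illustration, not the article's exact test script: the benchmark invocation is hypothetical, and the per-feature comments map the clearcpuid numbers to the AMX feature bits as defined in the Linux kernel's x86 feature table (word 18: bit 22 = AMX-BF16, bit 24 = AMX-TILE, bit 25 = AMX-INT8, i.e. feature numbers 598, 600, 601).

```shell
#!/bin/sh
# Method 1: cap oneDNN's dispatched ISA below AMX for the current process.
# AVX512_CORE_FP16 is the highest AVX-512 tier oneDNN recognizes that
# excludes the AMX instruction set, so AMX kernels are never selected.
export ONEDNN_MAX_CPU_ISA=AVX512_CORE_FP16

# Run the workload under test (hypothetical invocation, not from the article):
# ./benchmark --model model.xml

# Method 2: hide the AMX CPUID feature bits system-wide by appending
# to the kernel command line and rebooting (shown here as a comment,
# since it must be set at boot, e.g. via GRUB):
#   clearcpuid=598,600,601
# 598 = AMX-BF16, 600 = AMX-TILE, 601 = AMX-INT8

echo "ONEDNN_MAX_CPU_ISA=$ONEDNN_MAX_CPU_ISA"
```

Method 1 only affects software that routes math through oneDNN, while Method 2 removes the features from the CPUID view for every process, which is why the article uses it for a system-wide comparison.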