🤖 AI Summary
Researchers evaluated Mojo, the first language built on LLVM's MLIR, as a performance-portable option for GPU-accelerated scientific kernels within the Python ecosystem. Mojo combines Python interoperability with a CUDA-like, compile-time GPU programming model and aims to close the gap between productivity and performance. The team ported four representative workloads (two memory-bound kernels, a seven-point stencil and BabelStream; the compute-bound miniBUDE; and a Hartree-Fock kernel that is compute-bound with heavy use of atomics) and compared the Mojo ports against vendor baselines on NVIDIA H100 and AMD MI300A GPUs.
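For context, the stencil and BabelStream workloads stress memory bandwidth: each thread performs a handful of loads and stores and very little arithmetic. Below is a minimal CUDA sketch of a seven-point stencil, illustrative only; the paper's Mojo ports follow the same thread-per-element structure, but all names and the indexing scheme here are hypothetical.

```cuda
#include <cuda_runtime.h>

// Illustrative 7-point stencil: each thread averages one interior grid
// point with its six axis-aligned neighbors. With ~7 loads and 1 store
// per element and almost no arithmetic, throughput is bound by memory
// bandwidth rather than compute.
__global__ void stencil7(const float* __restrict__ in,
                         float* __restrict__ out,
                         int nx, int ny, int nz) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || x >= nx - 1 || y < 1 || y >= ny - 1 ||
        z < 1 || z >= nz - 1)
        return;                                       // interior points only

    size_t sy = nx, sz = (size_t)nx * ny;             // strides along y and z
    size_t c  = (size_t)z * sz + (size_t)y * sy + x;  // center index

    out[c] = (in[c]
            + in[c - 1]  + in[c + 1]                  // x neighbors
            + in[c - sy] + in[c + sy]                 // y neighbors
            + in[c - sz] + in[c + sz])                // z neighbors
            * (1.0f / 7.0f);
}
```

On kernels like this, performance hinges almost entirely on sustaining full memory bandwidth, which is why parity with CUDA/HIP here is a meaningful signal about MLIR-based code generation.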
Results show Mojo delivers performance competitive with CUDA/HIP on the memory-bound kernels, demonstrating MLIR's promise for portable, high-efficiency data-movement patterns. Gaps remain, however: Mojo underperforms on AMD for the atomics-heavy Hartree-Fock code and trails on the fast-math compute-bound kernels on both vendors, pointing to backend and codegen maturity issues. The language still requires relatively low-level GPU programming, but its MLIR foundation and Python friendliness position it as a compelling route to reducing fragmentation between the scientific-computing and AI stacks, provided vendor backends, atomics handling, and fast-math optimizations continue to mature.
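The atomics bottleneck is easy to picture: in a Hartree-Fock kernel, many threads scatter contributions into shared matrix entries, so every update must be atomic. A schematic CUDA sketch of that access pattern follows (hypothetical names; not the paper's actual kernel).

```cuda
// Schematic scatter-accumulate: each thread adds its contribution to a
// shared output location via atomicAdd. When many threads collide on the
// same entries, the hardware serializes the updates, so atomic throughput
// (the area where the summary reports Mojo lagging on AMD) dominates runtime.
__global__ void scatter_accumulate(const float* __restrict__ contrib,
                                   const int*   __restrict__ target,
                                   float*       __restrict__ fock,
                                   int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&fock[target[i]], contrib[i]);
}
```

How efficiently a compiler lowers such atomics to each vendor's ISA is largely a backend concern, consistent with the summary's framing of the gap as a maturity issue rather than a language-design one.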