Apple Silicon GPU Support in Mojo (forum.modular.com)

🤖 AI Summary
Mojo’s latest nightlies (and the upcoming stable release) add initial support for Apple Silicon GPUs, enabling developers with M1–M4 Macs to compile and run Mojo GPU code locally. This lowers the barrier to GPU programming by unlocking the GPU in every modern Mac, which could accelerate local experimentation, model prototyping, and local-to-cloud workflows.

To try it, you need macOS 15+ and Xcode 16+ (Metal Shading Language 3.2 / AIR bitcode 2.7.0). Clone the modular repo and run the examples in examples/mojo/gpu-functions (all but reduction.mojo) or Mojo GPU puzzles 1–15; note that Pixi hasn’t been updated yet, so run the puzzles manually for now.

Under the hood, Mojo lowers GPU functions to LLVM IR and then to Apple Intermediate Representation (AIR) bitcode. A MetalDeviceContext (a specialized DeviceContext) uses Metal-cpp to produce a .metallib and manage command queues and buffers, all transparent to the developer.

Many features remain incomplete: hardware intrinsics, atomic ops, async_copy_*, GridDim/lane_id, array-to-pointer conversion, SubBuffer, bfloat16 on ARM, MAX graphs/custom ops, PyTorch interoperability, and model serving. Consequently, simple MAX graphs and AI models don’t yet run, and accelerator_count() will report 0 until basic MAX support lands. The team has a prioritized list of blockers and invites contributions, though some infrastructure changes initially require Modular developer work before the community can fully extend Apple Silicon support.
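For orientation, here is a minimal sketch of the kind of kernel launch the Apple Silicon backend is meant to run, modeled on the DeviceContext pattern used in the public gpu-functions examples. The kernel body and the grid/block sizes are illustrative, not taken from the post, and exact API signatures may differ across nightlies; per the post, accelerator_count() may still report 0 on Apple Silicon even while DeviceContext resolves to the Metal-backed implementation.

```mojo
from gpu import block_idx, thread_idx
from gpu.host import DeviceContext


fn print_threads():
    # Runs on the GPU: each thread prints its block and thread index.
    # (Illustrative kernel; not from the announcement.)
    print("block:", block_idx.x, "thread:", thread_idx.x)


def main():
    # On Apple Silicon this should resolve to the Metal-backed context
    # (MetalDeviceContext), which, per the post, compiles the kernel to
    # AIR bitcode / a .metallib and manages command queues and buffers
    # transparently.
    var ctx = DeviceContext()

    # Launch 2 thread blocks of 64 threads each (arbitrary sizes).
    ctx.enqueue_function[print_threads](grid_dim=2, block_dim=64)

    # Block until the enqueued GPU work has finished.
    ctx.synchronize()
    print("done")
```

If something like this compiles and runs on an M-series Mac meeting the macOS 15+/Xcode 16+ requirements, the Metal lowering path described above is exercising correctly; the gpu-functions examples in the modular repo use the same API with device buffers and more realistic kernels.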