Show HN: CUDA, Shmuda: Fold Proteins on a MacBook (latentspacecraft.com)

🤖 AI Summary
A developer ported OpenFold3 (an open-source AlphaFold3 replica) to Apple Silicon by replacing CUDA-specific kernels with MLX-optimized equivalents, demonstrating that modern M-series Macs can run large protein-folding models efficiently. MLX is a numpy-like framework built for Apple’s unified memory architecture (a single CPU/GPU/NPU pool), which avoids the CPU↔GPU data copies that create bottlenecks in CUDA-centric workflows. The author rewrote CUDA ops such as the triangle-attention kernel to call MLX primitives and shared a drop-in replacement implementation, enabling inference on everyday Macs without renting large GPU instances.

This matters because it lowers the hardware barrier for computational biology and ML research: M-series chips deliver near-GPU throughput with far lower power draw and easier hardware access. Reported inference times on an M4 MacBook Air are ~20–30s for small proteins (<200 residues), ~90s for medium (200–400), and ~3min for larger proteins (400+), not counting model load time.

Practical caveats include the need for MLX ≥0.5.0 (earlier versions have activation-function bugs), some manual porting of CUDA kernels, and the fact that this is currently a beta effort. Still, the work signals a shift: the bottleneck is increasingly software support for non-CUDA platforms rather than raw hardware availability, opening a path to more energy-efficient, democratized protein folding on consumer devices.
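Since MLX mirrors the NumPy API, the porting pattern can be illustrated with a plain-NumPy sketch of row-wise triangle attention — the AlphaFold-family op the summary mentions. This is a hypothetical single-head version with made-up names and shapes, not OpenFold3's actual kernel; in MLX one would swap `numpy` for `mlx.core` and let the unified-memory runtime place the arrays.

```python
import numpy as np


def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def triangle_attention_row(pair, w_q, w_k, w_v, w_b):
    """Hypothetical single-head, row-wise triangle attention.

    pair: (N, N, d) pair representation; w_q/w_k/w_v: (d, d_h); w_b: (d, 1).
    Element (i, j) attends over positions k in row i, with a bias
    derived from the (j, k) pair entry — the "triangle" coupling.
    """
    q = pair @ w_q                      # (N, N, d_h)
    k = pair @ w_k                      # (N, N, d_h)
    v = pair @ w_v                      # (N, N, d_h)
    bias = (pair @ w_b)[..., 0]         # (N, N), bias[j, k]
    d_h = q.shape[-1]
    logits = np.einsum("ijd,ikd->ijk", q, k) / np.sqrt(d_h)
    logits = logits + bias[None, :, :]  # broadcast bias over rows i
    attn = softmax(logits, axis=-1)     # attend along k
    return np.einsum("ijk,ikd->ijd", attn, v)  # (N, N, d_h)
```

The MLX port of such an op is largely mechanical (array creation and einsum calls carry over); the work the author describes is replacing fused CUDA kernels with these framework-level primitives.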