Why a decades-old architecture decision is impeding the power of AI computing (research.ibm.com)

🤖 AI Summary
AI's energy and latency problems aren't just about scale: they stem from a six-decade-old design choice, the von Neumann architecture, which separates memory from compute and forces constant data movement across a bus. Modern deep learning models contain billions of largely static weights, and their core operations are simple matrix-vector multiplies, so most energy goes to transferring weights rather than computing; compute accounts for only about 10% of the runtime cost. Because the energy and latency of charging and discharging a wire scale with its length, longer interconnects make the problem worse, and GPUs can sit idle waiting for data (a rough sketch of this arithmetic appears below).

IBM Research highlights this bottleneck and is developing alternatives: co-packaged polymer optical waveguides that bring optics to the chip edge, and the AIU chip family, including NorthPole, which radically localizes memory to cut transfer energy and speed up training and inference. The technical workarounds fall into two camps, near-memory and in-memory computing. NorthPole, a near-memory design, uses many cores each with local SRAM; in one test it ran a 3B-parameter LLM 47× faster than the next most energy-efficient GPU and 73× more energy-efficiently than the lowest-latency GPU. Analog in-memory approaches such as phase-change memory (PCM) store weights in the resistivity of a chalcogenide material, eliminating weight transfers entirely, though PCM's limited write endurance makes it better suited to inference with pre-trained models than to training.

Because von Neumann machines remain flexible and handle high-precision, general-purpose workloads well, the likely future is heterogeneous systems that combine von Neumann and non-von Neumann accelerators, each used for what it does best.
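As a back-of-the-envelope illustration (mine, not from the article), the sketch below compares the arithmetic intensity of a matrix-vector multiply against a hypothetical accelerator's compute-to-bandwidth ratio. The hardware numbers are made up for illustration; the point is that intensity far below the machine balance means the bus, not the math, sets the runtime.

```python
# Back-of-the-envelope: why matrix-vector multiply (GEMV) is memory-bound.
# For y = W @ x with an M x N weight matrix in fp16, each weight byte
# supports only ~1 multiply-accumulate, so runtime is dominated by
# moving W over the memory bus, not by the arithmetic itself.

def gemv_arithmetic_intensity(m: int, n: int, bytes_per_weight: int = 2) -> float:
    flops = 2 * m * n                       # one multiply + one add per weight
    bytes_moved = m * n * bytes_per_weight  # weights dominate traffic (x, y are tiny)
    return flops / bytes_moved

# Hypothetical hardware numbers, for illustration only.
PEAK_FLOPS = 100e12   # 100 TFLOP/s of fp16 compute
MEM_BANDWIDTH = 2e12  # 2 TB/s of off-chip memory bandwidth

intensity = gemv_arithmetic_intensity(4096, 4096)  # ~1 FLOP per byte
machine_balance = PEAK_FLOPS / MEM_BANDWIDTH       # ~50 FLOP per byte

# If intensity << machine balance, the compute units idle while waiting
# on the bus -- the von Neumann bottleneck the summary describes.
print(f"GEMV intensity: {intensity:.1f} FLOP/byte")
print(f"Machine balance: {machine_balance:.1f} FLOP/byte")
print("memory-bound" if intensity < machine_balance else "compute-bound")
```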
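A minimal sketch of the near-memory idea follows, assuming a toy row-wise tiling of the weight matrix across cores. This illustrates the concept only; it is not NorthPole's actual core or memory design.

```python
import numpy as np

# Toy model of near-memory computing: split W row-wise across "cores",
# each holding its tile in local SRAM. Weights are placed once and stay
# put; only the small activation vector moves at inference time.

class Core:
    def __init__(self, weight_tile: np.ndarray):
        self.local_sram = weight_tile   # resident weights, loaded once

    def gemv(self, x: np.ndarray) -> np.ndarray:
        return self.local_sram @ x      # compute right next to the data

def near_memory_gemv(W: np.ndarray, x: np.ndarray, n_cores: int) -> np.ndarray:
    tiles = np.array_split(W, n_cores, axis=0)
    cores = [Core(t) for t in tiles]    # one-time weight placement
    # Per query, only x (N values) crosses the interconnect,
    # not the M x N weight matrix.
    return np.concatenate([c.gemv(x) for c in cores])

W = np.random.randn(1024, 1024)
x = np.random.randn(1024)
y = near_memory_gemv(W, x, n_cores=16)
assert np.allclose(y, W @ x)
```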
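Finally, a toy model of the analog in-memory approach, under the assumption that weights map linearly to PCM conductances and that programming and read noise are Gaussian; the noise magnitudes are invented for illustration. It also shows why limited write endurance pushes this toward inference: weights are programmed once at deployment and then only read.

```python
import numpy as np

# Toy model of analog in-memory compute: weights become PCM conductances,
# inputs become voltages, and each output current is a physical
# sum-of-products (Ohm's law + Kirchhoff's current law), so the weights
# themselves are never moved. Noise values here are made up.

rng = np.random.default_rng(0)

def program_conductances(W: np.ndarray, write_noise: float = 0.02) -> np.ndarray:
    # One-time, imprecise "write": PCM cells tolerate few rewrites, which
    # is why this targets inference with frozen, pre-trained weights.
    return W + write_noise * rng.standard_normal(W.shape)

def analog_mvm(G: np.ndarray, v: np.ndarray, read_noise: float = 0.01) -> np.ndarray:
    i = G @ v   # the crossbar performs this in place, in the memory array
    return i + read_noise * rng.standard_normal(i.shape)

W = rng.standard_normal((256, 256)) / 16
x = rng.standard_normal(256)
G = program_conductances(W)   # program once at deployment
y_analog = analog_mvm(G, x)
y_exact = W @ x
print("relative error:",
      np.linalg.norm(y_analog - y_exact) / np.linalg.norm(y_exact))
```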