🤖 AI Summary
The post warns that AI’s runaway success has been shaped as much by hardware as by ideas: modern accelerators are massively optimized for dense matrix multiplications (MatMuls) and low-precision tensor math, so researchers naturally recast problems as chains of MatMuls to tap decades of compiler and chip optimizations. That “MatMul-reduction” and the Matthew Effect—hardware favoring certain algorithms, those algorithms attracting investment, and then more specialized hardware being built for them—create a de facto hardware lottery. This biases the field toward approaches that run well on today’s xPUs, sidelines alternatives that need different primitives or chips, and concentrates power in those who control silicon and large clusters.
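As a concrete illustration of that reduction (an illustrative sketch, not taken from the post), the classic im2col trick rewrites a 2-D convolution as one dense MatMul so the operation can reuse the same highly tuned kernels; the NumPy sketch below assumes a single-channel input and a square kernel.

```python
# Illustrative sketch (not from the post): recasting a 2-D convolution
# as a single dense matrix multiplication via im2col, the textbook case
# of "MatMul-reduction" onto hardware-friendly primitives.
import numpy as np

def conv2d_direct(x, w):
    """Naive valid convolution (cross-correlation) of an H x W image with a k x k kernel."""
    H, W = x.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def conv2d_as_matmul(x, w):
    """The same operation expressed as one dense MatMul over im2col patches."""
    H, W = x.shape
    k = w.shape[0]
    patches = np.stack([
        x[i:i + k, j:j + k].ravel()          # flatten each k x k window
        for i in range(H - k + 1)
        for j in range(W - k + 1)
    ])                                       # shape: (num_patches, k*k)
    return (patches @ w.ravel()).reshape(H - k + 1, W - k + 1)

x = np.random.randn(6, 6)
w = np.random.randn(3, 3)
assert np.allclose(conv2d_direct(x, w), conv2d_as_matmul(x, w))
```

The loop version and the MatMul version compute the same result, but only the second maps onto the dense tensor cores today's accelerators are built around, which is exactly the pull the post describes.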
Technically, this matters because some promising directions (sparse/event-driven models, conditional tree-based inference, multiplier-free or lookup/add accelerators) don’t map well to dense linear algebra but could offer big algorithmic gains. Evidence of diminishing returns from simply scaling MatMul compute (the modest loss improvements per order of magnitude of compute in Kaplan et al.’s scaling laws) heightens the risk of an innovation monoculture. Remedies range from funding diverse hardware research to making dominant hardware more programmable; examples include Nvidia’s CPU–GPU co-design, YouTube’s VCU tradeoffs, Fast Feedforward Networks, and Stella Nera’s non-MatMul substrates. The pragmatic middle path is co-design: use ML to search microarchitectures and algorithms jointly, and insist that future accelerators preserve enough generality that new algorithmic primitives can actually be evaluated.
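To make the diminishing-returns point concrete, here is a rough back-of-the-envelope (illustrative, not from the post) assuming the approximate compute power law reported by Kaplan et al. (2020), L(C) ∝ C^(-α) with α ≈ 0.05: each extra order of magnitude of dense-MatMul compute buys only a small relative drop in loss.

```python
# Back-of-the-envelope on diminishing returns from scaling MatMul compute.
# Assumes the approximate power-law fit from Kaplan et al. (2020),
# L(C) ∝ C^(-alpha) with alpha ≈ 0.05; the numbers are illustrative only.
ALPHA = 0.05

def relative_loss_reduction(compute_multiplier: float, alpha: float = ALPHA) -> float:
    """Fractional drop in loss when compute is scaled by `compute_multiplier`."""
    return 1.0 - compute_multiplier ** (-alpha)

for factor in (10, 100, 1000):
    print(f"{factor:>5}x compute -> ~{relative_loss_reduction(factor):.1%} lower loss")

# Under this fit, 10x the compute lowers loss by only ~11%, and 1000x by
# under ~30%, which is why pure MatMul scaling looks like diminishing returns.
```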