Galton Board Softmax O(N²) Replacement (github.com)

🤖 AI Summary
Galton Lab introduced a novel softmax replacement that treats categorical prediction as a learned geometric flow rather than an explicit probability calculation. Inspired by a Galton board, the approach drops probe particles into a context-conditioned landscape: in a discrete variant, probes bounce through learned pegs; in a continuous variant, probes follow neural velocity fields on a torus, integrated with an RK2 ODE solver and neural signed-distance functions. Probes naturally concentrate when the model is confident (fast, cheap prediction) and spread when it is uncertain (adaptive compute), yielding built-in uncertainty estimates and trajectory-based interpretability that is easy to visualize.

Technically, this can replace costly softmax operations both in output layers and in attention (where softmax(QK^T) is a dominant O(n²) cost) by routing probes toward likely tokens and scaling hierarchically to 10k–50k-token vocabularies. Key components include context → field composers, Gaussian soft-bucket assignments, warm-start presets, an auto-handoff/sharpening schedule, and stochastic/SDE variants for exploration. The project provides PyTorch-ready code, demos, and experiments covering image classification, attention, and RL use cases. If validated at scale, geometric flow sampling could offer adaptive compute, interpretable decision paths, and a drop-in alternative to softmax across many categorical domains, while raising open questions about theoretical guarantees, connections to diffusion and optimal transport, and large-vocabulary efficiency.
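To make the continuous variant concrete, below is a minimal, hypothetical sketch of a "Galton flow" output head in PyTorch. It is not the repository's API: the class name, layer sizes, number of probes, and bucket layout are all assumptions chosen to illustrate the mechanism described in the summary (a context-conditioned velocity field on a torus, fixed-step RK2 integration of probe particles, and Gaussian soft-bucket assignment of final probe positions to classes).

```python
# Hypothetical sketch of a continuous Galton-flow readout.
# Class/function names, shapes, and hyperparameters are assumptions for
# illustration; they do not come from the Galton Lab repository.
import torch
import torch.nn as nn


class GaltonFlowHead(nn.Module):
    """Softmax-free output head: probe particles flow on a 1-D torus.

    A context vector is composed into a velocity field; probes are integrated
    with a fixed-step RK2 (midpoint) solver; final probe positions are
    soft-assigned to per-class bucket centers with a Gaussian kernel.
    """

    def __init__(self, d_model: int, num_classes: int, num_probes: int = 64,
                 steps: int = 8, dt: float = 0.1, bucket_sigma: float = 0.05):
        super().__init__()
        self.num_probes = num_probes
        self.steps = steps
        self.dt = dt
        self.bucket_sigma = bucket_sigma
        # Context -> field composer: maps (context, probe position) to a scalar velocity.
        self.field = nn.Sequential(
            nn.Linear(d_model + 2, 128), nn.SiLU(), nn.Linear(128, 1)
        )
        # Evenly spaced bucket centers on the unit circle, one per class.
        self.register_buffer(
            "centers", torch.linspace(0.0, 1.0, num_classes + 1)[:-1]
        )

    def velocity(self, ctx: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Encode the periodic position as (sin, cos) so the field is smooth on the torus.
        ang = 2 * torch.pi * x
        feats = torch.cat(
            [ctx.unsqueeze(1).expand(-1, x.shape[1], -1),
             torch.sin(ang).unsqueeze(-1), torch.cos(ang).unsqueeze(-1)], dim=-1
        )
        return self.field(feats).squeeze(-1)

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        batch = ctx.shape[0]
        # Drop probes at uniform random positions on the torus [0, 1).
        x = torch.rand(batch, self.num_probes, device=ctx.device)
        for _ in range(self.steps):
            # RK2 midpoint step: evaluate the field, half-step, re-evaluate, advance.
            k1 = self.velocity(ctx, x)
            k2 = self.velocity(ctx, (x + 0.5 * self.dt * k1) % 1.0)
            x = (x + self.dt * k2) % 1.0
        # Gaussian soft-bucket assignment using wrapped (circular) distance.
        diff = x.unsqueeze(-1) - self.centers                 # (B, P, C)
        dist = torch.minimum(diff.abs(), 1.0 - diff.abs())
        weights = torch.exp(-0.5 * (dist / self.bucket_sigma) ** 2)
        # Average probe votes and normalize into a categorical distribution.
        probs = weights.mean(dim=1)
        return probs / probs.sum(dim=-1, keepdim=True)


if __name__ == "__main__":
    head = GaltonFlowHead(d_model=32, num_classes=10)
    ctx = torch.randn(4, 32)
    print(head(ctx).shape)  # torch.Size([4, 10]) -- each row sums to 1
```

This sketch uses a fixed number of integration steps for clarity; the adaptive-compute behavior described above would instead stop integrating once the probes have concentrated, and the project's hierarchical routing, warm-start presets, and SDE variants are omitted here.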