Learned Structures (nonint.com)

🤖 AI Summary
The author argues that, beyond numerical-stability tweaks (normalization, initialization, smooth activations), the most impactful architectural innovations are "learned structures": mechanisms that impose structure on representations and let those structures interact, so the model can build up capability in stages. Examples: an MLP lets the elements of a vector mix through learned weights; attention adds a set dimension, letting elements of a set attend to one another; Mixture-of-Experts (MoE) adds dynamic selection of which weights to use. Each introduces a new axis along which activations influence the computation, nesting structures that unlock greater expressivity as training progresses.

The key technical insight concerns training dynamics: a model optimizes its entire parameter space from step one, so sophisticated mechanisms like attention and routing are ineffective early on, when activations are still noisy, and only become useful once simpler patterns have been learned. This staged "coming online" explains why such structures add capacity without wasting early training.

The author suggests a productive research direction in designing new learned structures or compositional systems rather than rehashing existing ones. As a concrete example, the author proposes a "mixture of StyleGANs": train specialized StyleGANs on narrow image modes plus an image-composer that routes which generator handles each region, analogous to MoE routing, to improve fidelity on diverse datasets. Overall, learned structures offer a principled path to scaling expressivity that aligns with how models actually train.
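To make the "activations select which weights run" idea concrete, here is a minimal sketch (not from the post; the dimensions, expert count, and top-1 routing are illustrative assumptions) of an MoE-style layer in PyTorch. The router's output depends on the token's activations, so the data itself decides which expert participates; early in training those routing decisions carry little signal, and the dynamic selection only becomes useful once simpler patterns have emerged, matching the staged "coming online" argument above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: each token's activations pick one expert MLP."""

    def __init__(self, d_model: int = 64, n_experts: int = 4):
        super().__init__()
        # Static structure: a fixed bank of expert MLPs, each shaped like a plain FFN.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        # Learned structure: a router whose decision depends on the activations,
        # adding a new axis by which the data influences which weights are used.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)         # (tokens, n_experts)
        top_gate, top_idx = gates.max(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Scale by the gate value so the router still receives gradients.
                out[mask] = top_gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TinyMoE()
    tokens = torch.randn(8, 64)
    print(layer(tokens).shape)  # torch.Size([8, 64])
```

The proposed "mixture of StyleGANs" follows the same pattern at a coarser granularity: the experts would be full generators specialized to narrow image modes, and the composer would play the router's role per image region.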