I unified convolution and attention into a single framework (zenodo.org)

🤖 AI Summary
An independent researcher proposes the Generalized Windowed Operation (GWO), a unifying framework that brings convolution, attention, and other neural primitives under a single grammar. GWO decomposes any neural operation into three orthogonal components: Path (operational locality), Shape (geometric structure and symmetry assumptions), and Weight (feature importance), so different instantiations (e.g., convolutions, self-attention, sparse kernels) become parameterized choices in a shared design space.

The paper elevates this decomposition into a predictive theory via a Principle of Structural Alignment: architectures generalize best when their (P, S, W) configuration mirrors the intrinsic structure of the data. Technically, the work links this principle to the Information Bottleneck and formalizes an Operational Complexity measure using Kolmogorov complexity. Crucially, the claim is not that lower complexity always wins, but that complexity which enables adaptive alignment with data (adaptive regularization) yields better generalization than brute-force capacity.

Canonical operations emerge as IB-optimal solutions within GWO, and experiments support the view that the quality, rather than the sheer quantity, of an operation's complexity governs performance. Practically, GWO offers a principled roadmap for designing new operators, guiding architecture search, and explaining when to favor convolutional locality versus attention-style globality based on data structure.
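To make the decomposition concrete, here is a minimal, hypothetical Python sketch of the (Path, Shape, Weight) template on a 1-D sequence. The names (`gwo`, `path`, `shape`, `weight`) and the toy similarity score are illustrative assumptions, not the paper's actual API; the point is only that a convolution-like and an attention-like operation fall out of the same three choices.

```python
# Hypothetical sketch of the GWO decomposition; names are illustrative,
# not the paper's API.
import numpy as np

def gwo(x, path, shape, weight):
    """Apply a generalized windowed operation to a 1-D sequence x.

    path(i, n)     -> candidate positions the output at i may draw from
    shape(i, js)   -> subset of js kept (geometric structure / symmetry)
    weight(x,i,js) -> importance assigned to each kept position
    """
    n = len(x)
    out = np.zeros(n)
    for i in range(n):
        js = shape(i, path(i, n))      # Path, then Shape, pick the window
        w = weight(x, i, js)           # Weight scores each kept position
        out[i] = np.dot(w, x[js])
    return out

# Convolution-style instantiation: local path, fixed position-relative weights.
def conv_path(i, n):
    return np.array([j for j in (i - 1, i, i + 1) if 0 <= j < n])

kernel = {-1: 0.25, 0: 0.5, 1: 0.25}

def conv_weight(x, i, js):
    return np.array([kernel[j - i] for j in js])

# Attention-style instantiation: global path, content-dependent softmax weights.
def attn_path(i, n):
    return np.arange(n)

def attn_weight(x, i, js):
    scores = x[js] * x[i]              # toy dot-product similarity
    e = np.exp(scores - scores.max())
    return e / e.sum()

identity_shape = lambda i, js: js      # no extra geometric masking

x = np.array([1.0, 2.0, 3.0, 4.0])
print(gwo(x, conv_path, identity_shape, conv_weight))  # local smoothing
print(gwo(x, attn_path, identity_shape, attn_weight))  # global, content-driven mixing
```

Under this reading, "structural alignment" is the claim that the best choice of `path`/`shape`/`weight` is the one matching the data: a local `path` with fixed weights when the signal is translation-invariant and local, a global `path` with content-dependent weights when long-range, input-specific interactions dominate.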