🤖 AI Summary
Researchers introduced the Growing Cosine Unit (GCU), a novel oscillatory activation defined as C(z) = z·cos(z), and showed it can speed up training and reduce parameter counts in convolutional neural networks. Contrasting with the monotonic, biologically inspired activations that dominate practice (ReLU, Swish, Mish), the GCU is non-monotonic and has multiple zeros per neuron, which lets a single neuron implement multiple decision hyperplanes. The paper proves two theorems characterizing limitations of non-oscillatory activations and demonstrates empirically that swapping standard activations in convolutional layers for GCU improves accuracy on CIFAR-10, CIFAR-100 and Imagenette while enabling smaller networks to learn complex functions (e.g., XOR) without engineered features.
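To make the drop-in swap concrete, here is a minimal sketch of the GCU as a PyTorch activation module used in place of ReLU inside a small convolutional block. This is an illustrative implementation of C(z) = z·cos(z), not the authors' reference code; the module name, layer sizes and input shape are assumptions for the example.

```python
import torch
import torch.nn as nn

class GCU(nn.Module):
    """Growing Cosine Unit activation: C(z) = z * cos(z)."""
    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z * torch.cos(z)

# Hypothetical usage: drop-in replacement for ReLU in a conv block.
conv_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    GCU(),
    nn.MaxPool2d(2),
)

x = torch.randn(1, 3, 32, 32)   # e.g. a CIFAR-10-sized input
y = conv_block(x)               # output shape: (1, 16, 16, 16)
```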
Key technical implications: the z·cos(z) form gives growing oscillations whose multiple roots and changing sign improve gradient flow and representational capacity, letting individual units partition input space more richly than single-threshold activations. That richer expressivity can reduce the depth and width needed for a target function and accelerate convergence in practice. If validated broadly, oscillatory activations like GCU would expand the activation-design space, suggesting new trade-offs in stability, initialization and regularization during training; practical adoption will require testing across architectures, tasks and training regimes.
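The "multiple decision hyperplanes per neuron" claim can be illustrated with the XOR case mentioned above: because C(z) changes sign more than once, a single GCU neuron can separate the XOR classes without hidden layers or engineered features. The weights below are hand-picked for illustration (they are an assumption, not values from the paper).

```python
import numpy as np

def gcu(z):
    """Growing Cosine Unit: C(z) = z * cos(z)."""
    return z * np.cos(z)

# XOR truth table.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Illustrative hand-picked weights: w = [pi, pi], b = 0.
# Pre-activations z are 0, pi, pi, 2*pi, so C(z) = 0, -pi, -pi, 2*pi.
# The two XOR-positive points fall on a negative lobe of the oscillation,
# while both XOR-negative points stay non-negative: one neuron separates them.
w = np.array([np.pi, np.pi])
b = 0.0
z = X @ w + b
pred = (gcu(z) < 0).astype(int)

print(pred)                      # [0 1 1 0]
print(np.array_equal(pred, y))   # True
```

A monotonic, single-threshold activation cannot produce this labeling with one neuron, since its output crosses any decision threshold only once along the direction of w.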