Show HN: Aion-Torch – Adaptive residual scaling for deep Transformers (github.com)

🤖 AI Summary
AION (Adaptive Input/Output Normalization) is an alpha-stage PyTorch library that replaces fixed residual scaling with an adaptive layer computing x + α·y, where α is adjusted dynamically from the "energies" of the input and output. The goal is to keep residual-branch magnitudes balanced so that very deep networks (hundreds of layers) train stably without manual tuning.

In a 600-layer Pre-LayerNorm transformer test (batch 8, seq 128, dim 512), both models completed 150 steps, but the AION model converged faster and reached a roughly 7× lower final loss (0.0011 ± 0.0003 vs 0.0075 ± 0.0015), with far smaller and more stable gradient norms (0.0135 ± 0.0033 vs 0.0665 ± 0.0116).

Practically, AION ships as a simple AionResidual layer (alpha0 and beta parameters) and works with PyTorch 2.0+. The tradeoff is compute: the unoptimized implementation adds ~36% per-step overhead (9.79 ms → 13.36 ms), but the authors propose inexpensive mitigations: setting k_update > 1 to update α less frequently (e.g., k_update=4 cuts AION's extra compute by ≈75%), operation fusion, reuse of statistics, and lower-precision tracking, targeting overhead below ~5% in production. It is an alpha release on PyPI (pip install aion-torch): promising for stabilizing extremely deep transformers on top of modern Pre-LayerNorm practice, but expect API and implementation changes.
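The summary does not spell out the α update rule, only that it is derived from input/output energies and exposed through alpha0, beta, and a k_update knob. Below is a minimal sketch of what such a layer could look like, assuming RMS magnitude as the "energy" and a power-law correction controlled by beta; both choices are assumptions for illustration, not the library's documented behavior.

```python
import torch
import torch.nn as nn


class AionResidualSketch(nn.Module):
    """Illustrative sketch of an adaptive residual: out = x + alpha * y.

    The real AionResidual in aion-torch exposes alpha0, beta, and k_update,
    but its exact alpha update rule is not given in the summary; the
    energy-ratio rule below is an assumption for illustration only.
    """

    def __init__(self, alpha0: float = 1.0, beta: float = 0.5, k_update: int = 1):
        super().__init__()
        self.alpha0 = alpha0      # base residual scale
        self.beta = beta          # how strongly alpha reacts to the energy ratio
        self.k_update = k_update  # recompute alpha only every k_update calls
        self.register_buffer("alpha", torch.tensor(float(alpha0)))
        self.register_buffer("calls", torch.tensor(0))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        if int(self.calls) % self.k_update == 0:
            # "Energy" taken here as RMS magnitude (an assumption); scale the
            # residual branch so it stays comparable to the trunk signal.
            e_x = x.detach().pow(2).mean().sqrt()
            e_y = y.detach().pow(2).mean().sqrt()
            self.alpha = self.alpha0 * (e_x / (e_y + 1e-8)) ** self.beta
        self.calls += 1
        return x + self.alpha * y


# Usage with the benchmark's shapes (batch 8, seq 128, dim 512),
# updating alpha every 4th call as the k_update=4 mitigation suggests.
res = AionResidualSketch(alpha0=1.0, beta=0.5, k_update=4)
x = torch.randn(8, 128, 512)
y = torch.randn(8, 128, 512)  # stand-in for a sublayer output
out = res(x, y)               # x + alpha * y
```

Skipping the energy computation on most calls is what makes k_update cheap: the alpha from the last update is simply reused, so only the x + alpha * y fused multiply-add remains on the hot path.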