Continuous Autoregressive Language Models (shaochenze.github.io)

🤖 AI Summary
Researchers propose Continuous Autoregressive Language Models (CALM), a new paradigm that replaces discrete next-token prediction with continuous next-vector prediction to break the "discrete token" bandwidth bottleneck. Instead of predicting one of ~32K tokens (~15 bits of information) per step, CALM compresses each chunk of K tokens into a dense latent vector via a high-fidelity autoencoder and autoregressively models the sequence of vectors. This can dramatically reduce the number of autoregressive steps and improve throughput without changing the core Transformer compute.

The shift is motivated by the information-capacity limits of discrete vocabularies and is demonstrated practically: compressing K = 4 tokens into a single latent vector (latent dimension l = 128 in practice) yields >99.9% reconstruction accuracy, and a variational/regularized encoder makes the latents robust to Gaussian noise (σ ≈ 0.3).

Technically, CALM removes the softmax over a finite vocabulary and therefore adopts a likelihood-free generative head: an MLP that samples the next vector conditioned on the Transformer hidden state plus random noise. To enable single-step, efficient generation (unlike slow diffusion heads), the head is trained with the Energy Score, a proper scoring rule estimated by Monte Carlo sampling that encourages matching the data distribution. Generated vectors are decoded back into tokens and re-embedded for the next prediction step.

Evaluation must also be likelihood-free: the authors use BrierLM, a Monte Carlo estimate of Brier scores over n-grams (Brier-n for n = 1..4), and control sampling temperature via rejection-sampling/Bernoulli-factory style schemes with batched combinatorial optimizations. CALM thus opens a new scaling axis, semantic bandwidth per step, with practical solutions for modeling, sampling, and evaluation in continuous autoregressive language modeling.
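A minimal sketch of what such a chunk autoencoder could look like (PyTorch). The module name, layer sizes, and placement of the noise injection are illustrative assumptions, not the paper's exact design; it only shows the idea of mapping K token ids to one l-dimensional latent and training reconstruction under Gaussian noise:

```python
import torch
import torch.nn as nn

class ChunkAutoencoder(nn.Module):
    """Illustrative sketch: compress a chunk of K tokens into one latent vector.

    Hypothetical module, not the paper's architecture. Noise injection during
    training mimics the robustness to Gaussian noise (sigma ~ 0.3) described
    in the summary.
    """
    def __init__(self, vocab_size=32768, d_embed=256, chunk_k=4, d_latent=128, noise_std=0.3):
        super().__init__()
        self.chunk_k = chunk_k
        self.noise_std = noise_std
        self.embed = nn.Embedding(vocab_size, d_embed)
        self.encoder = nn.Sequential(
            nn.Linear(chunk_k * d_embed, 512), nn.GELU(),
            nn.Linear(512, d_latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(d_latent, 512), nn.GELU(),
            nn.Linear(512, chunk_k * vocab_size),
        )

    def encode(self, token_ids):          # (B, K) -> (B, d_latent)
        x = self.embed(token_ids).flatten(1)
        return self.encoder(x)

    def decode_logits(self, z):           # (B, d_latent) -> (B, K, V)
        return self.decoder(z).view(z.size(0), self.chunk_k, -1)

    def forward(self, token_ids):
        z = self.encode(token_ids)
        if self.training:                  # noise injection for robust latents
            z = z + self.noise_std * torch.randn_like(z)
        logits = self.decode_logits(z)
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), token_ids.reshape(-1)
        )
```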
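The Energy Score objective can be estimated from samples alone, which is what makes the likelihood-free head trainable. A rough sketch of a Monte Carlo estimator (the function name, the beta default, and the normalization are assumptions; the paper's exact estimator may differ):

```python
import torch

def energy_score_loss(samples: torch.Tensor, target: torch.Tensor, beta: float = 1.0):
    """Monte Carlo estimate of the (negatively oriented) Energy Score.

    samples: (B, n, d) -- n vectors drawn from the generative head per position
    target:  (B, d)    -- the ground-truth next latent vector
    Minimizing E||X - y||^beta - 0.5 * E||X - X'||^beta pushes the sampled
    distribution toward the data distribution (proper scoring rule).
    """
    # distance of each sample to the target (fit term)
    diff_to_target = samples - target.unsqueeze(1)              # (B, n, d)
    term_fit = diff_to_target.norm(dim=-1).pow(beta).mean(dim=1)  # (B,)

    # pairwise distances between independent samples (diversity term)
    diff = samples.unsqueeze(2) - samples.unsqueeze(1)           # (B, n, n, d)
    pair = diff.norm(dim=-1).pow(beta)                           # (B, n, n)
    n = samples.size(1)
    term_div = pair.sum(dim=(1, 2)) / (n * (n - 1))              # zero diagonal excluded

    return (term_fit - 0.5 * term_div).mean()
```

In use, the head would produce the n samples by feeding the same Transformer hidden state with n independent noise draws, and the target would be the autoencoder latent of the true next K-token chunk.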
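The Brier-based evaluation rests on the same likelihood-free principle: two independent model samples per position give an unbiased estimate of the Brier score without access to explicit probabilities. A sketch of the core estimator (names are hypothetical; the aggregation over n-grams for n = 1..4 into the final BrierLM number is omitted):

```python
import torch

def mc_brier_score(sample1: torch.Tensor, sample2: torch.Tensor, reference: torch.Tensor):
    """Unbiased Monte Carlo estimate of the Brier score from samples only.

    For a categorical distribution p and ground truth y,
        Brier(p, y) = sum_k (p_k - 1[y = k])^2 = sum_k p_k^2 - 2 * p_y + 1,
    and for independent model samples x1, x2:
        E[1[x1 == x2]] = sum_k p_k^2,   E[1[x1 == y]] = p_y,
    so two samples per position suffice (lower is better for this raw score).
    """
    collision = (sample1 == sample2).float()   # estimates sum_k p_k^2
    hit = (sample1 == reference).float()       # estimates p_y
    return (collision - 2.0 * hit + 1.0).mean()
```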