🤖 AI Summary
Researchers propose Continuous Autoregressive Language Models (CALM), a new paradigm that replaces discrete next-token prediction with continuous next-vector prediction to break the "discrete token" bandwidth bottleneck. Instead of predicting one of ~32K tokens (~15 bits) per step, CALM compresses K-token chunks into dense latent vectors via a high-fidelity autoencoder and autoregressively models the sequence of vectors. This can dramatically reduce the number of autoregressive steps and improve throughput without changing the core Transformer compute. The shift is motivated by the information-capacity limits of discrete vocabularies and demonstrated practically: compressing K=4 tokens into a latent vector (dimension l=128 in practice) yields >99.9% reconstruction accuracy and robustness to Gaussian noise (σ ≈ 0.3) when a variational/regularized encoder is used.
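To make the chunk-to-vector step concrete, here is a minimal sketch (in PyTorch) of a chunk autoencoder, assuming simple MLP encoder/decoder blocks; the class name, layer widths, and exact regularization are illustrative assumptions, not the paper's architecture. It maps K=4 token ids to a 128-dimensional latent and back, adding Gaussian noise to the latent during training so that nearby latents still decode to the same tokens (the robustness property mentioned above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChunkAutoencoder(nn.Module):
    """Hypothetical K-token chunk autoencoder: K token ids <-> one dense latent vector."""

    def __init__(self, vocab_size=32768, k=4, d_emb=256, d_latent=128, noise_std=0.3):
        super().__init__()
        self.k, self.vocab_size, self.noise_std = k, vocab_size, noise_std
        self.embed = nn.Embedding(vocab_size, d_emb)
        self.encoder = nn.Sequential(
            nn.Linear(k * d_emb, 512), nn.GELU(), nn.Linear(512, d_latent))
        self.decoder = nn.Sequential(
            nn.Linear(d_latent, 512), nn.GELU(), nn.Linear(512, k * vocab_size))

    def encode(self, tokens):                       # tokens: (B, K) int ids
        x = self.embed(tokens).flatten(1)           # (B, K * d_emb)
        return self.encoder(x)                      # (B, d_latent)

    def decode(self, z):                            # z: (B, d_latent)
        logits = self.decoder(z)                    # (B, K * vocab_size)
        return logits.view(-1, self.k, self.vocab_size)

    def forward(self, tokens):
        z = self.encode(tokens)
        if self.training:                           # noise injection -> robust, smooth latent space
            z = z + self.noise_std * torch.randn_like(z)
        logits = self.decode(z)
        # Reconstruction loss: each of the K positions must recover its original token.
        return F.cross_entropy(logits.reshape(-1, self.vocab_size), tokens.reshape(-1))
```

At generation time only encode/decode are used: the language model predicts the next latent vector, and decode maps it back to a chunk of K tokens.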
Technically, CALM removes the softmax over a finite vocabulary and therefore adopts a likelihood-free generative head: an MLP that samples the next vector conditioned on the Transformer hidden state plus noise. To enable efficient single-step generation (unlike slow diffusion), the head is trained with the Energy Score, a proper scoring rule estimated from Monte Carlo samples that encourages the head to match the data distribution. Generated vectors are decoded back to tokens and re-embedded for the next prediction step. Evaluation uses likelihood-free metrics: a Monte Carlo, Brier-based metric called BrierLM (combining Brier-n for n=1..4), plus temperature control via rejection-sampling/Bernoulli-factory-style schemes (with batch combinatorial optimizations). CALM opens a new scaling axis, semantic bandwidth per step, with practical solutions for modeling, sampling, and evaluation in continuous autoregressive language modeling.
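As a rough illustration of the likelihood-free head and its training signal, the sketch below (again PyTorch, with hypothetical names, layer sizes, and β) shows an MLP head that concatenates the hidden state with Gaussian noise to draw samples in one forward pass, plus a Monte Carlo estimate of the Energy Score that pulls samples toward the target latent while keeping them spread apart; it is a sketch under these assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class GenerativeHead(nn.Module):
    """Hypothetical one-step sampling head: hidden state + Gaussian noise -> next latent vector."""

    def __init__(self, d_model=1024, d_latent=128, d_noise=64, d_hidden=1024):
        super().__init__()
        self.d_noise = d_noise
        self.mlp = nn.Sequential(
            nn.Linear(d_model + d_noise, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_latent))

    def forward(self, h, n_samples=1):              # h: (B, d_model)
        h = h.unsqueeze(0).expand(n_samples, -1, -1)                      # (N, B, d_model)
        eps = torch.randn(*h.shape[:-1], self.d_noise, device=h.device)   # fresh noise per sample
        return self.mlp(torch.cat([h, eps], dim=-1))                      # (N, B, d_latent)

def energy_score_loss(samples, target, beta=1.0):
    """Monte Carlo Energy Score (negatively oriented; strictly proper for 0 < beta < 2).

    samples: (N, B, D) i.i.d. draws from the head, N >= 2; target: (B, D) ground-truth latent.
    """
    n = samples.size(0)
    # Attraction term: average distance from each sample to the target.
    attract = (samples - target.unsqueeze(0)).norm(dim=-1).pow(beta).mean()
    # Repulsion term: average pairwise distance between distinct samples (prevents collapse to the mean).
    pair = (samples.unsqueeze(0) - samples.unsqueeze(1)).norm(dim=-1).pow(beta)   # (N, N, B)
    off_diag = pair.sum() - torch.diagonal(pair, dim1=0, dim2=1).sum()
    repulse = off_diag / (n * (n - 1) * samples.size(1))
    return attract - 0.5 * repulse
```

Training would draw a handful of samples per position and minimize energy_score_loss(head(h, n_samples), target_latent); generation calls the head once with n_samples=1, which is the single-step sampling the summary contrasts with diffusion-based heads.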