🤖 AI Summary
Vector-Quantized VAEs (VQ-VAEs) compress continuous inputs into discrete codebook entries but have long struggled with non-differentiable quantization: standard training uses a straight-through approximation that routes gradients around the quantizer and discards information about how encoder outputs relate to the chosen code vectors. This paper introduces a simple "rotation trick" that smoothly maps each encoder output onto its selected codebook vector via a linear rotation-and-rescaling transform. The transform is applied in the forward pass but treated as a constant during backpropagation, so the gradients flowing back carry information about the relative angle and magnitude between the encoder output and its codebook vector rather than bypassing the quantizer entirely.
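To make the mechanism concrete, below is a minimal PyTorch-style sketch (not the authors' released code) of how such a transform could be implemented: the rotation and rescaling that align the encoder output with its code vector are built from detached tensors, so they behave as constants under autograd, and are then applied to the live encoder output so gradients flow through it. The rotation here uses a standard two-reflection construction that maps one unit vector onto another; function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def rotation_trick(e, q, eps=1e-6):
    """Map encoder output e onto its codebook vector q with a rotation-and-rescale
    transform held constant during backprop (sketch, not the official implementation).

    e, q: (batch, d) tensors; q is the selected (nearest) codebook vector.
    Forward value equals q; gradients flow back to e through the detached transform.
    """
    # Build the transform from detached copies so it is a constant w.r.t. autograd.
    e_d, q_d = e.detach(), q.detach()
    e_hat = F.normalize(e_d, dim=-1, eps=eps)
    q_hat = F.normalize(q_d, dim=-1, eps=eps)
    # r bisects e_hat and q_hat; R = I - 2 r r^T + 2 q_hat e_hat^T is the rotation
    # (product of two Householder reflections) that carries e_hat onto q_hat.
    r = F.normalize(e_hat + q_hat, dim=-1, eps=eps)
    # Apply R to e without materializing the d x d matrix:
    # R e = e - 2 r (r . e) + 2 q_hat (e_hat . e)
    rotated = (
        e
        - 2.0 * r * (r * e).sum(dim=-1, keepdim=True)
        + 2.0 * q_hat * (e_hat * e).sum(dim=-1, keepdim=True)
    )
    # Rescale so the output magnitude matches the codebook vector's.
    scale = q_d.norm(dim=-1, keepdim=True) / e_d.norm(dim=-1, keepdim=True).clamp_min(eps)
    return scale * rotated
```

Because the scale and rotation are computed from detached tensors, the backward pass sees a fixed linear map whose coefficients depend on the angle and relative magnitude between the encoder output and its code vector, which is the geometric information the straight-through estimator throws away.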
Empirically, this restructuring of the quantization layer improves reconstruction quality, increases codebook utilization, and reduces quantization error across 11 VQ-VAE training paradigms, suggesting better discrete representation learning without changing the discrete decoding target. Technically, the method preserves hard codebook selection while creating a differentiable path whose gradients encode the geometric relationship between encoder outputs and code vectors, an attractive middle ground between hard quantization and continuous relaxations. The approach is simple to implement, and the authors provide code, making it readily testable in compression, generative modeling, and other applications that rely on learned discrete latents.
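For context, here is a hypothetical usage sketch showing where such a transform would slot into a standard VQ-VAE quantization step, replacing the straight-through copy; `quantize` and `codebook` are illustrative names, and it reuses the `rotation_trick` sketch above.

```python
def quantize(e, codebook):
    """Nearest-neighbor codebook lookup followed by the rotation trick.

    e: (batch, d) encoder outputs; codebook: (K, d) code vectors.
    Returns the quantized vectors and the discrete code indices.
    """
    dists = torch.cdist(e, codebook)   # (batch, K) pairwise distances
    idx = dists.argmin(dim=-1)         # hard nearest-neighbor selection
    q = codebook[idx]                  # selected code vectors
    # Straight-through baseline would be: e + (q - e).detach()
    # The rotation trick replaces that copy with a geometry-aware transform:
    return rotation_trick(e, q), idx
```

The discrete decoding target (the selected code index and vector) is unchanged; only the gradient path from decoder back to encoder differs.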