Single-Step Diffusion Decoder for Efficient Image Tokenization (github.com)

🤖 AI Summary
Meta Research released the official implementation of SSDD (Single‑Step Diffusion Decoder), a GAN‑free diffusion decoder designed to replace the KL‑VAE‑style decoders used for image tokenization in latent generative models. SSDD is a pixel‑level diffusion decoder built from transformer blocks and trained with flow matching to model p(image|latent) more stably and at scale. Crucially, the multi‑step diffusion teacher is distilled into a single‑step decoder, so reconstructions are produced in one forward pass rather than through iterative sampling, retaining the advantages of diffusion without adversarial losses.

Empirically, SSDD outperforms KL‑VAE tokenizers: reconstruction FID drops from 0.87 to 0.50 while throughput increases ~1.4×, and it preserves the generation quality of DiTs with ~3.8× faster sampling. Training and distillation use a flow‑matching sampler (step count configurable via ssdd.fm_sampler.steps; the teacher used ~7 steps in the examples) and standard ImageNet setups; the repo provides demo models, encoding/decoding instructions, and weights on GitHub/Hugging Face.

For the AI/ML community, SSDD is significant because it offers a drop‑in, faster, and higher‑quality tokenizer for latent diffusion pipelines: it reduces latency and removes the need for adversarial training, making high‑quality generative training and inference more efficient and easier to integrate.
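To make the decoding path concrete, here is a minimal flow‑matching sketch in PyTorch. The decoder call signature, latent shape, and image resolution are assumptions for illustration, not the repo's actual API; only the sampler structure (Euler integration of a learned velocity field, which collapses to one step after distillation) reflects the method described above.

```python
import torch

@torch.no_grad()
def fm_decode(decoder, latent, steps=7, img_shape=(3, 256, 256)):
    """Flow-matching Euler sampler: integrate dx/dt = v(x, t, latent)
    from t=0 (pure noise) to t=1 (reconstructed image).
    With steps=1 this reduces to the single-step decode of the
    distilled SSDD student. `decoder(x, t, latent)` is a hypothetical
    signature for the velocity-prediction network."""
    b = latent.shape[0]
    x = torch.randn(b, *img_shape, device=latent.device)  # start from noise
    dt = 1.0 / steps
    for i in range(steps):
        # scalar time value broadcast over the batch
        t = torch.full((b,), i * dt, device=latent.device)
        v = decoder(x, t, latent)  # predicted velocity conditioned on the latent
        x = x + dt * v             # explicit Euler step along the flow
    return x
```

Under these assumptions, a teacher decode would look like fm_decode(teacher, z, steps=7), matching the ~7 teacher steps noted above (configurable via ssdd.fm_sampler.steps), while the distilled student uses fm_decode(student, z, steps=1), i.e. a single forward pass.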