🤖 AI Summary
Meta Research released the official implementation of SSDD (Single-Step Diffusion Decoder), a GAN-free diffusion decoder designed to replace the KL-VAE-style decoders used for image tokenization in latent generative models. SSDD introduces a pixel-level diffusion decoder architecture that uses transformer blocks and flow-matching training to model p(image | latent) more stably and at scale. Crucially, the multi-step diffusion teacher is distilled into a single-step decoder, so reconstructions are produced in one forward pass rather than through iterative sampling, retaining the advantages of diffusion without adversarial losses.
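
To make the teacher/student distinction concrete, here is a minimal, self-contained sketch of flow-matching decoding. This is not the official SSDD code: `VelocityNet`, its layer sizes, and the flat-vector shapes are illustrative assumptions. The teacher Euler-integrates a learned velocity field from noise to image over several steps, while the distilled student covers the whole interval in a single step.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy stand-in for SSDD's transformer decoder: predicts v_theta(x_t, t | z)."""
    def __init__(self, img_dim: int, latent_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + latent_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, img_dim),
        )

    def forward(self, x_t, t, z):
        # Condition on the noisy image x_t, the timestep t, and the latent z.
        t = t.expand(x_t.shape[0], 1)
        return self.net(torch.cat([x_t, t, z], dim=-1))

@torch.no_grad()
def decode_multistep(model, z, img_dim, steps=7):
    """Teacher-style decoding: Euler-integrate the flow ODE from noise to image."""
    x = torch.randn(z.shape[0], img_dim)      # x_0 ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)
        x = x + dt * model(x, t, z)           # x_{t+dt} = x_t + dt * v(x_t, t, z)
    return x

@torch.no_grad()
def decode_single_step(student, z, img_dim):
    """Distilled student: one Euler step spanning the whole [0, 1] interval."""
    x0 = torch.randn(z.shape[0], img_dim)
    t0 = torch.zeros(1)
    return x0 + student(x0, t0, z)            # dt = 1: a single forward pass

if __name__ == "__main__":
    img_dim, latent_dim = 64, 16
    model = VelocityNet(img_dim, latent_dim)
    z = torch.randn(4, latent_dim)            # latents from the tokenizer's encoder
    print(decode_multistep(model, z, img_dim, steps=7).shape)  # torch.Size([4, 64])
    print(decode_single_step(model, z, img_dim).shape)         # torch.Size([4, 64])
```

The distillation payoff is visible in the two decode functions: the student trades the 7-step integration loop for one forward pass, which is where the reported sampling speedup comes from.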
Empirically, SSDD outperforms KL-VAE tokenizers: reconstruction FID drops from 0.87 to 0.50 while throughput increases ~1.4×, and it preserves the generation quality of DiTs with ~3.8× faster sampling. Training and distillation use a flow-matching sampler (configurable via ssdd.fm_sampler.steps; the teacher used ~7 steps in the examples) and standard ImageNet setups; the repo provides demo models, encoding/decoding instructions, and weights on GitHub/Hugging Face. For the AI/ML community, SSDD matters because it offers a drop-in, faster, higher-quality tokenizer for latent diffusion pipelines: it reduces latency and removes the need for adversarial training, making high-quality generative training and inference more efficient and easier to integrate.
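
As a rough illustration of the "drop-in" claim, the sketch below shows where a single-step decoder would slot into a latent pipeline in place of a KL-VAE decode call. All names here (`SSDDDecoderStub`, `fm_sampler_steps`, `decode`) are hypothetical stand-ins, not the repo's actual API; the real loading and decoding entry points are documented in the GitHub README.

```python
import torch

class SSDDDecoderStub(torch.nn.Module):
    """Hypothetical stand-in with the interface a latent pipeline expects."""
    def __init__(self, fm_sampler_steps: int = 1):
        super().__init__()
        # 1 for the distilled student; ~7 for the multi-step teacher.
        self.steps = fm_sampler_steps

    @torch.no_grad()
    def decode(self, latents: torch.Tensor) -> torch.Tensor:
        # Real SSDD would run the (distilled) flow-matching decoder here;
        # this stub just returns a correctly shaped dummy image batch.
        b = latents.shape[0]
        return torch.zeros(b, 3, 256, 256)

# Drop-in replacement for the KL-VAE decode call at the end of a DiT sampling loop:
decoder = SSDDDecoderStub(fm_sampler_steps=1)
latents = torch.randn(8, 4, 32, 32)   # e.g. DiT output in latent space
images = decoder.decode(latents)      # single-pass pixel reconstruction
print(images.shape)                   # torch.Size([8, 3, 256, 256])
```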