🤖 AI Summary
Researchers introduced SSDD (Single-Step Diffusion Decoder), a new diffusion-based image tokenizer designed to replace KL-regularized VAEs (KL-VAEs) in generative image models. Tokenizers condense images into compact latent tokens consumed by modern generative pipelines; existing KL-VAE tokenizers rely on adversarial losses, which can introduce artifacts and complicate training. SSDD instead uses a pixel-space diffusion decoder built on transformer components for better scaling and training stability, and—critically—uses distillation to compress the iterative diffusion process into a single-step decoder that needs no adversarial losses.
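To make the decoder side concrete, here is a minimal PyTorch sketch of a conditional pixel-space diffusion decoder: it denoises pixels given the tokenizer's latents. All layer choices, shapes, and names are illustrative assumptions; the paper's actual decoder is a larger transformer-based architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelDiffusionDecoder(nn.Module):
    """Toy conditional pixel-space denoiser standing in for SSDD's decoder.

    Layer choices and shapes here are illustrative assumptions, not the
    paper's actual (transformer-based) architecture.
    """
    def __init__(self, latent_dim: int = 16, hidden: int = 128):
        super().__init__()
        self.cond_proj = nn.Conv2d(latent_dim, hidden, kernel_size=1)
        self.body = nn.Sequential(
            nn.Conv2d(3 + hidden + 1, hidden, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(hidden, 3, kernel_size=3, padding=1),
        )

    def forward(self, noisy_pixels, latents, t):
        # Broadcast the latent tokens and the timestep to pixel resolution,
        # then predict the clean image (x0-prediction, one common choice).
        cond = F.interpolate(self.cond_proj(latents), size=noisy_pixels.shape[-2:])
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *noisy_pixels.shape[-2:])
        return self.body(torch.cat([noisy_pixels, cond, t_map], dim=1))

# Reconstruction: decode 8x8 latents back to a 64x64 image by denoising.
decoder = PixelDiffusionDecoder()
latents = torch.randn(2, 16, 8, 8)
x = torch.randn(2, 3, 64, 64)            # start from pure noise
x0 = decoder(x, latents, torch.ones(2))  # one denoising evaluation
```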
Technically, SSDD keeps the probabilistic modeling strengths of diffusion decoders but removes the sampling-time cost by training a one-step student to mimic the multi-step teacher, yielding faster reconstruction without quality loss. The reported gains are substantial: reconstruction FID improves from 0.87 to 0.50, decoding throughput rises ~1.4×, and when paired with DiTs it preserves generation quality while enabling ~3.8× faster sampling. That combination of GAN-free training, higher fidelity, and much lower latency makes SSDD a practical drop-in replacement for KL-VAE tokenizers and a useful component for building faster, higher-quality generative image models.
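Below is a hedged sketch of that distillation recipe, reusing the toy PixelDiffusionDecoder above: a frozen teacher produces multi-step reconstructions, and a one-step student regresses onto them with a plain MSE loss. The sampler and the objective are simplifying assumptions; SSDD's actual distillation loss and schedule may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_reconstruct(teacher, latents, shape, steps=32):
    """Multi-step reconstruction with a simple Euler-style sampler (illustrative)."""
    x = torch.randn(shape)
    ts = torch.linspace(1.0, 0.0, steps + 1)
    for i in range(steps):
        t = ts[i].expand(shape[0])
        x0 = teacher(x, latents, t)
        # Step toward the predicted clean image by the fraction of time covered.
        x = x + (x0 - x) * (ts[i] - ts[i + 1]) / ts[i]
    return x

def distill_step(student, teacher, encoder, images, opt):
    """One update: the one-step student mimics the multi-step teacher."""
    with torch.no_grad():
        latents = encoder(images)                        # frozen encoder
        target = teacher_reconstruct(teacher, latents, images.shape)
    pred = student(torch.randn_like(images), latents,    # single forward pass
                   torch.ones(images.size(0)))           # t=1: pure-noise input
    loss = F.mse_loss(pred, target)                      # no adversarial term
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with the toy decoder above; the encoder is just a downsampling stub.
student, teacher = PixelDiffusionDecoder(), PixelDiffusionDecoder()
encoder = torch.nn.Conv2d(3, 16, kernel_size=8, stride=8)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
print(distill_step(student, teacher, encoder, torch.randn(2, 3, 64, 64), opt))
```

At inference the student alone reconstructs an image from latents in one forward pass, which is where the reported throughput and latency gains come from.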