🤖 AI Summary
Researchers introduced SSDD (Single-Step Diffusion Decoder), a new diffusion-based image tokenizer designed to replace KL-regularized VAEs (KL-VAEs) in generative image models. Tokenizers condense images into compact latent tokens consumed by modern generative pipelines; existing KL-VAE tokenizers rely on adversarial losses, which can introduce artifacts and complicate training. SSDD instead uses a pixel-space diffusion decoder built on transformer components for better scaling and training stability, and—critically—uses distillation to compress the iterative diffusion process into a single-step decoder that needs no adversarial losses.
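To make the decoder side concrete, here is a minimal PyTorch sketch of a conditional pixel-space diffusion decoder: it denoises pixels given the tokenizer's latents. All layer choices, shapes, and names are illustrative assumptions; the paper's actual decoder is a larger transformer-based architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelDiffusionDecoder(nn.Module):
    """Toy conditional pixel-space denoiser standing in for SSDD's decoder.

    Layer choices and shapes here are illustrative assumptions, not the
    paper's actual (transformer-based) architecture.
    """
    def __init__(self, latent_dim: int = 16, hidden: int = 128):
        super().__init__()
        self.cond_proj = nn.Conv2d(latent_dim, hidden, kernel_size=1)
        self.body = nn.Sequential(
            nn.Conv2d(3 + hidden + 1, hidden, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(hidden, 3, kernel_size=3, padding=1),
        )

    def forward(self, noisy_pixels, latents, t):
        # Broadcast the latent tokens and the timestep to pixel resolution,
        # then predict the clean image (x0-prediction, one common choice).
        cond = F.interpolate(self.cond_proj(latents), size=noisy_pixels.shape[-2:])
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *noisy_pixels.shape[-2:])
        return self.body(torch.cat([noisy_pixels, cond, t_map], dim=1))

# Reconstruction: decode 8x8 latents back to a 64x64 image by denoising.
decoder = PixelDiffusionDecoder()
latents = torch.randn(2, 16, 8, 8)
x = torch.randn(2, 3, 64, 64)            # start from pure noise
x0 = decoder(x, latents, torch.ones(2))  # one denoising evaluation
```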
Technically, SSDD keeps the probabilistic modeling strengths of diffusion decoders but removes the sampling-time cost by training a one-step student to mimic the multi-step teacher, yielding faster reconstruction without quality loss. The reported gains are substantial: reconstruction FID improves from 0.87 to 0.50, decoding throughput rises ~1.4×, and when paired with DiTs it preserves generation quality while enabling ~3.8× faster sampling. That combination of GAN-free training, higher fidelity, and much lower latency makes SSDD a practical drop-in replacement for KL-VAE tokenizers and a useful component for building faster, higher-quality generative image models.
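Below is a hedged sketch of that distillation recipe, reusing the toy PixelDiffusionDecoder above: a frozen teacher produces multi-step reconstructions, and a one-step student regresses onto them with a plain MSE loss. The sampler and the objective are simplifying assumptions; SSDD's actual distillation loss and schedule may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_reconstruct(teacher, latents, shape, steps=32):
    """Multi-step reconstruction with a simple Euler-style sampler (illustrative)."""
    x = torch.randn(shape)
    ts = torch.linspace(1.0, 0.0, steps + 1)
    for i in range(steps):
        t = ts[i].expand(shape[0])
        x0 = teacher(x, latents, t)
        # Step toward the predicted clean image by the fraction of time covered.
        x = x + (x0 - x) * (ts[i] - ts[i + 1]) / ts[i]
    return x

def distill_step(student, teacher, encoder, images, opt):
    """One update: the one-step student mimics the multi-step teacher."""
    with torch.no_grad():
        latents = encoder(images)                        # frozen encoder
        target = teacher_reconstruct(teacher, latents, images.shape)
    pred = student(torch.randn_like(images), latents,    # single forward pass
                   torch.ones(images.size(0)))           # t=1: pure-noise input
    loss = F.mse_loss(pred, target)                      # no adversarial term
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with the toy decoder above; the encoder is just a downsampling stub.
student, teacher = PixelDiffusionDecoder(), PixelDiffusionDecoder()
encoder = torch.nn.Conv2d(3, 16, kernel_size=8, stride=8)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
print(distill_step(student, teacher, encoder, torch.randn(2, 3, 64, 64), opt))
```

At inference the student alone reconstructs an image from latents in one forward pass, which is where the reported throughput and latency gains come from.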