🤖 AI Summary
A researcher experimented with generating 64×64 cat images by iterative denoising in pixel space, using Kernel Prediction Networks (KPNs) as the denoiser instead of the usual latent-space diffusion/noise-prediction approach. During training, the pipeline lerps images toward Gaussian noise, then trains a model (an 8×8-patch transformer backbone with upscaling convolutions driving a KPN denoising network) to predict a low-rank target image under L2 and LPIPS losses. The KPN stack borrows ideas from partitioning pyramids and Procedural Kernel Networks (low-rank Gaussian parametrization, 5×5 kernels, 2×2 pooling, low-rank 5×5 upsampling with sigmoid lerps and skips). Because bilateral-style filters compute convex combinations of input pixels and cannot invent new content, the author augments the KPN with (a) non-normalized, sign-capable kernel weights (via a tanh activation) to break convexity, and (b) a separate low-capacity U-Net that predicts per-pixel color “drift” offsets, added after filtering, to restore low-frequency/color detail.
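The two core operations of that pipeline — lerping a clean image toward Gaussian noise, and applying predicted per-pixel 5×5 kernels — can be sketched in a few lines of numpy. This is a minimal illustration, not the author's implementation; function names, shapes, and the single-channel simplification are assumptions.

```python
import numpy as np

def lerp_to_noise(image, t, rng):
    """Hypothetical training corruption: blend a clean image toward
    Gaussian noise by factor t in [0, 1] (t=0 -> clean, t=1 -> pure noise)."""
    noise = rng.standard_normal(image.shape)
    return (1.0 - t) * image + t * noise

def apply_kpn_kernels(image, kernels):
    """KPN denoising step: each output pixel is a weighted sum of its
    5x5 neighborhood, using a kernel predicted per pixel.
    image: (H, W); kernels: (H, W, 5, 5). Because the weights may be
    signed (e.g. tanh-activated), this is not restricted to a convex
    combination of the inputs."""
    H, W = image.shape
    pad = np.pad(image, 2, mode="edge")  # replicate borders for the 5x5 window
    out = np.empty_like(image)
    for y in range(H):
        for x in range(W):
            out[y, x] = np.sum(pad[y:y + 5, x:x + 5] * kernels[y, x])
    return out
```

A sanity check: identity kernels (center weight 1, all else 0) reproduce the input exactly, and a lerp with `t=0` leaves the image unchanged.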
Significance: this is a proof of concept showing that a quantization-friendly, GPU-efficient KPN with a low-rank bottleneck can do generative modeling in pixel space, potentially reducing model capacity and easing edge deployment. The approach trades generative expressivity (the convex-filter limit) for strong regularization and efficiency; allowing negative weights and adding a drift network mitigate this, but results remain modest after ~5k epochs. The work highlights a promising avenue for compact, low-rank generative models, and its practical techniques (kernel parametrization, quantization-friendly design) are worth exploring further for on-device image synthesis.
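The convex-filter limit can be seen with a toy one-dimensional "patch": softmax-normalized (convex) weights can never push the output outside the range of the input values, while signed tanh-activated weights can, which is exactly why breaking convexity lets the filter synthesize new content. The specific numbers below are made up for illustration.

```python
import numpy as np

patch = np.array([0.9, 0.1, 0.9])  # toy "patch" of pixel values

# Convex filtering: softmax-normalized, non-negative weights summing to 1.
# The output is a convex combination, so it stays within [patch.min(), patch.max()].
logits = np.array([1.0, -2.0, 3.0])
w_convex = np.exp(logits) / np.exp(logits).sum()
y_convex = w_convex @ patch
assert patch.min() <= y_convex <= patch.max()

# Signed filtering: tanh-activated weights may be negative and need not sum
# to 1, so the output can leave the input range -- the filter can produce
# values not present in the neighborhood.
w_signed = np.tanh(np.array([3.0, -3.0, 3.0]))
y_signed = w_signed @ patch  # = tanh(3) * (0.9 - 0.1 + 0.9), well above patch.max()
assert y_signed > patch.max()
```

The drift offsets serve a complementary role: even a signed filter struggles to shift low-frequency color globally, so a small additive per-pixel offset is predicted separately and added after filtering.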