Show HN: The Annotated Discrete Diffusion Models for Text Generation (github.com)

🤖 AI Summary
A self-contained Jupyter notebook, "The Annotated Discrete Diffusion Models," adapts Andrej Karpathy's 7.23M-parameter character-level nanoGPT into a discrete diffusion model for text generation. The notebook walks through the theory and code of A. Lou et al.'s paper (Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution), demonstrates training on Shakespeare, and lets users tweak the dataset, noise schedule, and model size. It's designed as an educational and experimental starting point: it explains the math, presents a continuous-time Markov chain formulation of token corruption, and provides runnable cells so you can reproduce and extend the work locally.

Technically, the project replaces autoregressive sampling with parallel denoising: tokens are corrupted via a discrete diffusion (noising) process, and the model learns a score-entropy objective to estimate ratios of the data distribution and recover the original sequences. Key components include a discrete Tweedie sampler for efficient inference, adaptation of the transformer ("baby GPT") architecture to discrete score matching, and practical guidance on noise schedules and training.

This matters because it brings the diffusion paradigm, previously transformative for images and video, into discrete language modeling, offering an alternative route to faster, fully parallel generation and a useful platform for further research into discrete score matching and non-autoregressive language models.
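To make the corruption step concrete, here is a minimal sketch (not code from the notebook) of an absorbing-state forward process: each token independently jumps to a mask state with probability 1 − exp(−σ(t)), where σ(t) follows an assumed log-linear noise schedule. The vocabulary size, schedule bounds, and function names are all illustrative assumptions.

```python
import numpy as np

VOCAB_SIZE = 65       # illustrative: nanoGPT's character-level Shakespeare vocab
MASK_ID = VOCAB_SIZE  # extra absorbing "mask" state appended to the vocabulary

def noise_level(t, sigma_min=1e-3, sigma_max=20.0):
    """Assumed log-linear schedule: cumulative noise sigma(t) for t in [0, 1]."""
    return sigma_min * (sigma_max / sigma_min) ** t

def corrupt(tokens, t, rng):
    """Forward CTMC step: each token independently becomes MASK_ID
    with probability 1 - exp(-sigma(t)); otherwise it is unchanged."""
    p_mask = 1.0 - np.exp(-noise_level(t))
    jump = rng.random(tokens.shape) < p_mask
    return np.where(jump, MASK_ID, tokens)

rng = np.random.default_rng(0)
x0 = rng.integers(0, VOCAB_SIZE, size=16)  # a clean token sequence
xt = corrupt(x0, 0.5, rng)                 # partially masked sequence at t = 0.5
```

The denoising model is then trained to reverse this process, predicting ratios of the data distribution at masked positions so that sampling can fill in all tokens in parallel rather than left to right.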