RND1-Base-0910: experimental diffusion LM with 30B params (3B active) (huggingface.co)

🤖 AI Summary
Radical Numerics released RND1-Base-0910, an experimental diffusion language model converted from a pretrained autoregressive base (Qwen3-30B-A3B). The model has 30.5B total parameters with a sparse Mixture-of-Experts (MoE) design that activates ~3.3B parameters per token, enabling higher capacity with reduced per-token compute. Unlike standard autoregressive LMs, RND1 generates text via iterative diffusion denoising (parallel token updates across many steps); default generation uses 256 diffusion steps and supports familiar sampling knobs (temperature, top_k, top_p) along with Task vs Completion modes.

Significance for AI/ML: RND1 demonstrates a practical pathway to scale diffusion LMs by converting large autoregressive checkpoints and combining them with sparse MoE routing, offering a different trade-off between latency, parallelism, and expressivity compared with autoregressive sampling. Key implications include potential speedups from parallel token updates and lower active compute per token; the main downsides are that RND1-Base-0910 hasn't been post-trained and can exhibit repetition under greedy sampling. The release includes code, a technical report, and recommendations for optimized MoE kernels (e.g., flashinfer) and non-Hugging Face backends for faster inference, making it a useful reference for researchers experimenting with diffusion-based text generation at scale.
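To make the generation procedure concrete, here is a minimal, illustrative sketch of masked-diffusion decoding with the sampling knobs mentioned above (temperature, top_k, top_p). This is not RND1's released inference code: the model interface, mask token, and confidence-based commit schedule below are hypothetical stand-ins chosen for readability, and the actual release recommends optimized MoE kernels and non-Hugging Face backends for speed.

```python
# Illustrative sketch of masked-diffusion text generation (NOT the RND1 API).
import torch

def sample_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Temperature / top-k / top-p sampling over a 1-D logits vector."""
    logits = logits / max(temperature, 1e-5)
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[1:] = remove[:-1].clone()  # shift so the token crossing the
        remove[0] = False                 # threshold is still kept
        logits[sorted_idx[remove]] = float("-inf")
    return torch.multinomial(torch.softmax(logits, dim=-1), 1).squeeze(-1)

@torch.no_grad()
def diffusion_generate(model, prompt_ids, gen_len=64, num_steps=256,
                       mask_id=0, temperature=1.0, top_k=0, top_p=1.0):
    """Iteratively denoise a fully masked completion in parallel.

    Each step predicts every masked position at once, commits the most
    confident predictions, and leaves the rest masked for the next step.
    """
    device = prompt_ids.device
    seq = torch.cat([prompt_ids,
                     torch.full((gen_len,), mask_id, device=device)])
    masked = torch.zeros_like(seq, dtype=torch.bool)
    masked[len(prompt_ids):] = True

    for step in range(num_steps):
        logits = model(seq.unsqueeze(0)).squeeze(0)   # (seq_len, vocab)
        conf, _ = torch.softmax(logits, dim=-1).max(dim=-1)

        remaining = int(masked.sum())
        if remaining == 0:
            break
        # Simple linear commit schedule; the real schedule is a design choice.
        to_commit = max(1, remaining // (num_steps - step))

        # Commit the most confident masked positions this step.
        conf_masked = conf.masked_fill(~masked, float("-inf"))
        for pos in torch.topk(conf_masked, to_commit).indices:
            seq[pos] = sample_logits(logits[pos], temperature, top_k, top_p)
            masked[pos] = False
    return seq

# Toy usage with a random stand-in model (shape-checking only).
vocab, dim = 1000, 32
embed, head = torch.nn.Embedding(vocab, dim), torch.nn.Linear(dim, vocab)
toy_model = lambda ids: head(embed(ids))
out = diffusion_generate(toy_model, torch.randint(1, vocab, (8,)), gen_len=16,
                         num_steps=32, temperature=0.8, top_k=50, top_p=0.95)
print(out.shape)  # torch.Size([24])
```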
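The ~3.3B-active-of-30.5B figure comes from sparse MoE routing: each token is sent to only a few experts, so total capacity grows without a proportional increase in per-token compute. The layer below is a generic top-k routing sketch for illustration, not RND1's actual MoE implementation; all dimensions and expert counts are made up.

```python
# Generic top-k routed MoE feed-forward layer (illustrative, not RND1's).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Only k of num_experts expert MLPs run per token, so parameters scale
    with num_experts while per-token FLOPs stay near k experts' worth."""
    def __init__(self, dim, hidden, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                          nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, dim)
        gate_logits = self.router(x)           # (tokens, num_experts)
        weights, experts = torch.topk(gate_logits, self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                sel = experts[:, slot] == e    # tokens routed to expert e
                if sel.any():
                    out[sel] += weights[sel, slot, None] * self.experts[e](x[sel])
        return out

# Shape check: 16 tokens, model dim 64.
layer = TopKMoE(dim=64, hidden=256, num_experts=8, k=2)
print(layer(torch.randn(16, 64)).shape)        # torch.Size([16, 64])
```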