Google DeepMind releases DiffusionGemma, a model that runs local AI 4x faster (arstechnica.com)

0 points 2 hours ago | visit original

🤖 AI Summary

Google DeepMind has introduced DiffusionGemma, a new open model in the Gemma 4 family that generates text in parallel rather than with the conventional autoregressive approach. Standard models produce text one token at a time; DiffusionGemma works more like an image generation model, starting from a block of placeholder tokens and refining the whole block over multiple passes until the content is final. Producing an entire block at once makes it considerably faster, particularly on local hardware such as Nvidia GPUs. The model has 26 billion parameters, of which only 3.8 billion are activated during use, which keeps it within the capabilities of high-end GPUs. It reaches around 700 tokens per second on an RTX 5090 and over 1,000 on an Nvidia H100, roughly four times the speed of its autoregressive counterparts. The parallel approach also suits non-linear tasks such as in-line editing and molecular sequencing. The design shifts the computational burden from memory bandwidth to processing power, and the model is described as particularly effective on problems like Sudoku, where the interdependence of tokens is difficult for token-by-token models.
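The "placeholder tokens refined over multiple passes" idea can be illustrated with a toy sketch. This is not DiffusionGemma's algorithm or API: `toy_predict`, `refine_block`, the `<mask>` placeholder, and the confidence rule are all hypothetical, and the stand-in "model" simply looks up a known target sentence. It only shows the control flow, in which each pass proposes tokens for every open slot at once and commits the most confident ones.

```python
MASK = "<mask>"  # hypothetical placeholder token


def toy_predict(block, target):
    """Hypothetical stand-in for a model.

    Returns (position, token, confidence) for every masked slot.
    A real model would predict tokens; this toy reads them from `target`.
    """
    proposals = []
    for i, tok in enumerate(block):
        if tok != MASK:
            continue
        # Toy confidence rule: slots with more already-filled neighbours
        # are treated as easier to predict.
        neighbours = [block[j] for j in (i - 1, i + 1) if 0 <= j < len(block)]
        filled = sum(1 for n in neighbours if n != MASK)
        proposals.append((i, target[i], 0.5 + 0.25 * filled))
    return proposals


def refine_block(target, per_pass=2):
    """Fill a fully masked block over several passes.

    Each pass scores all masked slots together and commits the
    `per_pass` most confident proposals (ties broken by position).
    Returns the finished block and the number of passes used.
    """
    block = [MASK] * len(target)
    passes = 0
    while MASK in block:
        proposals = toy_predict(block, target)
        proposals.sort(key=lambda p: (-p[2], p[0]))
        for i, tok, _ in proposals[:per_pass]:
            block[i] = tok
        passes += 1
    return block, passes
```

With a six-token target and `per_pass=2`, the block is finished in three passes instead of six single-token steps. Setting `per_pass=1` degenerates to one token per pass, the same step count as token-by-token generation.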
