Dreamer 4 - Training Agents Inside of Scalable World Models (danijar.com)

🤖 AI Summary
Google DeepMind’s Dreamer 4 introduces a scalable, fast world model that lets agents learn policies entirely “in imagination.” Using a new objective and architecture, the model accurately simulates complex object interactions and runs real-time interactive inference on a single GPU. By training reinforcement learning agents inside that learned model (imagination training), Dreamer 4 is the first agent to obtain diamonds in Minecraft from purely offline data, solving a task that requires sequences of over 20,000 raw-pixel mouse-and-keyboard actions, while outperforming OpenAI’s VPT offline agent with roughly 100× less data. The paper also shows that the world model’s representations outperform Gemma 3 for behavioral cloning, indicating broadly useful scene understanding for decision making.

Technically, Dreamer 4 couples a high-fidelity dynamics model, an RL loop over imagined rollouts, and a reward model that scores those rollouts, enabling long-horizon, counterfactual training without environment access. It generates diverse, plausible Minecraft scenarios and even models physical object interactions in real-world robotics video, addressing prior video models’ failures on interaction physics.

For the AI/ML community, this advances sample-efficient offline learning, safer sim-to-real workflows, and practical world-model-based agents for robotics and long-horizon planning where online interaction is costly or impossible.
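The core loop the summary describes (a learned dynamics model, a policy improved purely on imagined rollouts, and a reward model that scores them) can be sketched in miniature. This is not the paper's method: the components below are toy linear/tanh stand-ins, and the policy update is a simple hill-climb rather than Dreamer's actual actor-critic objective. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned components (all hypothetical, frozen after "training"):
W_dyn = rng.normal(scale=0.1, size=(4, 4))  # latent dynamics of the world model
w_rew = rng.normal(size=4)                  # reward head over latent states

def dynamics(z, a):
    """Predict the next latent state from (state, action) -- the world model."""
    return np.tanh(W_dyn @ z + a)

def reward(z):
    """Score an imagined latent state with the learned reward model."""
    return float(w_rew @ z)

def imagine_rollout(policy_w, z0, horizon=15):
    """Roll the policy forward entirely inside the world model (no environment)."""
    z, total = z0, 0.0
    for _ in range(horizon):
        a = np.tanh(policy_w @ z)  # deterministic linear-tanh policy action
        z = dynamics(z, a)
        total += reward(z)
    return total

# Improve the policy on imagined returns only (stand-in for Dreamer's RL update).
policy_w = np.zeros((4, 4))
z0 = rng.normal(size=4)
best = imagine_rollout(policy_w, z0)
for _ in range(200):
    cand = policy_w + rng.normal(scale=0.05, size=policy_w.shape)
    ret = imagine_rollout(cand, z0)
    if ret > best:  # keep only candidates that raise imagined return
        policy_w, best = cand, ret
```

The point of the sketch is the data flow, not the algorithm: the environment never appears after the world model is fit, which is what makes purely offline, long-horizon training possible.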