🤖 AI Summary
DeepMind previewed SIMA 2, a next‑generation embodied agent that fuses the language and reasoning capabilities of Google’s Gemini (specifically Gemini 2.5 Flash‑Lite) with embodied gameplay skills learned from hundreds of hours of 3D game data. Compared with SIMA 1, which could follow basic instructions but completed only 31% of complex tasks, SIMA 2 delivers a "step change" in performance, roughly doubling its predecessor's task-completion rate. It uses Gemini to reason about observations, explain its internal chain‑of‑thought (e.g., “ripe tomato = red house”), follow emoji commands, and act in novel photorealistic worlds generated by DeepMind’s Genie. Key technical advances include integrating a language model into the agent loop, using Gemini both to generate new training tasks and to serve as a reward model, and enabling self‑supervised improvement from the agent's own trials rather than relying solely on human gameplay data.
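The self‑improvement loop described above (an LLM proposes tasks, the agent attempts them, the LLM scores the attempts as a reward model, and successes become new training data) can be sketched roughly as follows. This is a minimal illustrative sketch, not DeepMind's implementation; every name here (`propose_task`, `judge`, `Agent`) is hypothetical, and the rollout and reward logic are stand-ins.

```python
import random

random.seed(0)

def propose_task():
    # Stand-in for Gemini generating a new training task for the agent.
    return random.choice(["pick the ripe tomato", "build a shelter", "mine for gold"])

def judge(task, trajectory):
    # Stand-in for Gemini acting as a reward model, scoring an attempt 0.0-1.0.
    return 1.0 if trajectory.endswith("done") else 0.0

class Agent:
    def __init__(self):
        self.experience = []  # self-generated training data

    def attempt(self, task):
        # Stand-in for an embodied rollout in a 3D world.
        ok = random.random() > 0.3
        return f"{task}: actions... {'done' if ok else 'stuck'}"

    def self_improve(self, rounds=10, threshold=0.5):
        # Core loop: generate task -> attempt -> score -> keep successes
        # for the next round of training, with no human gameplay data.
        for _ in range(rounds):
            task = propose_task()
            traj = self.attempt(task)
            if judge(task, traj) >= threshold:
                self.experience.append((task, traj))
        return len(self.experience)

agent = Agent()
kept = agent.self_improve()
print(f"kept {kept} successful trajectories for retraining")
```

The key design point, per the announcement, is that both the curriculum (task generation) and the reward signal come from the language model itself, closing the loop without human-labeled gameplay.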
This matters for AI/ML because it moves embodied agents toward generality: SIMA 2 demonstrates scalable high‑level understanding and multi‑step reasoning in previously unseen virtual environments, a core capability for future general‑purpose robots and AGI research. DeepMind stresses that SIMA 2 focuses on high‑level perception and planning rather than low‑level motor control, and while there is no timeline for physical‑robot deployment or a wider release, the architecture (combining LLM reasoning, self‑generated curricula, and embodied interaction) represents a practical pathway to agents that learn and generalize more like humans.