Google DeepMind unveils its first “thinking” robotics AI (arstechnica.com)

🤖 AI Summary
Google DeepMind announced Gemini Robotics 1.5 and Gemini Robotics‑ER 1.5, a two‑model architecture that separates "thinking" from "doing" in robot control. Gemini Robotics‑ER 1.5 is a vision‑language model with embodied reasoning (ER) that performs simulated reasoning: it generates multi‑step plans and decides how to interact with a physical scene. Gemini Robotics 1.5 is a vision‑language‑action (VLA) model that executes actions. DeepMind says the ER model achieves top marks on academic and internal benchmarks, demonstrating reliable decisions about how to approach and decompose complex tasks; the executor model then translates those plans into motor actions.

This is significant because it applies the generative‑AI pattern that unlocked generality in text and vision to robotics, potentially moving robots away from highly bespoke, task‑specific systems toward more flexible agents that can handle novel workspaces without extensive reprogramming.

Key technical implications include modular planning versus execution (planner VLM plus executor VLA), simulated reasoning as a capability for higher‑level task decomposition, and better generalization across scenarios. Practical challenges remain, including safe handoff between planner and controller, real‑world robustness, and evaluation beyond benchmarks, but the approach marks a clear step toward more general, language‑grounded robotic agents.
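To make the planner/executor split concrete, here is a minimal Python sketch of the handoff pattern. Everything in it is hypothetical: the class names (`EmbodiedReasoner`, `ActionModel`, `PlanStep`), methods, and the stubbed single-step plan are invented for illustration, not DeepMind's published API; they only show where a planning VLM and an executor VLA would sit in the control loop.

```python
# Hypothetical sketch of the planner/executor split described above. These
# class and method names are invented for illustration; DeepMind has not
# published this interface.

from dataclasses import dataclass


@dataclass
class PlanStep:
    description: str  # natural-language subgoal, e.g. "pick up the mug"


class Camera:
    """Placeholder sensor; a real robot would return camera frames."""

    def capture(self) -> bytes:
        return b""


class EmbodiedReasoner:
    """Stand-in for the planner VLM (Gemini Robotics-ER 1.5 in the article)."""

    def plan(self, instruction: str, scene: bytes) -> list[PlanStep]:
        # A real system would query the reasoning model here; we stub a
        # single-step plan so the sketch runs end to end.
        return [PlanStep(description=f"subgoal derived from: {instruction}")]


class ActionModel:
    """Stand-in for the executor VLA (Gemini Robotics 1.5 in the article)."""

    def execute(self, step: PlanStep, scene: bytes) -> bool:
        # A real VLA would emit low-level motor commands; we just log.
        print(f"executing: {step.description}")
        return True


def run_task(instruction: str, camera: Camera) -> bool:
    """Plan once, then execute each step, re-observing the scene between steps."""
    planner, executor = EmbodiedReasoner(), ActionModel()
    steps = planner.plan(instruction, camera.capture())
    for step in steps:
        if not executor.execute(step, camera.capture()):
            # The "safe handoff" challenge noted above: a real system would
            # replan or stop the robot on failure, not continue blindly.
            return False
    return True


if __name__ == "__main__":
    run_task("clear the mugs off the desk", Camera())
```

Note how the loop re-captures the scene before each step; that boundary is exactly where the safe-handoff and real-world-robustness concerns mentioned above would surface in a real system.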