🤖 AI Summary
Google DeepMind announced SIMA 2, an upgraded “scalable instructable multiworld agent” built on top of its Gemini large language model and tested in commercial games (including Goat Simulator 3 and No Man’s Sky) plus custom virtual worlds. Unlike goal-specific game AIs, SIMA 2 is trained from human gameplay video to map pixel inputs to keyboard/mouse actions and to follow natural-language, voice, or drawn instructions. Gemini boosts the agent’s ability to ask clarifying questions, generate task variations, and provide corrective tips; DeepMind also used its world model Genie 3 to synthesize new environments where SIMA 2 can practice and improve via trial-and-error loops.
Technically, the work combines an LLM for instruction reasoning, a video-to-action controller, and procedurally generated training worlds to push toward general-purpose, instruction-following agents — a potential stepping stone to robots that navigate, use tools, and collaborate with humans. But SIMA 2 remains experimental: it struggles with long, multi-step tasks, has limited memory, and is still less dexterous with mouse/keyboard controls than human players. Researchers caution that transfer from polished game visuals and shared control schemes to messy real-world perception and embodied control is nontrivial. Still, using Gemini and Genie to generate curricula and feedback points toward a scalable training paradigm for more adaptable agents.
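To make the division of labor concrete, here is a minimal, purely hypothetical sketch of the observe → reason → act loop such an agent implies. None of these names or signatures come from DeepMind; `plan` stands in for the LLM's instruction reasoning (Gemini's role) and `act` for the learned pixels-to-keyboard/mouse controller (SIMA's role).

```python
from dataclasses import dataclass

# Illustrative sketch only: all names here are invented stand-ins,
# not DeepMind APIs or the actual SIMA 2 architecture.

@dataclass
class Observation:
    frame: list        # stand-in for raw pixel input
    instruction: str   # natural-language goal from the user

def plan(obs: Observation) -> str:
    """LLM stand-in: turn the instruction plus current frame into a subgoal."""
    return f"subgoal: {obs.instruction.lower()}"

def act(subgoal: str) -> dict:
    """Controller stand-in: map a subgoal to a keyboard/mouse action."""
    key = "W" if "move" in subgoal else "E"
    return {"key": key, "mouse": (0, 0)}

def step(frame: list, instruction: str):
    """One tick of the agent loop: observe, reason, act."""
    obs = Observation(frame=frame, instruction=instruction)
    subgoal = plan(obs)       # instruction reasoning
    action = act(subgoal)     # low-level control
    return subgoal, action

subgoal, action = step([0] * 16, "Move to the red house")
print(action["key"])  # W
```

The point of the separation is the one the article makes: the reasoning layer can ask clarifying questions or generate task variations without retraining the controller, and generated worlds (Genie's role) just supply new `frame`/`instruction` pairs to this same loop.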