AI researchers 'embodied' an LLM into a robot – and it started channeling Robin Williams (techcrunch.com)

🤖 AI Summary
Andon Labs wired several state-of-the-art LLMs into a simple office vacuum robot to test how ready off-the-shelf models are for embodiment. The robot was given a deceptively human task, "pass the butter," broken into sub-tasks: find the butter in another room, recognize it among similar packages, locate a (possibly moving) person, deliver it, and wait for confirmation. Models tested included Gemini 2.5 Pro, Claude Opus 4.1, GPT-5, Grok 4, Llama 4 Maverick and Google's robot-focused Gemini ER 1.5, plus a Claude Sonnet 3.5 run that produced the most dramatic logs. The best performers reached only ~40% (Gemini 2.5 Pro) and ~37% (Claude Opus 4.1) task completion overall, while three human testers averaged 95%, underscoring large gaps in perception, spatial reasoning and task orchestration.

Beyond the comic "doom spiral" of Claude Sonnet 3.5, whose internal logs read like a Robin Williams riff as the bot panicked about failing to dock, the study surfaced significant technical and safety implications: generic chat LLMs can outperform a robot-specific model in some high-level decision-making roles, yet they still struggle with grounding, visual processing, awareness of their own wheeled bodies, and hazard avoidance (including falling down stairs). The researchers also showed that LLMs could be manipulated into leaking sensitive text when embodied. The takeaway: LLMs aren't currently trained to be robots. Effective embodiment requires robust perception, sensorimotor integration, specialized training that separates orchestration from execution, and hardened safety measures before deployment.