🤖 AI Summary
Generalist AI announces GEN-0, a new class of embodied foundation models for robotics trained directly on high-fidelity physical interaction, scaled past 10B parameters on an in-house dataset of more than 270k hours of real-world interaction data, growing by roughly 10k hours per week. The release makes three headline claims: (1) “Harmonic Reasoning,” a training and representation approach that fuses continuous-time sensing and acting tokens so models can “think and act” concurrently, without System 1/System 2 splits or expensive inference-time guidance; (2) cross-embodiment generalization across platforms ranging from 6 to 16+ degrees of freedom; and (3) robust scaling laws showing predictable, power-law gains from more pretraining data and compute. Practically, GEN-0 models above ~7B parameters avoid a newly observed “ossification” phase seen in smaller models (e.g., at 1B parameters), adapt to new tasks with far less post-training, and show fast multi-task transfer after modest supervised fine-tuning.
Technically, the team quantifies downstream performance with validation MSE and reverse KL (estimated by Monte Carlo, using a mixture of Gaussians fit to policy samples), and fits a power law L(D) = (Dc/D)^α to predict how pretraining scale maps to next-action error and how much fine-tuning data a larger pretraining run effectively “buys.” They report that models with low MSE and low reverse KL fine-tune most easily, while high-MSE/low-KL models yield multimodal behavior that is useful for RL. The work implies that robotics requires much larger model and data thresholds than language modeling to unlock physical common sense, guiding data-collection and compute-investment strategies for scalable embodied AI.
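The summary does not spell out the reverse-KL estimator, so the following is a minimal sketch under stated assumptions: both the policy's sampled actions and the reference (demonstration) actions are approximated with Gaussian-mixture fits, and KL(policy || data) is estimated by Monte Carlo from policy-side samples. All names, shapes, and hyperparameters here are hypothetical, not taken from the release.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def reverse_kl_mc(policy_actions, data_actions, n_components=4,
                  n_mc=10_000, seed=0):
    """Monte-Carlo estimate of KL(pi_hat || p_hat_data) between two GMM fits."""
    gmm_pi = GaussianMixture(n_components, random_state=seed).fit(policy_actions)
    gmm_data = GaussianMixture(n_components, random_state=seed).fit(data_actions)
    x, _ = gmm_pi.sample(n_mc)  # draw samples from the policy-side mixture
    # E_{x ~ pi_hat}[log pi_hat(x) - log p_hat_data(x)]
    return float(np.mean(gmm_pi.score_samples(x) - gmm_data.score_samples(x)))

# Toy usage with hypothetical 7-DoF action vectors (shape [N, 7]).
rng = np.random.default_rng(1)
policy_actions = rng.normal(0.0, 1.0, size=(2000, 7))
data_actions = rng.normal(0.1, 1.1, size=(2000, 7))
print(f"reverse KL ~= {reverse_kl_mc(policy_actions, data_actions):.4f}")
```

One way to square the two metrics under this reading: a policy can have high MSE (its samples miss the conditional mean) yet low reverse KL (its samples still cover the data's modes), which is exactly the multimodal regime the summary flags as useful for RL.

Similarly, fitting the quoted power law L(D) = (Dc/D)^α reduces to linear regression in log-log space, since log L = α·log Dc − α·log D. Below is a minimal sketch with made-up (pretraining-hours, validation-loss) pairs; the numbers are illustrative, not results from the release.

```python
import numpy as np

def fit_power_law(hours, val_loss):
    """Return (alpha, Dc) for L(D) = (Dc / D)**alpha via log-log least squares."""
    log_d = np.log(np.asarray(hours, dtype=float))
    log_l = np.log(np.asarray(val_loss, dtype=float))
    # Linear model: log L = (-alpha) * log D + alpha * log Dc
    slope, intercept = np.polyfit(log_d, log_l, deg=1)
    alpha = -slope
    dc = np.exp(intercept / alpha)
    return alpha, dc

# Hypothetical data points: pretraining hours vs. next-action validation loss.
hours = [1e3, 3e3, 1e4, 3e4, 1e5, 2.7e5]
val_loss = [0.90, 0.62, 0.41, 0.29, 0.20, 0.15]

alpha, dc = fit_power_law(hours, val_loss)
print(f"alpha = {alpha:.3f}, Dc = {dc:.3g} hours")
# Extrapolate: predicted loss if the dataset were doubled to ~540k hours.
print(f"L(540k h) = {(dc / 5.4e5) ** alpha:.3f}")
```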
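A fit like this is what lets the team translate pretraining scale into a predicted next-action error, and, by comparing fitted curves, estimate how much fine-tuning data a larger pretraining run substitutes for.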