Scale AI: Expanding Our Data Engine for Physical AI (scale.com)

🤖 AI Summary
Scale announced it is expanding its Data Engine for Physical AI and robotics, positioning itself to close a critical data gap that has slowed progress in embodied AI. Drawing on nearly a decade of large-scale data work and "more than 100,000 production hours" from its San Francisco prototyping lab, Scale says it is already supplying and enriching datasets for partners such as Generalist AI, Cobot, and Physical Intelligence. The company projects a library of more than 20,000 hours of curated robotics data by year-end and offers custom collection across different robot embodiments, lab and field settings, and task regimes.

Technically, Scale emphasizes three pillars: abundant collection (robot-captured data and human demonstrations), enforced diversity (objects, environments, and task variations), and enriched annotation, which moves beyond raw trajectories to semantic labels capturing intent, task structure, and failure modes. The annotation pipeline combines ML models, heuristics, and multi-step validation (including fine-tuning state-of-the-art models to verify that the annotations actually improve downstream performance), and leverages the 3D capabilities of the Scale Data Engine.

For the AI/ML community this matters because robotics lacks the trillion-token-scale datasets that propelled language and vision models; Scale's approach aims to produce the volume, variety, and annotation quality needed to train generalist, reliable physical-world foundation models.
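To make the "enriched annotation" idea concrete, here is a minimal sketch of what a semantically labeled robotics episode could look like: a raw trajectory paired with labels for intent, task structure, and failure modes. All names and fields here are illustrative assumptions, not Scale's actual data format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"

@dataclass
class Subtask:
    # One segment of the episode's task structure (hypothetical schema).
    name: str                          # e.g. "grasp_mug"
    start_step: int                    # index into the trajectory
    end_step: int
    outcome: Outcome
    failure_mode: Optional[str] = None # e.g. "collision_with_shelf"

@dataclass
class AnnotatedEpisode:
    # A raw trajectory enriched with semantic labels.
    embodiment: str                    # robot platform the data came from
    intent: str                        # high-level natural-language goal
    trajectory: List[List[float]]      # raw state/action vector per timestep
    subtasks: List[Subtask] = field(default_factory=list)

episode = AnnotatedEpisode(
    embodiment="bimanual_arm",
    intent="place the mug on the shelf",
    trajectory=[[0.0] * 7 for _ in range(100)],  # 100 steps of 7-DoF state
    subtasks=[
        Subtask("grasp_mug", 0, 40, Outcome.SUCCESS),
        Subtask("place_on_shelf", 40, 99, Outcome.FAILURE,
                failure_mode="collision_with_shelf"),
    ],
)

# Semantic labels make queries like "which subtasks failed?" trivial,
# which raw trajectories alone cannot answer.
failed = [s.name for s in episode.subtasks if s.outcome is Outcome.FAILURE]
print(failed)  # ['place_on_shelf']
```

The design choice the post hints at is exactly this: keeping failure modes and task segmentation alongside the raw data lets a training pipeline filter, rebalance, or supervise on outcomes rather than treating every trajectory as undifferentiated demonstration data.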