🤖 AI Summary
NVIDIA announced major updates to its Cosmos World Foundation Models (WFMs) at CoRL 2025, aimed at scaling physical AI via richer, more efficient simulated data. The headline releases are Cosmos Predict 2.5 and Cosmos Transfer 2.5. Predict 2.5 consolidates three WFMs into a single model and adds longer video generation (up to 30 seconds) plus multi-view outputs for synchronized viewpoints; Transfer 2.5 is a 3.5× smaller model that runs faster while producing higher-quality, photorealistic outputs from spatial inputs and ground-truth simulations. Together these WFMs let developers generate diverse training data from text, image and video prompts, reducing pipeline complexity while expanding temporal and viewpoint fidelity.
This matters for AI/ML practitioners focused on robotics, autonomy and sim-to-real transfer because richer multi-view, longer-horizon videos and photorealistic scene transfer reduce the need for costly real-world data collection and improve domain randomization. Technical implications include lower compute and memory footprints (Transfer 2.5’s size/speed gains), integrated generation workflows (Predict 2.5’s consolidation), and direct support for creating synchronized multi-view datasets for perception and control tasks. All Cosmos models are openly available under the NVIDIA Open Model License (https://github.com/nvidia-cosmos), enabling immediate experimentation and integration into physical-AI training pipelines.
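To make the domain-randomization angle concrete, the sketch below shows how a practitioner might enumerate diverse text prompts and multi-view, long-horizon generation settings before handing them to a WFM. This is a minimal illustration only: the attribute lists, prompt template, and the `render_with_wfm` stub are assumptions for this sketch, not the Cosmos API; consult the nvidia-cosmos repository for the actual inference entry points and checkpoints.

```python
"""Illustrative prompt/domain-randomization sketch for WFM-based data generation.

The attribute lists, prompt template, and the render_with_wfm stub are
assumptions for illustration; they are not part of the Cosmos API.
"""
import itertools
import random

# Scene attributes to randomize so generated clips cover varied conditions.
LIGHTING = ["overcast noon", "golden hour", "night with streetlights"]
WEATHER = ["clear", "light rain", "dense fog"]
TASKS = [
    "a robot arm picking a mug from a cluttered table",
    "a mobile robot navigating a warehouse aisle",
]

def build_requests(num_views: int = 3, clip_seconds: int = 30):
    """Cross scene attributes into text prompts plus generation settings."""
    requests = []
    for task, light, weather in itertools.product(TASKS, LIGHTING, WEATHER):
        prompt = f"{task}, {light}, {weather} weather, photorealistic"
        requests.append({
            "prompt": prompt,
            "num_views": num_views,        # synchronized viewpoints
            "clip_seconds": clip_seconds,  # Predict 2.5 supports up to 30 s
            "seed": random.randrange(2**31),
        })
    return requests

def render_with_wfm(request):
    """Placeholder for the actual model call; replace with the real
    Cosmos inference script from https://github.com/nvidia-cosmos."""
    print(f"[stub] would generate {request['num_views']}-view, "
          f"{request['clip_seconds']}s clip for: {request['prompt']}")

if __name__ == "__main__":
    for req in build_requests():
        render_with_wfm(req)
```

A grid like this over tasks, lighting and weather is one simple way to exploit prompt-driven generation for domain randomization; in practice you would swap the stub for the released inference code and feed the resulting multi-view clips into your perception or control training pipeline.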