🤖 AI Summary
This week “world models” went mainstream in three very different flavors: Fei-Fei Li’s World Labs shipped Marble, a browser/VR pipeline that turns text, images, short video, or blocky layouts into editable 3D scenes; reports say Yann LeCun is leaving Meta to found a startup around his latent predictive “world model” ideas; and DeepMind released Genie 3, a real-time interactive video engine it bills as a world model. The significance is both cultural and technical: the AI community is shifting its conversation from language-only LLMs toward spatial, embodied intelligence. But the term “world model” now covers asset-generation front ends, simulators, and internal predictive brains, which are very different research products and use cases.
Technically, Marble is a full-stack content pipeline that hallucinates 3D scenes and exports Gaussian splats or standard meshes (OBJ/FBX) to engines like Three.js or Unity, with an in-browser editor (Chisel) and a splat-optimized renderer (Spark). Gaussian splatting trades photogrammetry’s geometric fidelity for fast, photorealistic rendering of foliage, hair, and soft lighting: great for human-facing VR/AR and game assets, but not for a robot’s internal planner. LeCun’s sketches, so far brief, build on JEPA-style architectures: ingest sensory streams, learn compact latent states, predict state transitions, and use those predictions to drive planning, a cognitive backbone for agents. DeepMind’s Genie sits between them as a model-generated simulator: real-time, persistent environments, rendered frame by frame, for training agents. The practical takeaway: ask whether a “world model” outputs assets, frames, or latents, and whether it is aimed at humans, agents, or internal cognition, because those choices determine its research implications and product impact. The two toy sketches below make the splatting and latent-prediction ideas concrete.
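To see why splats render soft detail cheaply, here is a toy 2D compositor: each primitive is a blurry, semi-transparent Gaussian blob, and pixels are shaded by front-to-back alpha compositing of depth-sorted splats. Everything here is an illustrative assumption, not Marble’s or Spark’s actual pipeline, which projects anisotropic 3D Gaussians through a camera and runs on the GPU; only the compositing math has the same shape.

```python
# Toy 2D Gaussian-splat compositor (illustrative only, not Spark/Marble code).
import numpy as np

def render_splats(means, covs, colors, opacities, depths, h=64, w=64):
    """Render N 2D Gaussians to an (h, w, 3) image.

    means:     (N, 2) pixel-space centers
    covs:      (N, 2, 2) covariances (control blob shape and softness)
    colors:    (N, 3) RGB in [0, 1]
    opacities: (N,) peak alpha per splat in [0, 1]
    depths:    (N,) smaller = closer to the camera
    """
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys], axis=-1).astype(np.float64)  # (h, w, 2)

    image = np.zeros((h, w, 3))
    transmittance = np.ones((h, w))  # how much light still reaches the eye

    # Front-to-back compositing: nearer splats occlude farther ones smoothly.
    for i in np.argsort(depths):
        d = pix - means[i]                      # (h, w, 2) pixel offsets
        inv_cov = np.linalg.inv(covs[i])
        # Mahalanobis distance -> Gaussian falloff -> per-pixel alpha.
        m = np.einsum("hwi,ij,hwj->hw", d, inv_cov, d)
        alpha = opacities[i] * np.exp(-0.5 * m)
        image += (transmittance * alpha)[..., None] * colors[i]
        transmittance *= 1.0 - alpha

    return image

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    A = rng.normal(0, 3, size=(n, 2, 2))
    covs = A @ A.transpose(0, 2, 1) + np.eye(2)  # random SPD covariances
    img = render_splats(
        means=rng.uniform(0, 64, size=(n, 2)),
        covs=covs,
        colors=rng.uniform(0, 1, size=(n, 3)),
        opacities=rng.uniform(0.3, 0.9, size=n),
        depths=rng.uniform(0, 1, size=n),
    )
    print(img.shape, img.min(), img.max())
```

Because every splat is a smooth falloff rather than hard geometry, fuzzy content like foliage and hair composites naturally, which is exactly the trade against photogrammetry’s precise meshes described above.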
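For the latent-predictive side, here is a minimal sketch of the JEPA-style pattern the summary describes: encode observations into compact latents, predict the next latent given an action, and train in latent space rather than in pixels. The architecture sizes, the stop-gradient target, and all names are assumptions for illustration, not LeCun’s actual design.

```python
# Minimal JEPA-style latent world model sketch (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, LATENT_DIM = 128, 8, 32

encoder = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
predictor = nn.Sequential(
    nn.Linear(LATENT_DIM + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM)
)
opt = torch.optim.Adam([*encoder.parameters(), *predictor.parameters()], lr=1e-3)

def training_step(obs, action, next_obs):
    """One gradient step: make the predicted next latent match the encoded next obs."""
    z = encoder(obs)
    z_pred = predictor(torch.cat([z, action], dim=-1))
    with torch.no_grad():            # stop-gradient on the target branch;
        z_next = encoder(next_obs)   # published JEPA variants use an EMA encoder
    loss = F.mse_loss(z_pred, z_next)  # loss lives in latent space, not pixels
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def plan_one_step(obs, goal_obs, candidate_actions):
    """Score candidate actions by rolling the predictor forward in latent space
    and picking the one whose predicted latent lands closest to the goal."""
    with torch.no_grad():
        z, z_goal = encoder(obs), encoder(goal_obs)          # (LATENT_DIM,)
        inputs = torch.cat(
            [z.expand(len(candidate_actions), -1), candidate_actions], dim=-1
        )
        preds = predictor(inputs)                            # (K, LATENT_DIM)
        return candidate_actions[(preds - z_goal).pow(2).sum(-1).argmin()]

if __name__ == "__main__":
    obs = torch.randn(16, OBS_DIM)
    act = torch.randn(16, ACT_DIM)
    nxt = torch.randn(16, OBS_DIM)
    print("loss:", training_step(obs, act, nxt))
    best = plan_one_step(torch.randn(OBS_DIM), torch.randn(OBS_DIM),
                         torch.randn(5, ACT_DIM))
    print("chosen action:", best)
```

A real system would add collapse-avoidance regularizers (the trivial solution maps everything to the same latent), multi-step rollouts, and a richer planner; the point of the sketch is only that the model’s outputs are latents consumed by an agent, not frames or assets consumed by a human.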