Adding dynamics and force reasoning to generative 3-D models is essential for applications like architecture and robotics. The team is exploring two pathways: attaching physical properties directly to splats, and distilling simulations from classical physics engines into neural weights. Accurate physics remains a hard problem, especially when models must generalise to forces they have not seen during training.
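To make the first pathway concrete, here is a minimal sketch of what "attaching physical properties to splats" could look like. It is an illustration under stated assumptions, not a description of how Marble works: the `PhysicalSplat` record, its fields, and the `step` function are hypothetical names, each splat is treated as an independent point mass, and the integrator is plain semi-implicit Euler.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class PhysicalSplat:
    """Hypothetical record: a splat centre plus simple physical attributes."""
    position: Vec3  # metres
    velocity: Vec3  # metres per second
    mass: float     # kilograms


def step(splat: PhysicalSplat, force: Vec3, dt: float) -> PhysicalSplat:
    """Advance one splat by dt seconds under a constant force.

    Uses semi-implicit Euler: update velocity from acceleration first,
    then update position from the new velocity.
    """
    accel = tuple(f / splat.mass for f in force)
    velocity = tuple(v + a * dt for v, a in zip(splat.velocity, accel))
    position = tuple(p + v * dt for p, v in zip(splat.position, velocity))
    return PhysicalSplat(position=position, velocity=velocity, mass=splat.mass)


# Usage: push a 2 kg splat at rest with a constant downward force for half a second.
resting = PhysicalSplat(position=(0.0, 0.0, 0.0), velocity=(0.0, 0.0, 0.0), mass=2.0)
moved = step(resting, force=(0.0, 0.0, -4.0), dt=0.5)
print(moved)
```

The second pathway would replace a hand-written update like `step` with a learned model trained on the outputs of a classical physics engine. The generalisation problem is visible even in this toy: a learned model has to behave sensibly for values of `force` outside its training range, whereas the explicit update rule handles them by construction.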
Robotics suffers from a lack of high-quality, diverse training data. Marble can synthesize realistic 3-D environments, providing a middle ground between scarce real-world recordings and uncontrolled internet video. By generating controllable scenarios, researchers can train agents that transfer to real robots more effectively.
Li defines spatial intelligence as the innate loop that ties 3-D perception to the impulse to act. She demonstrates how a single glance yields geometry, relationships, and predictions, and argues that current AI lacks this perception-action coupling. Training agents in simulated 3-D worlds could help close that gap.