World Labs is building AI that perceives and manipulates three-dimensional space, unlocking new creative, robotic, and therapeutic applications beyond language-only models.
The conversation distinguishes spatial reasoning (understanding, moving, and interacting in 3-D space) from linguistic processing. Humans are born with high-bandwidth visual perception, while language is a comparatively low-bandwidth, symbolic channel. Building AI that matches human spatial acuity requires models that go beyond token-by-token prediction.
Li defines spatial intelligence as the innate loop that ties 3-D perception to the impulse to act. She demonstrates how a single glance yields geometry, relationships, and predictions, and argues that current AI lacks this perception-action coupling. She suggests that training agents in simulated 3-D worlds can close this gap.
Fei-Fei Li explains what world models are, why they matter for AI, and showcases diverse use-cases ranging from virtual production to robotics simulation and therapeutic psychology.