Li explains how diffusion models took text-to-image generation from "impossible" to commonplace, and how generative video models such as W.A.L.T push the frontier further. She then turns to spatial AI, where algorithms convert 2-D photos into 3-D scenes, opening the door to immersive, interactive worlds.