
Fei-Fei Li and Justin Johnson explain how exploding compute, open-science datasets, and new 3-D data structures are powering the next generation of world models and spatial intelligence.
The guests argue that every major leap in AI has been driven by orders-of-magnitude increases in compute. The exponential growth in GPU performance and the ability to train on thousands of devices unlock the data-hungry spatial models that were impossible a decade ago.
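To make "orders of magnitude" concrete, here is a back-of-the-envelope sketch. The per-device throughput, device counts, and the helper names `aggregate_flops` and `orders_of_magnitude` are hypothetical placeholders for illustration, not figures or tools from the episode.

```python
import math


def aggregate_flops(per_device_flops: float, num_devices: int, utilization: float = 1.0) -> float:
    """Cluster throughput in FLOP/s: per-device peak x device count x assumed utilization."""
    return per_device_flops * num_devices * utilization


def orders_of_magnitude(new: float, old: float) -> float:
    """Number of powers of ten separating two quantities (log10 of their ratio)."""
    return math.log10(new / old)


# Hypothetical placeholder figures, NOT numbers quoted in the episode.
single_gpu = aggregate_flops(per_device_flops=1e13, num_devices=1)
large_cluster = aggregate_flops(per_device_flops=1e15, num_devices=1_000)
print(f"{orders_of_magnitude(large_cluster, single_gpu):.1f} orders of magnitude")
```

With these placeholders the ratio is 10^5, i.e. five orders of magnitude, and most of it comes from multiplying devices rather than from faster individual chips. Setting `utilization` below 1.0 reflects that real training runs rarely sustain peak throughput.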
World Labs balances an open-science ethos (publishing datasets and benchmarks) with the commercial realities of a startup. Academia struggles for resources, while industry can move faster but often keeps breakthroughs behind closed doors. This tension shapes how world-model research is funded and shared.
The conversation distinguishes spatial reasoning (understanding, moving, and interacting in 3-D space) from linguistic processing. Humans are born with high-bandwidth visual perception, while language is a comparatively low-bandwidth, symbolic channel. Building AI that matches human spatial acuity requires models that go beyond token-by-token prediction.
Marble is World Labs' flagship system that turns text, a single image, or multiple images into editable 3-D scenes. It is deliberately built as a usable product today while also serving as a research stepping-stone toward full-scale spatial intelligence. Real-time demos show that streaming 3-D generation over the internet is feasible.
World Labs experiments with several atomic representations for 3-D content: Gaussian splats, frame-by-frame video tokens, and tokenized 3-D chunks. The choice of primitive determines rendering speed, editability, and scalability. Gaussian splats currently offer the best real-time performance on mobile devices.
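As a rough picture of what a single splat primitive holds, the sketch below follows the parameterization popularized by 3D Gaussian Splatting (Kerbl et al., 2023): a mean, per-axis scales, an orientation quaternion, a color, and an opacity, with covariance Σ = R S Sᵀ Rᵀ. This is an assumption for illustration, not World Labs' internal format; `GaussianSplat`, `quat_to_rotation`, and `weight_at` are hypothetical names, and projection, sorting, and alpha blending are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def quat_to_rotation(w: float, x: float, y: float, z: float) -> Mat3:
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix, normalizing first."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


@dataclass
class GaussianSplat:
    mean: Vec3                                   # center of the Gaussian in world space
    scale: Vec3                                  # standard deviation along each local axis
    rotation: Tuple[float, float, float, float]  # orientation quaternion (w, x, y, z)
    color: Vec3                                  # RGB in [0, 1]; real systems often use spherical harmonics
    opacity: float                               # peak opacity in [0, 1]

    def covariance(self) -> Mat3:
        """Sigma = R S S^T R^T, computed as M M^T with M = R S."""
        r = quat_to_rotation(*self.rotation)
        m = [[r[i][j] * self.scale[j] for j in range(3)] for i in range(3)]
        return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

    def weight_at(self, p: Vec3) -> float:
        """Opacity-weighted Gaussian falloff at point p (no projection or blending)."""
        r = quat_to_rotation(*self.rotation)
        d = [p[i] - self.mean[i] for i in range(3)]
        # Rotate the offset into the splat's local frame (R^T d), then divide by each axis scale.
        local = [sum(r[k][j] * d[k] for k in range(3)) for j in range(3)]
        mahalanobis_sq = sum((local[j] / self.scale[j]) ** 2 for j in range(3))
        return self.opacity * math.exp(-0.5 * mahalanobis_sq)
```

Storing scale and rotation instead of a raw covariance matrix keeps Σ positive semi-definite under any edit, which is one reason splats are convenient for both optimization and interactive editing.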
Adding dynamics and force reasoning to generative 3-D models is essential for applications like architecture and robotics. The team explores two pathways: attaching physical properties directly to splats, and distilling simulations from classical physics engines into neural-network weights. Accurate physics remains a hard problem, especially when models must generalize to unseen forces.
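The episode does not say how physical properties would be attached to splats. The sketch below illustrates only the first pathway under a loud assumption: each splat carries a hypothetical `mass` and `velocity`, and a hand-written semi-implicit Euler step moves it under gravity, optional external forces, and a ground plane. `PhysicalSplat`, `step`, and the z-up convention are illustrative inventions; a real system would also need splat-to-splat collisions, constraints, and material models.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
GRAVITY: Vec3 = (0.0, 0.0, -9.81)  # m/s^2, assuming a z-up world


@dataclass
class PhysicalSplat:
    position: List[float]     # splat mean, in metres
    velocity: List[float]     # metres per second
    mass: float               # hypothetical per-splat physical attribute, in kg
    restitution: float = 0.5  # fraction of vertical speed kept after hitting the ground


def step(splats: List[PhysicalSplat], dt: float,
         forces: Optional[Sequence[Vec3]] = None, ground_z: float = 0.0) -> None:
    """Advance every splat by dt seconds with semi-implicit Euler integration."""
    for idx, s in enumerate(splats):
        f = forces[idx] if forces is not None else (0.0, 0.0, 0.0)
        for i in range(3):
            accel = GRAVITY[i] + f[i] / s.mass  # a = g + F / m
            s.velocity[i] += accel * dt          # update velocity first...
            s.position[i] += s.velocity[i] * dt  # ...then position with the new velocity
        if s.position[2] < ground_z:             # crude ground-plane collision
            s.position[2] = ground_z
            s.velocity[2] = -s.velocity[2] * s.restitution
```

The same push moves a heavier splat less, which is the kind of behavior a learned model has to reproduce if it is to generalize to forces it never saw in training.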
Robotics suffers from a lack of high-quality, diverse training data. Marble can synthesize realistic 3-D environments, providing a middle ground between scarce real-world recordings and uncontrolled internet video. By generating controllable scenarios, researchers can train agents that transfer to real robots more effectively.
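One common way to make synthetic data "controllable" is domain randomization: sample scene factors from a seeded random generator, and pin the factors an experiment cares about. The sketch below assumes scenes are described as text prompts for a text-to-3-D generator; `Scenario`, `sample_scenario`, `to_prompt`, and the factor lists are hypothetical, and the call into Marble or any other generator is deliberately not shown because no such API is described in the episode.

```python
import random
from dataclasses import dataclass, replace

ROOM_TYPES = ("kitchen", "warehouse", "office", "living room")
LIGHTING = ("daylight", "dim evening light", "harsh fluorescent light")


@dataclass(frozen=True)
class Scenario:
    room_type: str
    lighting: str
    num_objects: int
    clutter: float  # 0 = tidy, 1 = very cluttered
    seed: int

    def to_prompt(self) -> str:
        """Render the scenario as a text description for a text-to-3-D generator."""
        tidiness = "cluttered" if self.clutter > 0.5 else "tidy"
        return (f"A {tidiness} {self.room_type} under {self.lighting}, "
                f"containing about {self.num_objects} graspable objects.")


def sample_scenario(seed: int, **overrides) -> Scenario:
    """Randomize every factor reproducibly from `seed`, then pin any factors given in `overrides`."""
    rng = random.Random(seed)
    scenario = Scenario(
        room_type=rng.choice(ROOM_TYPES),
        lighting=rng.choice(LIGHTING),
        num_objects=rng.randint(3, 30),
        clutter=rng.random(),
        seed=seed,
    )
    return replace(scenario, **overrides)


# Example: 100 reproducible kitchens that differ in lighting, clutter, and object count.
kitchens = [sample_scenario(seed, room_type="kitchen") for seed in range(100)]
```

Seeding each scenario makes a failing training episode reproducible, while the overrides let a researcher hold one factor fixed and vary the rest, which is the control that uncurated internet video lacks.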
Transformers treat inputs as sets of tokens, which works well for language but is sub-optimal for spatial data that lives in 3-D. The discussion highlights the need for new primitives that map better to distributed hardware and for architectures that can capture physical laws implicitly.
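The "sets of tokens" point can be checked directly: without positional encodings, self-attention is permutation-equivariant, so shuffling the input tokens shuffles the outputs in exactly the same way. The sketch below uses a single head with identity query, key, and value projections, a simplification chosen to keep it short; real transformers learn these projections and add positional information, which is precisely where any 3-D structure has to be injected. `self_attention` is a hypothetical helper, not a library function.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V and no positions."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens; order never enters the computation.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 1.0]]
perm = [2, 0, 1]
original = self_attention(tokens)
shuffled = self_attention([tokens[i] for i in perm])
# shuffled[k] matches original[perm[k]] up to floating-point rounding.
```

Because nothing in the computation depends on token order, a model only "knows" where a token sits in 3-D space if that information is encoded into the token itself.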
World Labs actively recruits researchers, engineers, and product people, emphasizing interdisciplinary expertise in graphics, physics, and AI. The founders view open challenges and a clear product vision as magnets for top talent, while also highlighting the need for public-sector resources to sustain academic breakthroughs.
{
  "memcast_version": "0.1",
  "episode": {
    "id": "60iW8FZ7MJU",
    "title": "After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs",
    "podcast": "Latent Space",
    "guest": "Fei-Fei Li and Justin Johnson",
    "host": "Alessio",
    "source_url": "https://www.youtube.com/watch?v=60iW8FZ7MJU",
    "duration_minutes": 61
  },
  "concepts": [
    {
      "id": "compute-scaling-as-the-engine-of-progress",
      "title": "Compute Scaling as the Engine of Progress",
      "tags": ["capital-scaling"]
    },
    {
      "id": "open-science-vs-commercial-incentives",
      "title": "Open Science vs. Commercial Incentives",
      "tags": ["funded-trading"]
    },
    {
      "id": "spatial-intelligence-vs-language-intelligence",
      "title": "Spatial Intelligence vs. Language Intelligence",
      "tags": ["spatial-intelligence", "cognitive-biases"]
    },
    {
      "id": "marble-a-generative-3-d-world-model-and-product",
      "title": "Marble: A Generative 3-D World Model and Product",
      "tags": ["3d-ai", "marketing-strategy"]
    },
    {
      "id": "data-structures-for-3-d-worlds",
      "title": "Data Structures for 3-D Worlds",
      "tags": []
    },
    {
      "id": "integrating-physics-into-world-models",
      "title": "Integrating Physics into World Models",
      "tags": ["simulation", "dynamics"]
    },
    {
      "id": "synthetic-data-for-embodied-ai-and-robotics",
      "title": "Synthetic Data for Embodied AI and Robotics",
      "tags": ["automation", "simulation"]
    },
    {
      "id": "future-model-architectures-beyond-transformers",
      "title": "Future Model Architectures Beyond Transformers",
      "tags": ["distributed-compute", "model-architectures"]
    },
    {
      "id": "talent-and-ecosystem-for-spatial-ai",
      "title": "Talent and Ecosystem for Spatial AI",
      "tags": ["ecosystem", "talent-identification"]
    }
  ]
}