Marble can generate diverse scenarios for training embodied agents
  • Because Marble accepts arbitrary text or image prompts, it can synthesize a wide range of room layouts, object configurations, and lighting conditions.
  • The generated scenes are fully controllable, allowing systematic curriculum learning (e.g., gradually increasing clutter; see the sketch below the card).
  • Users can also retrieve the underlying 3‑D representation for downstream physics simulation.
  • This flexibility accelerates research on navigation, manipulation, and multi‑modal perception.
Fei-Fei Li · Latent Space · 00:50:12
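
The episode does not describe Marble's programming interface, so the following is a minimal sketch of how a clutter curriculum might be expressed as text prompts under assumed names: SceneSpec, curriculum, marble_client.generate_scene, and export_geometry are hypothetical placeholders, not a documented API.

# Sketch: build text prompts with gradually increasing clutter for a
# text-to-3D world model. Only the prompt construction runs as-is; the
# calls to a Marble client are hypothetical and left commented out.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class SceneSpec:
    room: str           # e.g. "kitchen", "studio apartment"
    clutter_level: int  # 0 = empty room, higher = more distractor objects
    lighting: str       # e.g. "bright daylight", "dim evening lamps"

    def to_prompt(self) -> str:
        """Render the spec as a text prompt for a text-to-3D world model."""
        clutter = ["no loose objects",
                   "a few objects on the counters",
                   "moderately cluttered surfaces",
                   "heavily cluttered surfaces and items on the floor"]
        return (f"A {self.room} with {clutter[min(self.clutter_level, 3)]}, "
                f"{self.lighting}.")

def curriculum(room: str, stages: int = 4) -> list[SceneSpec]:
    """A curriculum over one room type, increasing clutter stage by stage."""
    return [SceneSpec(room=room, clutter_level=level, lighting="bright daylight")
            for level in range(stages)]

if __name__ == "__main__":
    for spec in curriculum("kitchen"):
        prompt = spec.to_prompt()
        print(prompt)
        # scene = marble_client.generate_scene(prompt)   # hypothetical generation call
        # mesh = scene.export_geometry()                 # hypothetical 3-D export for a physics simulator

The same pattern could vary lighting or object categories instead of clutter; the point is that prompt-level control is what makes a systematic curriculum possible.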

Supporting quotes

"Marble is a multimodal model; it takes language as input." (Fei-Fei Li)
"We built a dense captioning system that draws boxes around objects and writes short snippets for each region." (Fei-Fei Li)

From this concept

Synthetic Data for Embodied AI and Robotics

Robotics suffers from a lack of high-quality, diverse training data. Marble can synthesize realistic 3-D environments, providing a middle ground between scarce real-world recordings and uncontrolled internet video. By generating controllable scenarios, researchers can train agents whose behaviors transfer more reliably to real robots.
