MemCast
Diffusion models turned ‘impossible’ text‑to‑image generation into reality, powering today’s generative AI boom
  • Early attempts at generating images from text were dismissed as impossible, as illustrated by Andrej Karpathy’s skeptical response.
  • The breakthrough came with diffusion models, which iteratively denoise random noise conditioned on a textual prompt.
  • These models now create high‑fidelity images and videos from natural language, enabling tools like OpenAI’s Sora.
  • Li credits this paradigm shift for the explosion of generative content across art, design, and entertainment.
  • The story demonstrates how a new algorithmic class can overturn long‑standing assumptions about AI capability.
Fei‑Fei Li · TED · 00:04:02
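The iterative denoising described above can be sketched as a toy loop: start from pure Gaussian noise and repeatedly subtract the noise a model predicts. This is a minimal, hypothetical DDPM-style sampler for illustration only — `toy_denoise_step`, `sample`, and the stub noise predictor are assumptions, not any model Li or OpenAI actually uses:

```python
import numpy as np

def toy_denoise_step(x, predicted_noise, t, betas):
    """One simplified reverse-diffusion step: scale out the predicted noise.

    Deterministic toy version of the DDPM posterior mean; the stochastic
    noise term of a real sampler is omitted for clarity.
    """
    alpha_t = 1.0 - betas[t]
    alpha_bar_t = np.prod(1.0 - betas[: t + 1])
    coeff = betas[t] / np.sqrt(1.0 - alpha_bar_t)
    return (x - coeff * predicted_noise) / np.sqrt(alpha_t)

def sample(noise_model, shape, betas, rng):
    """Start from random noise, then denoise step by step from t = T-1 down to 0."""
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        x = toy_denoise_step(x, noise_model(x, t), t, betas)
    return x
```

In a real text-to-image model the noise predictor is a large neural network conditioned on the prompt; here any callable `noise_model(x, t)` stands in for it.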

Supporting quotes

"Andrej said: 'Haha, impossible.'" Fei‑Fei Li
quoting Andrej Karpathy
"We owe this to a family of diffusion models, which drive today's generative AI algorithms, turning human-written sentences into photos and videos of things never seen before." Fei‑Fei Li
"Many people have already seen the stunning results OpenAI's Sora recently demonstrated." Fei‑Fei Li

From this concept

From Labels to Worlds: The Rise of Generative and Spatial AI

Li explains how diffusion models turned text-to-image generation from "impossible" to commonplace, and how generative video models like W.A.L.T push the frontier further. She then moves to spatial AI, where algorithms convert 2-D photos into 3-D scenes, opening the door to immersive, interactive worlds.
