GPU performance per watt is hitting limits, prompting a search for new hardware primitives
  • Even as transistor counts grow, the performance‑per‑watt curve has plateaued between the Hopper and Blackwell generations.
  • Relying solely on matrix‑multiplication‑centric GPUs will soon become a bottleneck for distributed 3‑D workloads.
  • The guests advocate exploring primitives beyond dense matrix ops, such as graph‑based or sparsity‑aware kernels that map better to spatial data (a rough sketch of the sparsity argument follows below).
  • Early research into these alternatives could unlock the next wave of efficient world‑model training.
Fei-Fei Li · Latent Space · 00:11:35
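
To make the sparsity point concrete, here is a minimal back-of-the-envelope sketch. It is not from the episode: the grid size, occupancy, stencil, and channel counts are all assumed for illustration. It compares the arithmetic a dense 3-D stencil spends visiting every voxel against a sparsity-aware kernel that only visits occupied voxels.

```python
# Hypothetical numbers for illustration only -- none of these figures come
# from the episode. The point is the ratio: dense kernels pay for every
# voxel, sparsity-aware kernels pay only for occupied ones.

grid_sites = 128 ** 3        # assumed 128^3 voxel grid
occupancy = 0.02             # assume ~2% of voxels actually hold geometry
stencil = 27                 # 3x3x3 neighbourhood per output site
c_in, c_out = 64, 64         # assumed channel counts

flops_per_site = 2 * stencil * c_in * c_out                   # multiply-adds per output site
dense_flops = grid_sites * flops_per_site                     # dense kernel: every voxel
sparse_flops = int(grid_sites * occupancy) * flops_per_site   # sparse kernel: occupied voxels only

print(f"dense : {dense_flops:.3e} FLOPs")
print(f"sparse: {sparse_flops:.3e} FLOPs")
print(f"work skipped by the sparse kernel: {1 - sparse_flops / dense_flops:.0%}")
```

Under these assumed numbers the sparse kernel skips roughly 98% of the dense kernel's work, which is the intuition behind kernels that follow the structure of spatial data rather than a dense grid.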

Supporting quotes

"We need wacky ideas… hardware scaling will not be infinite, we need new primitives beyond matrix multiplication." (Fei-Fei Li)
"If you think about AlexNet, the core pieces of it were obviously ImageNet, it was the move to GPUs and neural networks." (Fei-Fei Li)

From this concept

Compute Scaling as the Engine of Progress

The guests argue that every major leap in AI has been driven by orders-of-magnitude increases in compute. Exponential growth in GPU performance, together with the ability to train across thousands of devices, unlocks the data-hungry spatial models that were impossible a decade ago.
