New primitives beyond matrix multiplication are needed for distributed hardware
  • Current GPUs excel at dense matrix ops, but future hardware may favour sparse or graph‑based kernels.
  • Spatial data naturally forms graphs (e.g., splat neighbourhoods), suggesting graph neural networks or message‑passing schemes as alternatives; see the sketch after this list.
  • Designing primitives that map directly onto clusters of devices could avoid the monolithic GPU bottleneck.
  • Early research at World Labs explores such kernels to better exploit the upcoming wave of specialised AI accelerators.
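
As a rough illustration of the kind of graph‑based primitive these bullets gesture at, here is a minimal sketch: one round of mean‑aggregation message passing over a k‑nearest‑neighbour graph built from 3‑D splat centres. Everything in it is an assumption for illustration (the toy data, the knn_graph helper, the residual tanh update); it is not an architecture described in the episode or attributed to World Labs.

    import numpy as np

    def knn_graph(points: np.ndarray, k: int) -> np.ndarray:
        # Indices of the k nearest neighbours of each 3-D point.
        # points: (n, 3) splat centres -> (n, k) neighbour indices.
        diff = points[:, None, :] - points[None, :, :]
        dist2 = np.einsum("ijk,ijk->ij", diff, diff)  # pairwise squared distances
        np.fill_diagonal(dist2, np.inf)               # exclude self-loops
        return np.argsort(dist2, axis=1)[:, :k]

    def message_pass(features: np.ndarray, neighbours: np.ndarray,
                     weight: np.ndarray) -> np.ndarray:
        # One round of message passing: each splat averages its neighbours'
        # features, applies a linear map, and adds the result back residually.
        # The compute is local gathers plus small per-node matmuls, so the
        # work partitions across devices along the graph rather than through
        # one monolithic dense matrix multiplication.
        gathered = features[neighbours]       # (n, k, d) neighbour features
        aggregated = gathered.mean(axis=1)    # (n, d), permutation-invariant
        return features + np.tanh(aggregated @ weight)

    rng = np.random.default_rng(0)
    points = rng.normal(size=(256, 3))    # toy splat centres in 3-D space
    feats = rng.normal(size=(256, 16))    # per-splat feature vectors
    w = 0.1 * rng.normal(size=(16, 16))   # stand-in for a learned weight

    out = message_pass(feats, knn_graph(points, k=8), w)
    print(out.shape)  # (256, 16)

The point of the sketch is the access pattern: the work is dominated by gathers over an irregular neighbour structure rather than one large dense matmul, which is the kind of workload the bullets suggest future accelerators might target.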
Fei-Fei Li · Latent Space · 00:11:35

Supporting quotes

"We need wacky ideas… hardware scaling will not be infinite, we need new primitives beyond matrix multiplication." (Fei-Fei Li)
"Performance per watt from Hopper to Blackwell shows scaling limits; we need new architectures." (Fei-Fei Li)

From this concept

Future Model Architectures Beyond Transformers

Transformers treat inputs as sets of tokens, which works well for language but is sub-optimal for spatial data that lives in 3-D. The discussion highlights the need for new primitives that map better to distributed hardware and for architectures that can capture physical laws implicitly.
