New primitives beyond matrix multiplication are needed for distributed hardware
  • Current GPUs excel at dense matrix ops, but future hardware may favour sparse or graph‑based kernels.
  • Spatial data naturally forms graphs (e.g., splat neighbourhoods), suggesting graph neural networks or message‑passing schemes as alternatives; see the sketch after this list.
  • Designing primitives that map directly onto clusters of devices could avoid the monolithic GPU bottleneck.
  • Early research at World Labs explores such kernels to better exploit the upcoming wave of specialised AI accelerators.
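
As a rough illustration of the kind of graph‑based primitive these bullets gesture at, here is a minimal sketch: one round of mean‑aggregation message passing over a k‑nearest‑neighbour graph built from 3‑D splat centres. Everything in it is an assumption for illustration (the toy data, the knn_graph helper, the residual tanh update); it is not an architecture described in the episode or attributed to World Labs.

    import numpy as np

    def knn_graph(points: np.ndarray, k: int) -> np.ndarray:
        # Indices of the k nearest neighbours of each 3-D point.
        # points: (n, 3) splat centres -> (n, k) neighbour indices.
        diff = points[:, None, :] - points[None, :, :]
        dist2 = np.einsum("ijk,ijk->ij", diff, diff)  # pairwise squared distances
        np.fill_diagonal(dist2, np.inf)               # exclude self-loops
        return np.argsort(dist2, axis=1)[:, :k]

    def message_pass(features: np.ndarray, neighbours: np.ndarray,
                     weight: np.ndarray) -> np.ndarray:
        # One round of message passing: each splat averages its neighbours'
        # features, applies a linear map, and adds the result back residually.
        # The compute is local gathers plus small per-node matmuls, so the
        # work partitions across devices along the graph rather than through
        # one monolithic dense matrix multiplication.
        gathered = features[neighbours]       # (n, k, d) neighbour features
        aggregated = gathered.mean(axis=1)    # (n, d), permutation-invariant
        return features + np.tanh(aggregated @ weight)

    rng = np.random.default_rng(0)
    points = rng.normal(size=(256, 3))    # toy splat centres in 3-D space
    feats = rng.normal(size=(256, 16))    # per-splat feature vectors
    w = 0.1 * rng.normal(size=(16, 16))   # stand-in for a learned weight

    out = message_pass(feats, knn_graph(points, k=8), w)
    print(out.shape)  # (256, 16)

The point of the sketch is the access pattern: the work is dominated by gathers over an irregular neighbour structure rather than one large dense matmul, which is the kind of workload the bullets suggest future accelerators might target.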
Fei-Fei Li · Latent Space · 00:11:35

Supporting quotes

"We need wacky ideas… hardware scaling will not be infinite, we need new primitives beyond matrix multiplication." (Fei-Fei Li)
"Performance per watt from Hopper to Blackwell shows scaling limits; we need new architectures." (Fei-Fei Li)

From this concept

Future Model Architectures Beyond Transformers

Transformers treat inputs as sets of tokens, which works well for language but is sub-optimal for spatial data that lives in 3-D. The discussion highlights the need for new primitives that map better to distributed hardware and for architectures that can capture physical laws implicitly.
