Transformers treat their input as a set of tokens, with order supplied separately, typically through positional encodings. That works well for language but is sub-optimal for spatial data that lives in 3-D. The discussion highlights the need for new primitives that map better onto distributed hardware and for architectures that can capture physical laws implicitly.
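The "set of tokens" point can be checked on a toy example. The sketch below is not taken from the discussion: it is a minimal single-head self-attention in plain Python, with random weights and no positional encoding, and the function names are illustrative. Under those assumptions, shuffling the input tokens only shuffles the output rows in the same way, so the layer by itself has no notion of where a token sits in a sequence or in space.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    scale = math.sqrt(len(K[0]))
    # scores[i][j] is the scaled dot product of query i with key j.
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / scale for k_row in K]
              for q_row in Q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)


def is_permutation_equivariant(n_tokens=5, d_model=4, seed=0, tol=1e-9):
    """Return True if shuffling the tokens just shuffles the outputs."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    X = rand_matrix(n_tokens, d_model)          # toy token embeddings
    Wq, Wk, Wv = (rand_matrix(d_model, d_model) for _ in range(3))

    perm = list(range(n_tokens))
    rng.shuffle(perm)

    out = self_attention(X, Wq, Wk, Wv)
    out_shuffled = self_attention([X[i] for i in perm], Wq, Wk, Wv)

    # Row j of the shuffled output should match row perm[j] of the original.
    return all(
        abs(a - b) <= tol
        for i, row in zip(perm, out_shuffled)
        for a, b in zip(out[i], row)
    )


if __name__ == "__main__":
    print(is_permutation_equivariant())
```

Any ordering or geometry therefore has to be injected from outside the attention layer, which is one way to read the claim that the primitive is a poor fit for 3-D data.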
Steinberger argues that running AI agents locally on user devices unlocks capabilities cloud-based AI can't match, because a local agent can directly control the user's own devices and data.