The conversation distinguishes spatial reasoning--understanding, moving, and interacting in 3-D space--from linguistic processing. Humans are born with high-bandwidth visual perception, while language is a comparatively low-bandwidth, symbolic channel. Building AI that matches human spatial acuity requires models that go beyond token-by-token prediction.