“A million tokens is a lot. That can be days of human learning. If you think about the model reading a million words, how long would it take me to read a million? Days or weeks at least.” — Dario Amodei
“There's nothing preventing longer contexts from working. You just have to train at longer contexts and then learn to serve them at inference.” — Dario Amodei