🤖 AI Summary
Richard Sutton — reinforcement learning pioneer, 2024 Turing Award winner, and author of “The Bitter Lesson” — argues that today’s large language models are a dead end because they fundamentally can’t learn from experience during normal interaction. Sutton’s core claim: LLMs are optimized to mimic human-written tokens (next-token prediction), not to predict the consequences of actions in the external world. That means they lack a grounded goal signal, true surprise-driven updates, and a notion of “right” behavior tied to real outcomes. Successes on math problems or chain-of-thought reasoning don’t prove they build causal world models; they excel at computational or imitation tasks where ground truth isn’t the environment’s response. Sutton says continual, on-the-job learning — the kind animals and humans do — requires architectures that predict and learn from what actually happens, using reward-like feedback, not just more scale or RLHF grafted onto a static LLM.
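To make that contrast concrete, here is a minimal Python sketch of the two training signals. The function names, shapes, and numbers are illustrative assumptions, not code or an API from the article: one loss is scored against a human-written corpus token, the other against what the environment actually did after an action.

```python
import numpy as np

def next_token_loss(logits, corpus_token):
    """Imitation signal: cross-entropy against a human-written token.
    The 'ground truth' is text someone wrote, not an outcome in the world."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[corpus_token])

def consequence_loss(predicted_next_obs, actual_next_obs):
    """Experiential signal: error between the agent's prediction of what
    its action will cause and what the environment actually returned."""
    return float(np.mean((predicted_next_obs - actual_next_obs) ** 2))

# Toy usage with made-up numbers.
print(next_token_loss(np.array([2.0, 0.5, -1.0]), corpus_token=0))
print(consequence_loss(np.array([0.2, 0.8]), np.array([0.0, 1.0])))
```

The first objective can be driven to zero without the model ever acting or observing a consequence; the second only decreases by predicting the world's response more accurately.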
The implication for the AI/ML community is clear: prioritizing scalable architectures for experiential, continual learning (model-based prediction of consequences, online adaptation, and reward-driven updates) may be the necessary next step, not ever-larger text-only models. Technically, this points back to core RL primitives (TD learning, policy gradients, world models) and to research on integrating online interaction, surprise-based updates, and persistent learning agents. If realized, such agents could make current LLM-centric training pipelines and special pretraining phases obsolete, shifting investment from pure scale to systems that learn from real-world experience.
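As one concrete instance of the RL primitives the summary points to, below is a minimal sketch of tabular TD(0), where the TD error plays the role of the surprise-based update: the estimate changes only when the environment's response differs from what was predicted. The toy environment loop and all names are illustrative assumptions, not code from the article.

```python
import numpy as np

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """One online TD(0) update from a single experienced transition.
    delta (the TD error) is the surprise: observed outcome minus prediction."""
    delta = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * delta
    return delta

# Toy loop: learning happens continually, one interaction at a time,
# with no separate pretraining phase.
rng = np.random.default_rng(0)
n_states = 5
V = np.zeros(n_states)
state = 0
for _ in range(1000):
    next_state = int(rng.integers(n_states))   # stand-in for real environment dynamics
    reward = 1.0 if next_state == n_states - 1 else 0.0
    td0_update(V, state, reward, next_state)
    state = next_state

print(V)  # value estimates built purely from experienced transitions
```

Policy gradients and learned world models build on the same loop: act, observe the consequence, and update from the discrepancy between prediction and outcome.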