🤖 AI Summary
Richard Sutton’s short essay “The One-Step Trap” warns that a widespread intuition in AI—that learning only one-step transition predictions and iterating them suffices to predict long-term outcomes—is fundamentally flawed. If one-step predictions were perfect, this would work, but in practice small errors compound rapidly and iterated rollouts produce large, misleading long-term forecasts. In stochastic environments or under stochastic policies, the future is not a single trajectory but an exponentially branching tree of possibilities; enumerating and weighting those branches from one-step models is computationally infeasible, so naïve model-based approaches can fail badly even when each one-step prediction is individually good.
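To make the compounding-error point concrete, here is a minimal sketch (my illustration, not code from the essay): a scalar system whose learned one-step model is off by a tiny amount, yet whose iterated rollout drifts far from the true long-term outcome as the horizon grows.

```python
import numpy as np

# Minimal sketch (illustrative, not from the essay): a scalar system
# x_{t+1} = a * x_t. The learned one-step model is nearly perfect, off by a
# tiny eps, yet iterating it over a long horizon compounds that error.

a_true = 0.99           # true one-step dynamics
eps = 1e-3              # tiny one-step modeling error
a_model = a_true + eps  # learned (slightly wrong) one-step model

x0 = 1.0
for horizon in (1, 10, 100, 1000):
    true_outcome = x0 * a_true ** horizon   # ground-truth long-term prediction
    rollout = x0 * a_model ** horizon       # iterated one-step rollout
    rel_err = abs(rollout - true_outcome) / abs(true_outcome)
    print(f"horizon={horizon:5d}  relative error={rel_err:.3%}")
```

With these illustrative numbers the relative error is about 0.1% at horizon 1 but grows past 100% by horizon 1000, even though every single one-step prediction is individually accurate to within 0.1%.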
The essay’s implication for the AI/ML community is clear: build temporal abstraction into learned models. Sutton advocates using options and General Value Functions (GVFs) to form temporally extended models that predict aggregated, longer-horizon quantities directly rather than relying on repeated one-step simulation. This perspective undercuts some common practices in POMDPs, Bayesian analyses, control theory, and compression-based AI, and points toward scalable architectures like Horde and recent work on reward-respecting subtasks (Sutton et al. 1999, 2011, 2023) as practical ways to avoid the one-step trap.
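As a rough illustration of what "predicting longer-horizon quantities directly" means, here is a small sketch, assuming a toy tabular random-walk environment of my own choosing (not from the essay): a GVF-style prediction of a discounted cumulant is learned directly by TD(0) from experience, with no one-step rollout anywhere.

```python
import numpy as np

# Minimal sketch (illustrative assumptions: toy 5-state random walk, cumulant
# of 1 on entering state 0). A GVF directly predicts the discounted sum of the
# cumulant from each state via TD(0), instead of iterating a one-step model.

n_states = 5
gamma = 0.9                     # continuation/discount for the GVF
alpha = 0.1                     # step size
values = np.zeros(n_states)     # GVF prediction per state
rng = np.random.default_rng(0)

def step(s):
    """Toy stochastic environment: random walk; cumulant is 1 in state 0."""
    s_next = (s + rng.choice([-1, 1])) % n_states
    cumulant = 1.0 if s_next == 0 else 0.0
    return s_next, cumulant

s = 0
for _ in range(50_000):
    s_next, c = step(s)
    # TD(0) update: the long-horizon prediction is learned as a single target,
    # so errors do not compound across a rollout.
    td_error = c + gamma * values[s_next] - values[s]
    values[s] += alpha * td_error
    s = s_next

print("Learned GVF predictions per state:", np.round(values, 3))
```

The design point this is meant to illustrate is that the long-horizon quantity is the learning target itself; options generalize the same idea to temporally extended behaviors with their own termination conditions.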