TD-Gammon (en.wikipedia.org)

0 points 3 days ago | visit original

🤖 AI Summary

TD-Gammon, developed by Gerald Tesauro at IBM in the early 1990s, was a groundbreaking backgammon program that pioneered the combination of temporal-difference (TD) learning with neural networks. Unlike earlier programs that relied on expert-coded heuristics, TD-Gammon learned its evaluation function primarily through self-play, refining its strategy without human supervision. By 1993, after training on 1.5 million games, it played at a level close to that of the best human players and introduced strategies that human players had previously overlooked or undervalued.

Technically, TD-Gammon used a three-layer neural network: nearly 200 input neurons encoded the board position and expert-inspired features, these fed a layer of hidden units, and the output layer estimated the probability of each of four possible game outcomes. Its learning algorithm, TD(λ), adjusted the network weights after each turn to reduce the difference between the evaluations of successive positions, with the actual result of the game serving as the final target. Successive versions also searched deeper, moving from one-ply to three-ply lookahead, which improved move selection. TD-Gammon's strength came from combining intuitive pattern recognition with limited lookahead: it excelled in positional play while occasionally faltering in complex endgames and doubling cube decisions.

TD-Gammon's significance extends beyond backgammon. It demonstrated that reinforcement learning with self-play can master a complex, stochastic domain without human-crafted rules. Its strategies influenced top human players and inspired subsequent AI research and commercial programs, laying conceptual groundwork for later reinforcement learning breakthroughs in games like Go and chess. The project showed that neural networks can discover innovative strategies on their own, marking a milestone in machine learning for games.
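The TD(λ) weight update mentioned above can be sketched in a few lines. This is a minimal illustration of the standard eligibility-trace form of TD(λ), not TD-Gammon's actual code. It assumes a single logistic unit in place of the three-layer network, and the names `predict`, `gradient`, `td_lambda_episode`, the step size `alpha`, the decay `lam`, and the three-position toy game are all hypothetical choices made for the example.

```python
import math

def predict(weights, features):
    """Sigmoid of a weighted sum; stands in for the network's win estimate."""
    s = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-s))

def gradient(weights, features):
    """Gradient of predict() with respect to each weight."""
    y = predict(weights, features)
    return [y * (1.0 - y) * x for x in features]

def td_lambda_episode(weights, positions, outcome, alpha=0.1, lam=0.7):
    """Run TD(lambda) updates over one game and return the new weights.

    positions: feature vectors for the successive positions of the game
    outcome:   final result (1.0 = win, 0.0 = loss), used as the last target
    """
    weights = list(weights)
    trace = [0.0] * len(weights)  # eligibility trace, one entry per weight
    for t, features in enumerate(positions):
        # Decay the old trace and add the gradient of the current prediction.
        grad = gradient(weights, features)
        trace = [lam * e + g for e, g in zip(trace, grad)]
        y_t = predict(weights, features)
        # Target is the next position's prediction, or the real outcome
        # once the game is over.
        if t + 1 < len(positions):
            target = predict(weights, positions[t + 1])
        else:
            target = outcome
        delta = target - y_t  # temporal-difference error
        weights = [w + alpha * delta * e for w, e in zip(weights, trace)]
    return weights

# Toy usage: the same three-position "game" is replayed as a win 200 times.
game = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = [0.0, 0.0, 0.0]
for _ in range(200):
    w = td_lambda_episode(w, game, outcome=1.0)
print([round(predict(w, x), 2) for x in game])
```

Because the trace carries earlier gradients forward, the win at the end of the game also raises the estimates for the earlier positions, which is the credit-assignment idea behind learning from self-play. In a real system the positions would come from games the program plays against itself rather than from a fixed list.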
