Superhuman Stratego with RL and test-time search (arxiv.org)

0 points | 242 days ago

🤖 AI Summary

Researchers report an AI that plays Stratego well above the level of top human players, using self-play reinforcement learning combined with test-time search. Stratego is a notoriously hard board wargame defined by massive hidden information and long-term strategy. Where previous efforts failed or required industrial-scale budgets, this work reaches and exceeds top human performance with training and compute costs on the order of only a few thousand dollars. The team emphasizes general algorithmic advances rather than game-specific engineering: agents are trained via self-play, and uncertainty about hidden state is resolved at test time through search techniques designed for imperfect-information games. The result matters because it demonstrates a practical, sample- and compute-efficient approach to complex imperfect-information problems, a category that includes card games, security and negotiation tasks, and many real-world decision problems. The key technical takeaways are that coupling policies and value functions learned from self-play with targeted test-time search is an effective way to handle hidden state, and that these components can be designed to be both effective and inexpensive. The paper's code, data and demos are released, which suggests the methods are reproducible and potentially adaptable to other domains, and points to new directions in research on belief modeling, search under uncertainty, and democratized game-AI development.
