Show HN: Rogue-Bench – LLMs play the game Rogue (iwhalen.github.io)

🤖 AI Summary
Rogue-Bench is a new benchmark in which large language models (LLMs) play Rogue, the classic dungeon crawler. It gives the AI/ML community a concrete way to evaluate the problem-solving and decision-making of LLMs in a dynamic game environment. The benchmark runs a slightly modified build of Unix Rogue 5.4.2 with the gameplay itself left unchanged, so the challenge stays consistent with the original game.

Rogue-Bench can be run locally or in a Docker container, and agents interact with the game through the terminal. The framework records gameplay data, including game statistics and keystroke logs, which supports detailed analysis and makes it possible to replay sessions. Besides testing LLM reasoning in a complex setting, the benchmark opens up research on agent behavior and performance in interactive environments. The code is released under the GPL-3.0 license and is open to further exploration and contributions.
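To make the "keystroke logs enable replay" idea concrete, here is a minimal sketch in Python. It is not Rogue-Bench's actual code or log format: the `KeystrokeLog` class, its JSON-lines layout, and the `replay` helper are hypothetical. The sketch records each key an agent sends, serializes the log one step per line, and feeds the keys back in order to any `send` callable, such as a function that writes to the pseudo-terminal running Rogue.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class KeystrokeLog:
    """Hypothetical append-only record of the keys an agent sent to Rogue."""

    keys: List[str] = field(default_factory=list)

    def record(self, key: str) -> None:
        # Store exactly what was written to the terminal, one entry per step.
        self.keys.append(key)

    def to_jsonl(self) -> str:
        # Assumed format: one JSON object per line, {"step": n, "key": "..."}.
        return "\n".join(
            json.dumps({"step": i, "key": k}) for i, k in enumerate(self.keys)
        )

    @classmethod
    def from_jsonl(cls, text: str) -> "KeystrokeLog":
        # Parse non-empty lines and restore step order before rebuilding.
        entries = [json.loads(line) for line in text.splitlines() if line.strip()]
        entries.sort(key=lambda e: e["step"])
        return cls(keys=[e["key"] for e in entries])


def replay(log: KeystrokeLog, send: Callable[[str], None]) -> int:
    """Feed the recorded keys, in order, to `send`; return how many were sent."""
    for key in log.keys:
        send(key)
    return len(log.keys)


# Usage: Rogue keys h (move left), j (move down), s (search), then Escape.
log = KeystrokeLog()
for k in ["h", "j", "j", "s", "\x1b"]:
    log.record(k)

restored = KeystrokeLog.from_jsonl(log.to_jsonl())
sent: List[str] = []
count = replay(restored, sent.append)
```

A keystroke log alone reproduces a game only if Rogue's random number generator is seeded identically on replay. This sketch assumes that is handled elsewhere; it only shows the record-and-replay half.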