Sudoku-Bench: Evaluating Creative Reasoning With Sudoku Variants (arxiv.org)

🤖 AI Summary
A new benchmark, Sudoku-Bench, assesses the creative reasoning capabilities of large language models (LLMs) on unconventional Sudoku puzzles. Traditional reasoning benchmarks can often be solved through pattern recognition and memorization; Sudoku-Bench instead targets multi-step logical reasoning by presenting unique puzzles that require novel problem-solving strategies, referred to as "break-ins." The benchmark includes a carefully curated set of challenging Sudoku variants and a standardized text-based representation, which makes it easier to extend research across thousands of available puzzles. In baseline tests, even state-of-the-art LLMs solved fewer than 15% of these puzzles unaided, which points to a clear gap in long-horizon strategic reasoning. Sudoku-Bench offers a testing ground for creative problem solving and a way to measure progress in LLM reasoning beyond memorization.