🤖 AI Summary
A researcher combined DreamCoder’s library-building idea with LLM-driven program synthesis to create an efficient, reusable solver for the ARC-AGI benchmark family. Starting from an empty library of Python programs, the system uses Chain-of-Thought prompts and a simple heuristic “recognition” score (primary key: number of exact training-example matches; secondary key: average cell-level accuracy) to iteratively query an LLM for candidate programs, evaluate them against the training examples, and add the best program per task to a shared library. Run for multiple rounds over the task set, this DreamCoder-inspired evolutionary approach reached 77.1% on ARC-AGI-1 and 26.0% on ARC-AGI-2 (semi-private eval), breaking the prior Pareto frontier for accuracy versus compute cost set by bespoke systems and frontier base models. A contemporaneous variant by Jeremy Berman reports slightly higher accuracy at substantially higher cost, leaving the author’s method state-of-the-art in efficiency.
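As a rough illustration of the described scoring heuristic, here is a minimal Python sketch; the names (`Grid`, `cell_accuracy`, `score_program`) are hypothetical stand-ins, not the author’s actual code.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]  # hypothetical grid representation

def cell_accuracy(predicted: Grid, expected: Grid) -> float:
    """Fraction of matching cells; 0.0 if the grid shapes differ."""
    if len(predicted) != len(expected) or any(
        len(p) != len(e) for p, e in zip(predicted, expected)
    ):
        return 0.0
    total = sum(len(row) for row in expected)
    correct = sum(
        pc == ec
        for prow, erow in zip(predicted, expected)
        for pc, ec in zip(prow, erow)
    )
    return correct / total if total else 0.0

def score_program(
    program: Callable[[Grid], Grid],
    train_pairs: List[Tuple[Grid, Grid]],
) -> Tuple[int, float]:
    """Primary key: exact training-example matches; secondary: mean cell accuracy."""
    exact, accuracies = 0, []
    for inp, out in train_pairs:
        try:
            pred = program(inp)
        except Exception:  # LLM-generated candidates may crash on some inputs
            accuracies.append(0.0)
            continue
        acc = cell_accuracy(pred, out)
        accuracies.append(acc)
        if acc == 1.0:
            exact += 1
    return exact, sum(accuracies) / len(accuracies)
```

Returning the score as a `(exact, mean_accuracy)` tuple means Python’s lexicographic tuple comparison ranks candidates exactly as described: exact matches dominate, with cell-level accuracy as the tiebreaker.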
Technically, the system merges scalable LLM program generation with cross-task concept reuse (library growth) rather than treating each ARC task independently. Key design choices: Python (Turing-complete) programs instead of a handcrafted DSL; inclusion of the current best library program in each prompt to guide search; and a simple wake-sleep loop in which the LLM composes candidates (wake) and the library accumulates abstractions (sleep). The result demonstrates that LLMs plus lightweight library learning can materially improve compositional and contextual reasoning on ARC-style tasks, pointing toward more cost-effective neurosymbolic program synthesis, with future gains expected from learned recognition models and same-budget comparisons.
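A hedged sketch of such a round-based loop, reusing the `score_program` heuristic above, might look like the following; `Task`, `build_prompt`, `compile_candidate`, the `solve(grid)` convention, and all parameter values are hypothetical, not the author’s actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Task:
    id: str
    train_pairs: List[Tuple[Grid, Grid]]

def build_prompt(task: Task, hint: Optional[str]) -> str:
    """Chain-of-Thought prompt, seeded with the current best library program."""
    examples = "\n".join(f"{inp} -> {out}" for inp, out in task.train_pairs)
    seed = f"\nCurrent best program:\n{hint}" if hint else ""
    return (
        "Reason step by step, then write a Python function solve(grid) "
        f"mapping these training inputs to outputs:\n{examples}{seed}"
    )

def compile_candidate(source: str) -> Callable[[Grid], Grid]:
    """Exec an LLM-generated candidate and return its solve() entry point."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["solve"]

def run_rounds(
    tasks: List[Task],
    query_llm: Callable[[str, int], List[str]],  # (prompt, n) -> candidate sources
    num_rounds: int = 3,
    candidates_per_task: int = 8,
) -> Dict[str, Tuple[Tuple[int, float], str]]:
    library: Dict[str, Tuple[Tuple[int, float], str]] = {}
    for _ in range(num_rounds):
        for task in tasks:
            best = library.get(task.id)  # (score, source) or None
            prompt = build_prompt(task, hint=best[1] if best else None)
            for source in query_llm(prompt, candidates_per_task):
                try:
                    program = compile_candidate(source)
                except Exception:
                    continue  # skip candidates that fail to compile
                score = score_program(program, task.train_pairs)
                if best is None or score > best[0]:
                    best = (score, source)
            if best is not None:
                library[task.id] = best  # the library grows across rounds
    return library
```

Seeding each prompt with the task’s current best program is what lets later rounds refine earlier solutions instead of restarting the search from scratch.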