Huxley-Gödel Machine (arxiv.org)

🤖 AI Summary
Researchers introduce the Huxley-Gödel Machine (HGM), a new approach for developing self-improving coding agents that addresses what they call the Metaproductivity-Performance Mismatch: high benchmark scores don't necessarily indicate an agent's capacity to produce even better future descendants. Drawing on Huxley's clade idea, they define a clade-based metric (CMP) that aggregates benchmark performance across an agent's descendants as a proxy for its metaproductivity. They prove that access to the true CMP can, under certain assumptions, simulate the behavior of a Gödel Machine (an idealized optimal self-improver), and build HGM to estimate CMP and use it to guide search through the tree of self-modifications.

Technically, HGM replaces naïve selection-by-current-performance with CMP-guided expansion and pruning of the modification tree, improving both sample and wall-clock efficiency. Empirically it outperforms prior self-improvement methods on SWE-bench Verified and Polyglot, transfers well to other coding datasets and LLMs, and, critically, an HGM-optimized agent trained with GPT-5-mini and tested with GPT-5 achieves human-level performance on SWE-bench Lite, matching top human-engineered agents.

The work provides a principled, theoretically motivated metric for metaproductivity and a practical algorithmic path toward more efficient, scalable self-improving coding agents; code and data are publicly available.
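To make the clade idea concrete, here is a minimal sketch (not the paper's implementation) of CMP-style selection: each node in the self-modification tree holds a benchmark score, a node's estimated CMP aggregates scores over its clade (itself plus all descendants), and expansion picks the node with the highest estimate. The mean aggregator and the `Node`/`select_node_to_expand` names are illustrative assumptions, not the paper's definitions.

```python
# Hypothetical sketch of clade-based (CMP-style) node selection.
# Assumption: CMP is estimated as the mean benchmark score over a
# node's clade; the actual HGM estimator may differ.

from dataclasses import dataclass, field

@dataclass
class Node:
    score: float                       # benchmark performance of this agent
    children: list = field(default_factory=list)

def clade_scores(node):
    """Collect scores over a node's clade (the node plus all descendants)."""
    scores = [node.score]
    for child in node.children:
        scores.extend(clade_scores(child))
    return scores

def estimated_cmp(node):
    """Illustrative CMP estimate: mean benchmark score over the clade."""
    scores = clade_scores(node)
    return sum(scores) / len(scores)

def select_node_to_expand(root):
    """Pick the node whose clade looks most metaproductive."""
    best, best_cmp = root, estimated_cmp(root)
    stack = [root]
    while stack:
        node = stack.pop()
        cmp_val = estimated_cmp(node)
        if cmp_val > best_cmp:
            best, best_cmp = node, cmp_val
        stack.extend(node.children)
    return best
```

The contrast with selection-by-current-performance is that a node with a mediocre score can still be chosen if its descendants score well, which is exactly the mismatch the summary describes.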