🤖 AI Summary
Researchers introduce "Dr. Boot," a bootstrapping algorithm that teaches program-synthesis language models not just to generate code, but to iteratively repair it—more closely mirroring how humans write code with a compiler in the loop. The motivation is twofold: standard synthesis datasets (MBPP, APPS) are small and noisy relative to models’ data needs, and current models typically emit a single final solution rather than iterating. Bootstrapping trains models to propose fixes and use feedback, improving robustness and sample efficiency without simply scaling model size.
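A minimal sketch of what one such bootstrapping-with-repair round might look like, assuming hypothetical `sample` (model generation) and `run_tests` (execute code against example tests, return pass/fail plus feedback) callables. The names, prompt format, and loop structure are illustrative only and are not the paper's exact algorithm; the trajectories that end in a passing program would then be used as fine-tuning data for the next round.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Problem:
    prompt: str
    example_tests: List[str]  # assert-style tests, as in MBPP/APPS


def bootstrap_round(
    sample: Callable[[str], str],                              # hypothetical model wrapper
    run_tests: Callable[[str, List[str]], Tuple[bool, str]],   # returns (passed, feedback)
    problems: List[Problem],
    max_repairs: int = 2,
) -> List[Tuple[str, List[str]]]:
    """One bootstrapping round: keep (prompt, trajectory) pairs whose final
    program passes the example tests; these become fine-tuning data."""
    kept: List[Tuple[str, List[str]]] = []
    for prob in problems:
        code = sample(prob.prompt)        # initial attempt
        trajectory = [code]
        for attempt in range(max_repairs + 1):
            passed, feedback = run_tests(code, prob.example_tests)
            if passed:
                kept.append((prob.prompt, trajectory))
                break
            if attempt == max_repairs:
                break                     # repair budget exhausted; drop this problem
            # Repair step: condition the model on its own faulty code plus feedback.
            code = sample(
                f"{prob.prompt}\n# Previous attempt:\n{code}"
                f"\n# Feedback:\n{feedback}\n# Fixed solution:\n"
            )
            trajectory.append(code)
    return kept
```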
Empirically, bootstrapping consistently beats standard fine-tuning and yields performance comparable to fine-tuned models that are 68% larger. Interestingly, bootstrapping that includes repairing also boosts performance even when the model is later used in a non-repairing mode, though using repair loops at inference didn’t always outperform naïvely sampling the same number of candidate solutions. The paper also flags problematic example test cases in the APPS training split—an important dataset-quality issue because many repair and reinforcement-learning approaches depend on those tests for supervision and reward. Overall, Dr. Boot suggests a practical path to more data-efficient, human-like synthesis and highlights the need to audit benchmark datasets used for repair-based training.
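The inference-time comparison mentioned above is budget-matched: a repair chain of length N is pitted against N independent samples. A rough sketch of that comparison, under the same hypothetical `sample`/`run_tests` interfaces as before (illustrative, not the paper's evaluation code):

```python
from typing import Callable, List, Optional, Tuple


def solve_with_repair(
    sample: Callable[[str], str],
    run_tests: Callable[[str, List[str]], Tuple[bool, str]],
    prob: "Problem",
    budget: int = 4,
) -> Optional[str]:
    """Spend the budget on one repair chain: 1 initial attempt + (budget - 1) repairs."""
    code = sample(prob.prompt)
    for _ in range(budget - 1):
        passed, feedback = run_tests(code, prob.example_tests)
        if passed:
            return code
        code = sample(
            f"{prob.prompt}\n# Previous attempt:\n{code}\n# Feedback:\n{feedback}\n"
        )
    return code if run_tests(code, prob.example_tests)[0] else None


def solve_with_sampling(
    sample: Callable[[str], str],
    run_tests: Callable[[str, List[str]], Tuple[bool, str]],
    prob: "Problem",
    budget: int = 4,
) -> Optional[str]:
    """Spend the same budget on independent samples; return the first that passes."""
    for _ in range(budget):
        code = sample(prob.prompt)
        if run_tests(code, prob.example_tests)[0]:
            return code
    return None
```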