Automated Repair of Ambiguous Problem Descriptions for LLM-Based Code Generation (github.com)

🤖 AI Summary
SpecFix is a tool and method that automatically detects and repairs ambiguous natural-language programming problem descriptions to improve LLM-based code generation. It treats ambiguity as uncertainty in the distribution of programs an LLM generates for a requirement: it first analyzes and repairs the LLM’s interpretation by applying traditional testing and program-repair techniques to clustered program outputs, then refines the human-readable requirement via “contrastive specification inference,” guided by how those program distributions change.

The authors release a runnable repo (clustering, testers, model interface, solution transformers) along with JSONL outputs that record original vs. repaired specs, program clusters, pass@k metrics, and semantic entropy. This matters because prompt/spec ambiguity is a major source of incorrect code from state-of-the-art models; automating minimal, semantics-preserving repairs reduces developer effort, improves pass@1 and other metrics, and helps align examples with textual requirements.

The paper evaluates SpecFix on HumanEval+, MBPP+, and LiveCodeBench with four LLMs (GPT-4o, GPT-4o-mini, DeepSeek-V3, Qwen2.5-Coder-32B-Instruct) and reports significant pass@1 gains that generalize across models. For practitioners and researchers, SpecFix offers a reproducible pipeline to detect specification entropy, generate contrastive fixes, and quantify improvements, which is useful for prompt engineering, dataset curation, and more robust model evaluation. The sketches below illustrate the main quantities involved.
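To make the core ambiguity signal concrete, here is a minimal sketch of semantic entropy computed over behavior-based program clusters. The helper names (`behavior_signature`, `semantic_entropy`) and the callable-based runner are illustrative assumptions; SpecFix's actual pipeline executes generated source against test inputs in its own harness.

```python
import math
from collections import Counter

def behavior_signature(program, test_inputs):
    """Record a program's outputs on shared test inputs; crashes are
    bucketed by exception type so erroring samples still cluster."""
    sig = []
    for x in test_inputs:
        try:
            sig.append(repr(program(x)))
        except Exception as e:
            sig.append(f"<error:{type(e).__name__}>")
    return tuple(sig)

def semantic_entropy(programs, test_inputs):
    """Cluster sampled programs by input/output behavior and return the
    Shannon entropy (in bits) of the cluster distribution: high entropy
    means the samples disagree semantically, i.e. the requirement is
    being read ambiguously."""
    clusters = Counter(behavior_signature(p, test_inputs) for p in programs)
    n = sum(clusters.values())
    return -sum(c / n * math.log2(c / n) for c in clusters.values())

# Three samples, two behaviors: strip vs. lstrip for "trim the string".
samples = [str.strip, str.strip, str.lstrip]
print(semantic_entropy(samples, ["  a  ", " b "]))  # ~0.918 bits
```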
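The pass@k numbers recorded in the JSONL outputs can be computed with the standard unbiased estimator from the Codex paper (Chen et al., 2021). The summary does not state which estimator SpecFix uses, so treat this as the conventional choice rather than the repo's exact code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """P(at least one of k draws from n samples is correct), given c
    correct samples: the unbiased estimator 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failing samples to fill a k-sample draw
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 generations, 4 correct: pass@1 = 0.4
print(pass_at_k(10, 4, 1))
```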
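Finally, a sketch of what one JSONL record might look like, mirroring the fields described above (original vs. repaired specs, program clusters, pass@k, semantic entropy). Every field name and value here is a made-up placeholder, not taken from the repo.

```python
import json

# Hypothetical record shape; field names and values are placeholders.
record = {
    "task_id": "HumanEval/0",
    "original_spec": "...",   # requirement as originally given
    "repaired_spec": "...",   # minimal, semantics-preserving rewrite
    "program_clusters": [     # behavior-equivalence classes of samples
        {"size": 8, "representative": "def solution(...): ..."},
        {"size": 2, "representative": "def solution(...): ..."},
    ],
    "pass_at_1": {"original": None, "repaired": None},        # filled by evaluator
    "semantic_entropy": {"original": None, "repaired": None},  # filled by evaluator
}
print(json.dumps(record))
```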