🤖 AI Summary
SpecFix is a new tool and method that automatically detects and repairs ambiguous natural-language programming problem descriptions to improve LLM-based code generation. The system treats ambiguity as uncertainty in the distribution of programs an LLM generates for a requirement: it first analyzes and repairs the LLM’s interpretation by combining traditional testing and program-repair techniques on clustered program outputs, then refines the human-readable requirement via “contrastive specification inference” based on how those program distributions change. The authors release a runnable repo (clustering, testers, model interface, solution transformers) and JSONL outputs that record original vs. repaired specs, program clusters, pass@k metrics, and semantic entropy.
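To make the ambiguity-as-distribution idea concrete, here is a minimal sketch of clustering generated programs by observed behavior and scoring the resulting distribution with an entropy measure. The helper names (`cluster_by_behavior`, `run_program`) and the exact entropy formula are illustrative assumptions, not SpecFix's actual API.

```python
import math
from collections import Counter

def cluster_by_behavior(programs, test_inputs, run_program):
    """Group candidate programs by their input/output behavior.

    `run_program(program, x)` is a hypothetical executor returning the
    program's output (or an error marker) on a single input `x`.
    Programs with identical behavior signatures land in the same cluster.
    """
    clusters = Counter()
    for program in programs:
        signature = tuple(run_program(program, x) for x in test_inputs)
        clusters[signature] += 1
    return clusters

def semantic_entropy(clusters):
    """Shannon entropy over behavior clusters.

    Higher entropy suggests the requirement admits several semantically
    inequivalent interpretations, i.e. it is more ambiguous.
    """
    total = sum(clusters.values())
    probabilities = [count / total for count in clusters.values()]
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

Under this reading, a repaired requirement should concentrate the generated programs into fewer clusters, driving the entropy toward zero.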
This matters because prompt/spec ambiguity is a major source of incorrect code from state-of-the-art models; automating minimal, semantics-preserving repairs reduces developer effort, improves Pass@1 and other metrics, and helps align examples with textual requirements. The paper evaluates SpecFix on HumanEval+, MBPP+, and LiveCodeBench with four LLMs (GPT-4o, GPT-4o-mini, DeepSeek-V3, Qwen2.5-Coder-32B-Instruct) and finds significant Pass@1 gains that generalize across models. For practitioners and researchers, SpecFix offers a reproducible pipeline to detect specification entropy, generate contrastive fixes, and quantify improvements, which is useful for prompt engineering, dataset curation, and more robust model evaluation.
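For reference, the pass@k numbers recorded in the JSONL outputs are typically computed with the standard unbiased estimator (Chen et al., 2021); whether SpecFix uses exactly this formulation is an assumption, but it is the conventional way to report Pass@1.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total generated samples per problem
    c: number of samples that pass all tests
    k: evaluation budget
    Returns the probability that at least one of k drawn samples is correct.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 samples, 5 correct -> Pass@1 estimate of 0.25
print(pass_at_k(20, 5, 1))
```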