Empirical Study of Pull Requests on GitHub (arxiv.org)

🤖 AI Summary
Researchers conducted an empirical study of "agentic" coding by analyzing 567 GitHub pull requests (PRs) created with Claude Code across 157 open-source projects to measure real-world usefulness and acceptance. They found that developers commonly task agents with refactoring, documentation, and testing work, and that 83.8% of agent-assisted PRs were eventually merged. Of the merged PRs, 54.9% were integrated without further modification, while 45.1% required human edits, most often to fix bugs, improve documentation, or conform to project-specific standards.

For the AI/ML community, this provides practical evidence that autonomous LLM-driven agents can produce production-acceptable code changes at scale, speeding up routine maintenance and low-risk tasks. At the same time, the nontrivial fraction needing human revision underscores the continued need for human-in-the-loop review, better agent handling of correctness and project conventions, and improved evaluation metrics for behavioral safety in code generation. Key implications include prioritizing model improvements around bug prevention and style/standards alignment, integrating more robust testing and specification signals into agent workflows, and designing review tooling that combines agent speed with human oversight.
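As a back-of-the-envelope check on the reported percentages, the absolute counts can be derived from the study's headline numbers. These derived figures are approximations from the stated rates, not counts quoted from the paper:

```python
# Approximate PR counts implied by the summary's reported percentages.
# Derived by arithmetic from the stated rates; not figures quoted in the paper.
total_prs = 567             # agent-assisted PRs analyzed
merge_rate = 0.838          # 83.8% eventually merged
unmodified_share = 0.549    # 54.9% of merged PRs integrated as-is

merged = round(total_prs * merge_rate)            # ~475 merged PRs
merged_as_is = round(merged * unmodified_share)   # ~261 merged without edits
merged_with_edits = merged - merged_as_is         # ~214 merged after human edits

print(merged, merged_as_is, merged_with_edits)
```

So roughly 475 of the 567 PRs landed, and a bit over 200 of those needed human follow-up before or after merge, which is the fraction the summary flags as motivating human-in-the-loop review.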