🤖 AI Summary
A new study from ETH Zurich's SRI Lab rigorously evaluates how repository-level context files such as CLAUDE.md and AGENTS.md affect the performance of coding agents. Contrary to widespread belief, the findings show that LLM-generated context files can actually reduce task success rates while increasing inference costs by over 20%. In contrast, human-written context files improve performance by an average of 4%, underscoring their value when they provide unique insights that aren't already documented in the repository.
The research introduces a new benchmark, AGENTbench, built from less-popular Python repositories with developer-written context files, and evaluates several coding agents on it, including Claude Code, Codex, and Qwen Code. It finds that while agents follow instructions from context files diligently, simply increasing the volume of guidance does not ensure better outcomes; relevant, non-redundant information is what drives successful execution. For developers of coding agents, the takeaway is to write precise, high-value context files that avoid restating existing documentation, improving efficiency and reducing compute costs in AI-driven programming tasks.
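To make the "relevant, non-redundant" guidance concrete, here is a hypothetical sketch of what such a context file might look like. The repository name, commands, and conventions below are illustrative assumptions, not taken from the study; the point is that each line tells the agent something it could not learn from the README or the code itself:

```markdown
# AGENTS.md (hypothetical example)

## Build & test
- Run tests with `make test-fast`; the full `make test` suite hits a live
  staging database and will fail locally.

## Gotchas not in the docs
- `utils/legacy_io.py` is frozen for backward compatibility; do not refactor it.
- CI pins Python 3.10; avoid syntax introduced in 3.11+.

## Conventions
- New modules need a matching test file under `tests/`, mirroring the path.
```

Compare this with an LLM-generated file that restates the repository's README (project description, install steps, directory layout): per the study's findings, that redundant text inflates the agent's context and cost without adding information it couldn't already retrieve.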