Rebuilding Devin for Claude Sonnet 4.5: Lessons and Challenges (cognition.ai)

0 points 6 hours ago

🤖 AI Summary

Cognition rebuilt its Devin agent to run on Anthropic's Claude Sonnet 4.5 and launched it in Agent Preview. The new Devin is roughly 2x faster and scores 12% higher end to end on the team's Junior Developer Evals, with planning performance up about 18%. This was not a simple model swap: Sonnet 4.5 behaves differently enough that Devin had to be re-architected to get those gains, and the old Devin remains available for users who prefer it. Technically, Sonnet 4.5 is notably aware of its own context window: it proactively summarizes progress and becomes more decisive as it believes the window is running out, a "context anxiety" that can lead to shortcuts or to ending a task prematurely. The team mitigated this with repeated prompt reminders and an unusual hack: enabling the 1M-token beta while capping actual usage at 200k tokens, so the model "feels" like it has more runway. The model also externalizes state (writing SUMMARY/CHANGELOG files), runs more self-tests and short scripts for feedback, and executes tools in parallel to fit more actions into each window. These behaviors improve reliability in long-running sessions but burn tokens faster and sometimes produce unnecessary workarounds. The takeaway: agents need to be rethought around model-aware context management, token budgeting, and selective reliance on model-authored summaries as this new axis of agent behavior evolves.
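The "advertise 1M, cap at 200k" idea from the summary can be illustrated with a small harness-side budget tracker. This is only a sketch under assumptions, not Cognition's implementation: `ContextBudget`, `ADVERTISED_WINDOW`, `USAGE_CAP`, and the 80% compaction threshold are hypothetical names and values chosen for illustration, and token counts are assumed to come from whatever tokenizer or usage report the harness already has.

```python
from dataclasses import dataclass, field

# Assumption: the window the model is configured with (the 1M-token beta).
ADVERTISED_WINDOW = 1_000_000
# Assumption: the hard cap the harness enforces on its own side.
USAGE_CAP = 200_000


@dataclass
class ContextBudget:
    """Hypothetical harness-side tracker for tokens placed in the context."""

    cap: int = USAGE_CAP
    used: int = 0
    history: list = field(default_factory=list)

    def remaining(self) -> int:
        """Tokens still available under the enforced cap."""
        return max(self.cap - self.used, 0)

    def try_add(self, label: str, tokens: int) -> bool:
        """Record a message or tool result; refuse it if it would exceed the cap."""
        if tokens < 0:
            raise ValueError("token count must be non-negative")
        if self.used + tokens > self.cap:
            return False
        self.used += tokens
        self.history.append((label, tokens))
        return True

    def should_compact(self, threshold: float = 0.8) -> bool:
        """Signal that the harness should summarize or trim older context."""
        return self.used >= threshold * self.cap


budget = ContextBudget()
budget.try_add("repo exploration", 150_000)
budget.try_add("test run output", 20_000)
print(budget.remaining(), budget.should_compact())
```

The design point is that the cap lives in the harness rather than in the model's view of its window: the model sees a large advertised window, while the harness decides when to compact or stop.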
