Effective harnesses for long-running agents (www.anthropic.com)

0 points 10 hours ago | visit original

🤖 AI Summary

AI researchers working with the Claude Agent SDK addressed a core limitation of long-running agent workflows: each session is discrete, and no memory persists across context windows. Their answer is a two-part harness: an initializer agent that scaffolds the repo and environment, and a coding agent that makes incremental, well-tested progress in each session. They found that context compaction alone (summarizing and compressing earlier context) wasn’t enough: frontier models like Opus 4.5 tended to try to “one-shot” large tasks or to declare the project finished prematurely, leaving half-implemented features and broken states for the next session. The initializer creates an init.sh script, a git repo with an initial commit, a structured feature_list.json (hundreds of end-to-end features, all initially marked as failing), and a claude-progress.txt log so that later sessions can quickly reconstruct state. In practice, the coding agent is restricted to one feature at a time: it must run the dev server, execute end-to-end checks (via the Puppeteer MCP for web UIs), commit with descriptive git messages, record what it did in the progress log, and edit feature_list.json only by toggling a feature's passes flag. This regimen reduces wasted tokens, enables reliable rollbacks, and forces verification before a feature is marked done. The approach shows how ordinary engineering practices (init scripts, structured tests, git history) let agents bridge context windows; future work explores specialized multi-agent roles and generalizing these harness patterns beyond full-stack web apps.
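
To make the bookkeeping concrete, here is a minimal Python sketch of how a session might pick the next failing feature, flip its passes flag, and append to the progress log. Only the file names (feature_list.json, claude-progress.txt) and the passes flag come from the summary; the `description` field, the helper names, and the assumption that the file holds a flat JSON array are hypothetical, and this is not the harness's actual code.

```python
import json
from pathlib import Path


def load_features(path):
    """Read the feature list (assumed here to be a flat JSON array of objects)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def next_failing_feature(features):
    """Return the first feature whose `passes` flag is false, or None if all pass."""
    for feature in features:
        if not feature["passes"]:
            return feature
    return None


def mark_passing(path, description):
    """Flip `passes` to true for one feature; no other field is modified.

    `description` is a hypothetical identifying field, used here for lookup only.
    """
    features = load_features(path)
    matched = False
    for feature in features:
        if feature["description"] == description:
            feature["passes"] = True
            matched = True
    if not matched:
        raise KeyError(f"no feature with description {description!r}")
    Path(path).write_text(json.dumps(features, indent=2) + "\n", encoding="utf-8")


def append_progress(path, note):
    """Append a one-line note to the progress log for the next session to read."""
    with Path(path).open("a", encoding="utf-8") as log:
        log.write(note.rstrip() + "\n")
```

A session would call `next_failing_feature(load_features("feature_list.json"))`, implement and verify that one feature end to end, and only then call `mark_passing` and `append_progress`. Keeping the flag flip as the sole permitted edit to the feature list makes it harder for a session to quietly delete or rewrite a requirement it failed to meet.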
