🤖 AI Summary
Sudhir Balaji’s piece surveys practical strategies for managing long contexts in agentic coding systems, arguing the problem is often a system-design challenge rather than a purely algorithmic one. He reviews three dominant patterns: compaction (summarizing and replacing old conversation segments), agentic memory (persisting artifacts such as scratchpads, TODOs, or code state outside the LLM window), and sub-agents (delegating expensive retrieval and hunting tasks to smaller, faster models). Balaji highlights recent moves in the field, including Anthropic’s automated context clearing, Cognition’s SWE-grep and SWE-grep-mini for fast context retrieval, and MoonshotAI’s “send message back in time” trick, and outlines pragmatic fallbacks such as compacting whole task runs or recursively summarizing summaries when windows fill.
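To make the compaction pattern concrete, here is a minimal Python sketch. The `summarize_with_llm` helper and the character-count token estimate are hypothetical stand-ins (a real system would call a cheap model and use the model’s own tokenizer); this illustrates the pattern, not the article’s implementation.

```python
from dataclasses import dataclass, field

def summarize_with_llm(text: str) -> str:
    # Hypothetical stand-in: in practice this would be a call to a
    # small, cheap model asked to summarize the given transcript.
    return f"[summary of {len(text.split())} words of history]"

def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 chars/token); real systems use the model's tokenizer.
    return len(text) // 4

@dataclass
class Conversation:
    messages: list[str] = field(default_factory=list)
    budget: int = 8_000  # token budget for the context window

    def append(self, message: str) -> None:
        self.messages.append(message)
        self._compact_if_needed()

    def _compact_if_needed(self) -> None:
        # When the window fills, fold all but the most recent message into
        # a single summary message. Summaries are ordinary messages, so a
        # later overflow summarizes the summary too (recursive compaction).
        if sum(estimate_tokens(m) for m in self.messages) <= self.budget:
            return
        if len(self.messages) < 2:
            return  # the latest message alone exceeds the budget; nothing to fold
        *old, last = self.messages
        self.messages = [summarize_with_llm("\n".join(old)), last]
```

Because the summary re-enters the history as an ordinary message, the next overflow folds it in again, which is exactly the “recursively summarizing summaries” fallback mentioned above.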
For builders, the piece emphasizes task decomposition and background-agent architectures as high-leverage solutions: cto.new models a unit of work as a task with multiple task runs, keeping full history until the window is exhausted and then compacting entire runs to preserve coherence. A planning agent can chunk large requests (e.g., “implement a waitlist”) into smaller, reviewable tasks, reducing token use, cost, and error-prone hunting behavior. Balaji cautions that very large or tightly coupled tasks may resist decomposition, and that longer-context LLMs will help but won’t eliminate context rot or cost/latency tradeoffs, reinforcing that cheaper retrieval sub-agents, careful compaction, and good system design remain essential.
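The task/task-run structure can be sketched the same way. Everything here (the `Task` and `TaskRun` names, `compact_run`, the budget numbers) is assumed for illustration; cto.new’s actual data model is not described in that level of detail.

```python
from __future__ import annotations
from dataclasses import dataclass, field

def compact_run(messages: list[str]) -> str:
    # Hypothetical: a cheap-model summary of one entire run.
    return f"[compacted run: {len(messages)} messages]"

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic, as above

@dataclass
class TaskRun:
    messages: list[str] = field(default_factory=list)
    compacted: str | None = None  # set once this run has been summarized

    def tokens(self) -> int:
        body = self.compacted if self.compacted is not None else "\n".join(self.messages)
        return estimate_tokens(body)

@dataclass
class Task:
    runs: list[TaskRun] = field(default_factory=list)
    budget: int = 100_000  # context-window budget in tokens

    def context(self) -> list[str]:
        # Keep full history until the budget is exhausted, then compact
        # whole runs oldest-first. Run boundaries are preserved: the model
        # never sees a half-summarized slice of one run.
        while sum(run.tokens() for run in self.runs) > self.budget:
            oldest = next((r for r in self.runs if r.compacted is None), None)
            if oldest is None:
                break  # everything is already compacted
            oldest.compacted = compact_run(oldest.messages)
        return [r.compacted if r.compacted is not None else "\n".join(r.messages)
                for r in self.runs]
```

Compacting at run granularity is the design choice that preserves coherence: each run appears either in full or as a single summary, never partially summarized.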