What happens when coding agents stop feeling like dialup? (martinalderson.com)

🤖 AI Summary
Coding agents that once felt magical are starting to behave like late-90s dial-up: flaky, slow, and prone to retries. Users report reliability problems (notably at Anthropic), and OpenRouter's telemetry (caveated as a small sample, under 1% of global traffic, and skewed by free "Grok" usage) still shows an enormous (~50x) jump in token volume.

This matters because agentic coding workflows can consume roughly 1,000× more tokens than simple chats, while frontier models today often generate only 30–60 tokens/sec. The result is a slow, brittle developer loop when running supervised agents like Claude Code. Experimental stacks (e.g., Cerebras with the Gemini CLI) have demonstrated 20–50× higher throughput (~2,000 tok/s), which shifts the bottleneck from the model to the human reviewer.

If token throughput rises, agent paradigms will change: fast models could enable unsupervised or semi-supervised workflows that spawn 5–10 parallel attempts with automated evaluation, delivering richer candidate code at developer-friendly latency. But infrastructure and economics remain major constraints: exploding demand creates an "infinite loop" of consumption, semiconductor scaling is slowing, and providers may introduce new pricing or off-peak tiers to manage peaks.

Practical takeaway: teams should watch throughput and tooling advances closely, design workflows with supervised fallbacks, and be prepared for shifting cost models as agentic development becomes both more powerful and more resource-intensive.
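The throughput figures above can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming a 100k-token agentic task (roughly 1,000× a simple ~100-token chat, per the summary's ratio) and ignoring network and tool-call overhead:

```python
# Back-of-envelope wall-clock estimate for an agentic coding task.
# The 100k-token task size is an assumption; the throughput figures
# (30-60 tok/s frontier, ~2,000 tok/s Cerebras-class) come from the summary.
AGENT_TASK_TOKENS = 100_000  # assumed: ~1,000x a simple ~100-token chat

def wall_clock_minutes(tokens: int, tokens_per_sec: float) -> float:
    """Pure generation time in minutes, ignoring all other latency."""
    return tokens / tokens_per_sec / 60

frontier = wall_clock_minutes(AGENT_TASK_TOKENS, 40)   # midpoint of 30-60 tok/s
fast = wall_clock_minutes(AGENT_TASK_TOKENS, 2_000)    # ~2,000 tok/s stack

print(f"~40 tok/s:    {frontier:.0f} min")  # roughly 42 minutes
print(f"~2,000 tok/s: {fast:.1f} min")      # under a minute
```

At ~40 tok/s the model is the bottleneck (the developer waits tens of minutes per large task); at ~2,000 tok/s the same task finishes in under a minute, which is why the bottleneck shifts to human review.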