Sonnet 4.5 in Claude Code vs. GPT-5 High in Codex CLI: Why Speed Won for Me (coding-with-ai.dev)

🤖 AI Summary
The author ran A/B tests comparing Claude Code 2.0 (running Sonnet 4.5) against Codex CLI running GPT‑5 High while doing real coding work. Although Codex produced slightly more accurate answers on individual tasks, it was slower, and that latency pushed the author to switch tasks while waiting. Those micro context switches shattered their working-memory "graph" (current file, recent errors, mental model of dataflow), forcing costly reloads when the response finally arrived. With the faster Claude Code 2.0, the author stayed in the problem, preserved flow state, and ultimately made more progress despite marginally lower per-response accuracy. The takeaway for the AI/ML community: latency and interaction design matter as much as raw model accuracy or massive context windows (200K–2M tokens). Tool evaluations should include human-centered metrics (time-to-resolution, interruption cost, and how a model's responsiveness supports sustained attention) because faster, slightly less accurate systems can win in real workflows by reducing context-switching overhead. In short: optimizing model context is vital, but we also need to protect human context.