Claude Code vs. Codex CLI: Head to Head (mattwigdahl.substack.com)

🤖 AI Summary
An engineer re-ran an apples-to-apples qualitative comparison between Anthropic's Claude Code and OpenAI's Codex CLI using recent model updates (Claude Code 2 with Sonnet 4.5 / Opus variants vs. GPT-5-Codex-high). The test converted a SPEC.md into a small React/TypeScript app via three standardized prompt templates (design, stepwise implementation, and debugging), each run from the agent's own CLI harness. Claude Code historically felt like a paradigm shift: its agentic terminal interface, which can search, edit, build, and run tests, made it more capable and reliable early on, but Codex has caught up rapidly as the GPT-5-powered Codex matured.

Technically notable points: the author used constrained, repeatable prompts to keep results comparable; Claude models advertise a 1M-token context window though Claude Code uses only ~200k, while GPT-5 advertises 400k but actual usage is opaque; and pricing favors GPT-5 (≈ $1.25/M input tokens, $10/M output) over Sonnet 4 (~$3/$15 per M tokens), which matters for large-scale developer workflows.

Practical outcomes: both agents produced working apps after iterative debugging. Claude required multiple iterations on UI/CSS, data-constraint, and animation bugs; Codex initially hit a Vite config and Zod-install issue that was quickly fixed.

Implication: CLI agent quality, model iteration cadence, and cost-performance tradeoffs now jointly determine which tool makes sense for real development. Developers should benchmark on their own stacks and prefer stepwise prompting and harness-aware workflows.
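To make the pricing point concrete, here is a minimal cost-comparison sketch using the list prices quoted above. Only the per-million-token rates come from the summary; the model labels, the workload size, and the helper names are illustrative assumptions, not figures from the article.

```typescript
// Rough cost comparison using the per-million-token list prices cited above.
// The workload numbers below are hypothetical, chosen only to show the arithmetic.

interface ModelPricing {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

const pricing: Record<string, ModelPricing> = {
  "gpt-5": { inputPerMTok: 1.25, outputPerMTok: 10 },
  "claude-sonnet-4": { inputPerMTok: 3, outputPerMTok: 15 },
};

// Estimate the cost of a workload given total input and output token counts.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = pricing[model];
  return (inputTokens / 1_000_000) * p.inputPerMTok +
         (outputTokens / 1_000_000) * p.outputPerMTok;
}

// Hypothetical agentic session: 5M input tokens (context re-read each turn)
// and 500k output tokens (diffs, explanations).
for (const model of Object.keys(pricing)) {
  console.log(model, `$${estimateCost(model, 5_000_000, 500_000).toFixed(2)}`);
}
// gpt-5 $11.25
// claude-sonnet-4 $22.50
```

At roughly 2x the per-token price for this hypothetical workload, the gap compounds quickly for large-scale developer workflows, which is why the cost-performance tradeoff features in the article's conclusion.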
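The article does not show the broken Vite config Codex produced or the exact fix; for orientation only, a conventional vite.config.ts for a Vite + React + TypeScript project looks like the sketch below, and the missing Zod dependency is normally added with `npm install zod`. This is the standard shape of such a file, not the author's actual diff.

```typescript
// vite.config.ts — a conventional minimal config for a Vite + React + TypeScript app.
// (Illustrative only; the article does not include the config that needed fixing.)
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
});
```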