Claude Code 2.0 Is Promising but Flawed (www.aiengineering.report)

🤖 AI Summary
Anthropic’s Claude Sonnet 4.5 has landed as a clear win for coding agents: in head-to-head tests (notably a web-dev Stripe integration) it beat GPT-5 Codex on both speed and quality, and the model’s “think harder” behavior is now easier to invoke via a new tab-to-think mode or prompts like “think harder” and “ultrathink.” Sonnet 4.5 also outperforms Opus on many tasks, signaling substantive model improvement and renewed confidence in Claude as a primary coding assistant.

But Claude Code v2’s new workflow features feel undercooked. The /rewind checkpoint system auto-saves Claude’s edits and lets you revert the code, the conversation, or both, yet it tracks only model edits (not user edits or shell commands) and restores states destructively, which is far weaker than the branch/commit safety and auditability Git provides (a Git-based checkpoint sketch follows below).

The /usage command is similarly minimal: it reads local usage data (JSONL files under ~/.claude/projects) but reports only the percentage remaining for the current session and week, omitting token counts, cost estimates (API vs. plan pricing), and per-day or per-month views; third-party tools like claude-monitor already provide richer token/cost breakdowns (see the parsing sketch below).

In short, Sonnet 4.5 is a big step forward in model quality, but Anthropic’s built-in developer tooling and observability still lag mature ecosystems such as Git and external monitors, so engineers should pair Claude with established version control and monitoring to avoid surprises.
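Because /rewind restores destructively and sees only model edits, one way to follow the article’s advice is to snapshot the whole working tree with Git before each Claude session. The sketch below is a minimal illustration, not a Claude Code feature: the `claude-checkpoint/` tag scheme and helper names are invented here, and it assumes `git` is on PATH inside a repository.

```python
import subprocess
from datetime import datetime, timezone


def git(*args: str) -> str:
    """Run a git command and return its trimmed stdout (raises on failure)."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout.strip()


def checkpoint() -> str:
    """Snapshot the full working tree (user edits included) without moving HEAD."""
    # Stage everything so untracked files are captured too. Note: this leaves
    # your changes staged; `git reset` unstages them if that is unwanted.
    git("add", "--all")
    # "git stash create" writes a stash-style commit without touching the
    # working tree or the stash list; fall back to HEAD if nothing changed.
    sha = git("stash", "create", "pre-Claude checkpoint") or git("rev-parse", "HEAD")
    tag = f"claude-checkpoint/{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}"
    git("tag", tag, sha)  # a tag keeps the snapshot safe from garbage collection
    return tag


if __name__ == "__main__":
    print("checkpoint saved as tag:", checkpoint())
```

After a bad session, `git stash apply claude-checkpoint/<stamp>` should reapply the snapshot; unlike /rewind, the checkpoint also covers user edits and is not consumed or destroyed by restoring it.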
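And since /usage reports only percentages, the per-day token and cost views it omits can be approximated directly from the local JSONL files. This is a sketch under assumptions: the article confirms only that usage data lives as JSONL under ~/.claude/projects, so the field names (`timestamp`, `message.usage.input_tokens`, etc.) are guesses at the transcript schema, and the prices are placeholders; tools like claude-monitor do this properly.

```python
import json
from collections import defaultdict
from pathlib import Path

TRANSCRIPT_DIR = Path.home() / ".claude" / "projects"

# Hypothetical prices in USD per million tokens; check Anthropic's current
# pricing before trusting these numbers.
ASSUMED_PRICE_PER_MTOK = {"input_tokens": 3.00, "output_tokens": 15.00}


def daily_token_totals(root: Path = TRANSCRIPT_DIR) -> dict:
    """Sum token counts per calendar day across all JSONL transcripts."""
    totals: dict = defaultdict(lambda: defaultdict(int))
    for path in root.rglob("*.jsonl"):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or corrupt lines
            if not isinstance(record, dict):
                continue
            # Assumed schema: assistant entries carry message.usage counters.
            usage = (record.get("message") or {}).get("usage")
            if not usage:
                continue
            # "timestamp" is assumed ISO-8601; keep just the date part.
            day = str(record.get("timestamp", ""))[:10] or "unknown"
            for key in ("input_tokens", "output_tokens",
                        "cache_creation_input_tokens", "cache_read_input_tokens"):
                totals[day][key] += int(usage.get(key, 0) or 0)
    return totals


if __name__ == "__main__":
    for day, counts in sorted(daily_token_totals().items()):
        est = sum(counts[k] / 1_000_000 * p
                  for k, p in ASSUMED_PRICE_PER_MTOK.items())
        print(f"{day}  in={counts['input_tokens']:>10,}  "
              f"out={counts['output_tokens']:>10,}  est=${est:.2f}")
```

Even this rough aggregation surfaces the per-day token counts and a ballpark cost that /usage currently hides behind a single percentage.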