Claude Code vs. Codex, a small test (danieltenner.com)

🤖 AI Summary
A small side-by-side test on a Rails project (HelixKit) pitted Anthropic's Claude Code against OpenAI's Codex on a message-ordering bug that made the assistant "ignore" recent user input. Both models were given the same console output and context: AIResponseJob calls chat.complete, RubyLLM's to_llm iterates chat.messages, and Chat.last.messages was coming back shuffled.

Claude Code implemented a straightforward fix, adding has_many :messages, -> { order(:created_at) }, and explained why an explicit ordering scope prevents ActiveRecord from returning rows out of chronological order. Codex went a step further and identified the true root cause: an earlier acts_as_chat hook already defines messages ordered by created_at, but the file later redefined has_many :messages without that scope, discarding the ordering. Its recommended fix was to drop the redundant override (less code), restoring the original behavior.

Technically, the issue centers on ActiveRecord associations and nondeterministic row-return order (exacerbated by caching or concurrent requests), which causes RubyLLM to build conversation history out of sequence. The practical fixes are either to add an ordering scope to the association or to remove the override so acts_as_chat's ordering stands. The test highlights two points for the AI/ML dev community: (1) code models can reliably find subtle ORM/association bugs, and (2) models that prefer minimal, context-aware edits (as Codex did here) may offer more maintenance-safe suggestions, an important consideration when choosing an AI coding assistant.
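The bug and both fixes reduce to the same idea: without an ordering clause, rows may come back in any order, so the conversation history must be sorted by created_at before it is replayed. The Rails-side fix is the one-line scope `has_many :messages, -> { order(:created_at) }` (or deleting the unscoped override so acts_as_chat's ordered association survives). Below is a minimal plain-Ruby sketch of the effect; the Message struct and sample data are illustrative stand-ins, not the actual HelixKit or RubyLLM code.

```ruby
# Illustrative stand-in for a chat message row (not the real model).
Message = Struct.new(:role, :content, :created_at)

now = Time.now
# Rows as the database might return them: no guaranteed order.
rows = [
  Message.new("assistant", "Earlier reply",   now - 10),
  Message.new("user",      "Latest question", now),
  Message.new("user",      "First question",  now - 20),
]

# Building history from unordered rows replays turns out of sequence,
# which is why the model appeared to "ignore" the latest input.
unordered_history = rows.map(&:content)

# Both proposed fixes amount to ordering by created_at before building
# the history, matching has_many :messages, -> { order(:created_at) }.
ordered_history = rows.sort_by(&:created_at).map(&:content)

puts ordered_history.inspect
# => ["First question", "Earlier reply", "Latest question"]
```

Codex's version wins on maintenance grounds: removing the redundant has_many override leaves a single source of truth for the ordering, rather than two definitions that can drift apart again.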