Claude Code vs. Codex: Real-World Build Test – 36 Files vs. 28, Infinite Loop, and $0.46 Cost Difference

✍️ OpenClawRadar📅 Published: May 14, 2026🔗 Source
Claude Code vs. Codex: Real-World Build Test – 36 Files vs. 28, Infinite Loop, and $0.46 Cost Difference
Ad

A developer ran a head-to-head comparison of Claude Code and Codex (via Cursor) using identical prompts and the same MCP setup (GitHub + Slack). No hints, no extra help. Two tasks:

  • Task 1: PR triage bot – read open PRs, score complexity, write report, ping Slack for high priority. Required retry logic, error logging, strict TypeScript (no any).
  • Task 2: Real-time code review UI – React, WebSocket, inline comments, optimistic updates with rollback, virtualized diff viewer, reconnect with backoff. No UI libraries, everything from scratch.

Results

  • Claude Code: Verified MCP tools were live before writing code. Built 36 files in 12 minutes. Included a two-client WebSocket smoke test not asked for. Broadcast latency: 3ms. Zero any. Passed typecheck first try.
  • Codex (Cursor): Couldn't access GitHub MCP on Task 1 (Cursor's execution path didn't expose tool descriptors). Got tool not found after 3 retries, but logged and handled cleanly – environment issue, not model quality. Task 2 shipped a working UI in ~15 min, 5ms latency. First compile had TypeScript errors and an infinite React loop (useEffect calling hydrate repeatedly) that needed a ref guard patch.
Ad

Cost

API cost across both tasks: Claude ~$2.50, Codex ~$2.04. Claude was ~23% more expensive but delivered more granular architecture and a first-run clean UI.

Key Takeaways

The author notes the two tools aren't really competing for the same use case. Claude Code feels like pairing with someone who reads the docs first; Codex feels like a senior dev who wants to ship fast. Neither leaked any, neither hallucinated a tool name, and both got WebSocket broadcast under 10ms – a clear improvement over six months ago.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also