Claude Code vs Codex: 36 vs 28 files, $2.50 vs $2.04, infinite loop caught — real-world comparison

✍️ OpenClawRadar · 📅 Published: May 13, 2026

Someone on r/ClaudeAI ran a head-to-head comparison of Claude Code and Codex (via Cursor) on two practical tasks—same prompts, same MCP setup (GitHub + Slack), same machine. No benchmarks, real builds.

Tasks

  • Task 1: PR triage bot — Read open PRs, score by complexity (files ×2, lines/10, +3 for no labels, +5 for no reviewers), write a markdown report, post Slack alerts for high scores. Required retries, error logging, strict TypeScript, no any.
  • Task 2: Real-time code review UI — React + TypeScript, WebSockets, inline comment threads, optimistic updates with rollback, virtualized diff viewer, WS reconnect with exponential backoff. No UI libraries.
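
The Task 1 scoring rule can be sketched as a small pure function. This is an illustration, not code from either agent: the PullRequestSummary shape, its field names, and the alert threshold of 20 are assumptions (the post says only that high scores trigger Slack alerts). It reads "files" and "lines" as files changed and lines changed, and applies no rounding.

```typescript
// Hypothetical input shape; these field names are NOT the GitHub API's.
interface PullRequestSummary {
  number: number;
  filesChanged: number;
  linesChanged: number;
  labels: string[];
  reviewers: string[];
}

// Score as described in the task: files x2, lines/10,
// +3 when the PR has no labels, +5 when it has no reviewers.
function complexityScore(pr: PullRequestSummary): number {
  const fileScore = pr.filesChanged * 2;
  const lineScore = pr.linesChanged / 10;
  const labelPenalty = pr.labels.length === 0 ? 3 : 0;
  const reviewerPenalty = pr.reviewers.length === 0 ? 5 : 0;
  return fileScore + lineScore + labelPenalty + reviewerPenalty;
}

// Assumed cutoff for a Slack alert; the source does not state one.
const ALERT_THRESHOLD = 20;

function needsAlert(pr: PullRequestSummary): boolean {
  return complexityScore(pr) >= ALERT_THRESHOLD;
}

const samplePr: PullRequestSummary = {
  number: 1,
  filesChanged: 4,
  linesChanged: 120,
  labels: [],
  reviewers: [],
};

// 4*2 + 120/10 + 3 + 5 = 28
console.log(complexityScore(samplePr), needsAlert(samplePr));
```

Keeping the score a pure function of the PR summary makes it easy to unit test separately from the GitHub and Slack calls, which are the parts that need retries.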

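Task 2 asks for WS reconnect with exponential backoff. The post does not say which parameters either agent chose, so the base delay, factor, and cap below are assumed values, and backoffDelay is a hypothetical helper name. A minimal sketch of the delay schedule:

```typescript
// Assumed tuning values; the source does not specify any.
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  factor: number; // multiplier applied per failed attempt
  maxMs: number;  // upper bound so delays stop growing
}

const DEFAULT_BACKOFF: BackoffOptions = { baseMs: 500, factor: 2, maxMs: 30_000 };

// Delay before retry number `attempt` (0-based): base * factor^attempt, capped at maxMs.
function backoffDelay(attempt: number, opts: BackoffOptions = DEFAULT_BACKOFF): number {
  const raw = opts.baseMs * Math.pow(opts.factor, attempt);
  return Math.min(raw, opts.maxMs);
}

// First few delays with the assumed defaults: 500, 1000, 2000, 4000
console.log([0, 1, 2, 3].map((attempt) => backoffDelay(attempt)));
```

A reconnect handler would typically pass the delay to setTimeout and reset the attempt counter to 0 once the socket opens again. Random jitter is a common refinement that is left out here.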
Claude Code results

  • Ran /mcp to verify tools before writing code
  • Built 36 files in ~12 minutes
  • Wrote an unprompted two-client WebSocket smoke test (broadcast: 3ms)
  • Zero any, passed typecheck first try
  • UI worked immediately

Codex (via Cursor) results

  • Failed Task 1: GitHub MCP wasn't reachable through Cursor's execution path. Handled it cleanly (retried 3x, logged errors, didn't crash), but no delivery.
  • Task 2: Shipped a working UI in ~15 minutes, smoke test passed at 5ms
  • Hit TypeScript errors on first compile and an infinite React loop (useEffect calling hydrate repeatedly). Needed a ref guard patch.
  • 28 files, more compact architecture
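
The post does not show the ref guard patch itself. The general idea is a flag that persists across re-renders without causing one, checked before the effect body runs, so hydrate executes at most once. The sketch below shows that idea without React so it runs standalone; in a component the flag would usually live in a useRef and the check would sit inside the useEffect callback. guardOnce and the counter are hypothetical names for illustration.

```typescript
// Wraps a function so it runs at most once, however often the wrapper is called.
function guardOnce(hydrate: () => void): () => void {
  let hasRun = false; // plays the role of ref.current in a React useRef
  return () => {
    if (hasRun) return; // later calls (e.g. re-run effects) are ignored
    hasRun = true;
    hydrate();
  };
}

// Simulate an effect that fires on every render.
let hydrateCalls = 0;
const guardedHydrate = guardOnce(() => {
  hydrateCalls += 1;
});

for (let render = 0; render < 3; render++) {
  guardedHydrate();
}

console.log(hydrateCalls); // 1
```

Setting the flag before calling hydrate matters: if hydrate synchronously triggers another run of the effect, the guard is already in place.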

Cost (estimated, both tasks)

  • Claude: ~$2.50
  • Codex: ~$2.04
  • Difference: ~$0.46 (~18% measured against Claude's total, ~23% against Codex's)

Takeaways

Neither agent “won”. Claude feels like pairing with someone who verifies everything before touching the keyboard. Codex feels like a senior dev who wants to ship and move on. Both got WebSocket broadcast under 10ms—six months ago that wasn't a given. No any leaks, no hallucinated tool names.

📖 Read the full source: r/ClaudeAI

