Analyzing AI Coding Tools: Dissecting 3,177 API Calls

A recent analysis of four AI coding tools (Claude Code Opus 4.6, Claude Code Sonnet 4.5, Codex GPT-5.3, and Gemini 2.5 Pro) found substantial differences in how each one fills its context window. Using the Context Lens tracer, the study intercepted 3,177 API calls to compare the tools’ efficiency and strategy in handling the context window while fixing a bug in an Express.js environment.
Each tool was given the same bug: an incorrectly reordered null check in res.send(). Opus, Sonnet, Codex, and Gemini each had to identify and fix the bug, then run the test suite to verify the fix. All four succeeded, though with different approaches and resource usage.
Claude Code Opus 4.6 consistently used around 23K to 27K tokens, most of them tool definitions (69% of the context). This suggests the tool re-sends these definitions on each call as a consequence of its architecture, which adds significant caching overhead. Codex (GPT-5.3) ranged more widely, from 29.3K to 47.2K tokens, mostly tool results (72%), with the total varying according to how specific its test command was. Sonnet showed similar variance but split its context more evenly between definitions and results.
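The percentages quoted above are each category's share of the total context. A minimal sketch of that arithmetic follows, assuming a per-category token breakdown is available. The function name, the category names, and the token counts are illustrative placeholders chosen to reproduce a 69% share; they are not measurements from the study.

```python
def context_shares(breakdown: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total context, in percent."""
    total = sum(breakdown.values())
    if total == 0:
        # Avoid division by zero for an empty context.
        return {name: 0.0 for name in breakdown}
    return {
        name: round(100 * tokens / total, 1)
        for name, tokens in breakdown.items()
    }


# Hypothetical breakdown of a 25,000-token context (illustrative numbers only).
example = {
    "tool_definitions": 17_250,
    "tool_results": 4_000,
    "system_prompt": 2_500,
    "conversation": 1_250,
}

shares = context_shares(example)
print(shares)
```

Comparing these shares across tools is what separates a definition-heavy context from a result-heavy one, independent of the absolute token count.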
Gemini stands out for its far larger token usage, peaking at 350.5K tokens made up almost entirely of tool results (96%), which its 1M-token context window allows. Despite a lower cost per token, Gemini’s usage was inconsistent and did not converge across runs, which points to a distinct but less efficient strategy.
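The trade-off between a lower per-token price and a larger context can be estimated from the token figures quoted in this article. The sketch below is a rough estimate, not the study's method: it assumes cost scales linearly with tokens and ignores caching discounts and output tokens. It compares Gemini's peak against the upper end of the Opus and Codex ranges, and the function name is hypothetical.

```python
def break_even_price_ratio(large_tokens: float, small_tokens: float) -> float:
    """How many times cheaper per token the larger context must be to cost the same."""
    if small_tokens <= 0:
        raise ValueError("small_tokens must be positive")
    return large_tokens / small_tokens


GEMINI_PEAK = 350_500  # peak reported for Gemini
OPUS_HIGH = 27_000     # upper end of the reported Opus range
CODEX_HIGH = 47_200    # upper end of the reported Codex range

vs_opus = break_even_price_ratio(GEMINI_PEAK, OPUS_HIGH)
vs_codex = break_even_price_ratio(GEMINI_PEAK, CODEX_HIGH)
print(f"vs Opus: {vs_opus:.1f}x, vs Codex: {vs_codex:.1f}x")
```

Under these simplifying assumptions, Gemini's per-token price would need to be roughly 13 times lower than Opus's, or about 7.4 times lower than Codex's, for its peak usage to cost the same.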
These findings illustrate considerable disparities in how AI coding tools manage context windows, impacting both performance and cost efficiency. Developers should weigh token usage strategies when choosing the appropriate tool for their needs, particularly for tasks involving iterative changes or extensive project histories.
📖 Read the full source: HN LLM Tools