Analysis: Anthropic's actual compute costs for Claude Code users are far lower than reported $5k figure

A recent Forbes article claimed Anthropic's $200/month Claude Code Max plan can consume about $5,000 in compute, suggesting the company is losing money on inference. This analysis examines why that figure is misleading.
API pricing vs actual compute costs
The $5,000 figure comes from Anthropic's retail API pricing: $5 per million input tokens and $25 per million output tokens for Opus 4.6. At these prices, a heavy user could indeed rack up $5,000/month in API-equivalent usage.
However, API pricing doesn't reflect what it actually costs Anthropic to serve those tokens. To estimate real inference costs, look at competitive pricing for similar models on OpenRouter:
- Qwen 3.5 397B-A17B (comparable to Opus 4.6): $0.39 per million input tokens, $2.34 per million output tokens
- Kimi K2.5 1T params with 32B active: $0.45 per million input tokens, $2.25 per million output tokens
- DeepInfra cache reads on Kimi K2.5: $0.07/MTok vs Anthropic's $0.50/MTok
The real math
These OpenRouter providers are running businesses with margins, not taking enormous losses. If they can serve comparable models at roughly 10% of Anthropic's API price, actual compute costs are likely in that range.
Therefore:
- Heavy user consuming $5,000 in API-equivalent tokens ≈ $500 in actual compute cost
- Loss on extreme power users: $300/month (not $4,800)
- Most users don't approach limits: Anthropic says fewer than 5% of subscribers would be affected by weekly caps
- Typical Max 20x plan usage around 50% of weekly token budget ≈ break-even or profitable for Anthropic
Who actually faces $5,000 costs?
The $5,000 figure comes from Cursor's internal analysis. For Cursor, the number is roughly correct because they pay Anthropic's retail API prices (or close to them) for Opus 4.6 access.
Developers want Anthropic models in Cursor due to brand awareness and current performance advantages over cheaper open-weight alternatives.
Broader implications
Anthropic isn't profitable overall due to training costs, researcher salaries, and compute commitments - not inference. On a per-user, per-token basis for inference, Anthropic is likely profitable on average Claude Code subscribers.
The "AI inference is a money pit" narrative plays into frontier labs' hands by discouraging competition and making their moats appear deeper than they are.
📖 Read the full source: HN AI Agents
👀 See Also

Spotify Developers Leveraging AI for Code-Free Contributions
Spotify's key developers have not written code since December due to AI, notably through their internal 'Honk' system that facilitates remote, real-time code deployments using Claude Code.

AI Eats the World (Spring 2026) – A Comprehensive Market Analysis
An in-depth PDF report on AI industry trends, market sizes, and adoption metrics for Spring 2026, covering key technologies, players, and forecasts.

MiniMax M2.7 Model Released with Improved Coding Performance
MiniMax has released M2.7, an AI model that scores 56% on SWE-Pro coding benchmarks and includes self-optimization capabilities. The model maintains pricing at $0.30 per million input tokens.

Claude Code users hitting usage limits faster than expected, bugs suspected
Anthropic acknowledges Claude Code users are exhausting quotas 'way faster than expected,' with users reporting maxed-out limits within hours. Suspected bugs in prompt caching may be inflating costs by 10-20x, and downgrading to version 2.1.34 reportedly helps.