Current LLM Cost Comparison: DeepSeek, Qwen, MiniMax vs OpenAI

Pricing Breakdown by Provider
Here's the current cost comparison among major LLM providers based on a recent Reddit analysis. All prices are in USD per 1 million tokens and sourced as of the analysis date.
- DeepSeek-V3.2: $0.26 input / $0.38 output. This is approximately 10x cheaper than GPT-4, with benchmark results that suggest GPT-5-class performance.
- Qwen3.5 series: The 27B model costs $0.26 input / $2.60 output, delivering quality comparable to Claude at a fraction of the cost. The series spans 0.8B to 397B parameters, and every variant supports a 262k context window (extendable to 1M+) and a built-in thinking mode.
- MiniMax-M2.5: $0.27 input / $0.95 output. Scores 80.2% on SWE-bench Verified, which makes it a strong fit for agentic coding tasks.
- OpenAI GPT-4.1: $2.00 input / $8.00 output. While certainly capable, the pricing premium is difficult to justify for high-volume production use cases when alternatives perform comparably.
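To make the per-token prices concrete, here is a minimal Python sketch that turns them into a monthly bill. The prices are copied from the list above; the workload size (500M input tokens and 100M output tokens per month) and the names `PRICES`, `cost_usd`, and `monthly_costs` are illustrative assumptions, not figures or code from the analysis.

```python
# Prices in USD per 1 million tokens (input, output), copied from the list above.
PRICES = {
    "DeepSeek-V3.2": (0.26, 0.38),
    "Qwen3.5-27B": (0.26, 2.60),
    "MiniMax-M2.5": (0.27, 0.95),
    "GPT-4.1": (2.00, 8.00),
}

# Hypothetical monthly workload, chosen only for illustration.
MONTHLY_INPUT_TOKENS = 500_000_000
MONTHLY_OUTPUT_TOKENS = 100_000_000


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for the given token counts on one model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_costs() -> dict[str, float]:
    """Cost of the hypothetical monthly workload on every listed model."""
    return {
        model: cost_usd(model, MONTHLY_INPUT_TOKENS, MONTHLY_OUTPUT_TOKENS)
        for model in PRICES
    }


if __name__ == "__main__":
    costs = monthly_costs()
    baseline = costs["GPT-4.1"]
    for model, cost in costs.items():
        # Ratio shows how many times cheaper the model is than GPT-4.1.
        print(f"{model:15s} ${cost:9.2f}  ({baseline / cost:.1f}x vs GPT-4.1)")
```

Under these assumed volumes, the same workload comes to about $168 on DeepSeek-V3.2, $230 on MiniMax-M2.5, $390 on Qwen3.5 27B, and $1,800 on GPT-4.1. The ratio depends heavily on the input/output mix: output-heavy workloads narrow the gap for the Qwen model and widen it for DeepSeek.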
Key Technical Context
The analysis includes LMSYS Elo scores where available, since most other benchmarks appear to have been optimized against at this point. Context window capacity has become increasingly important: most current models support 200k+ tokens as standard, which fundamentally changes how you can structure applications around long documents and extended conversations.
For developers using AI coding agents, these pricing disparities are significant when considering production deployment costs. The data suggests that alternatives to premium-priced models like GPT-4 can deliver comparable performance at substantially lower costs, particularly for high-volume use cases.
📖 Read the full source: r/LocalLLaMA