Benchmarking the Latest AI Models: The Rise of Extreme Models

The recent benchmarking of 40 new AI models points to a significant shift in the price-versus-performance landscape. With attention focused on Kimi k2.5 and Claude Opus 4.6, the analysis shows the field splitting into two extremes, 'God Mode' and 'Flash Mode', which leaves mid-range models hard to justify.
Key Details
- Kimi k2.5 Situation: Attempts to benchmark Kimi k2.5 failed with persistent 'No Content' errors, most likely because the service was overloaded. However, Kimi-k2-Thinking performed adequately on complex reasoning tasks at ~15 tokens/sec.
- Speed Dominance: For latency-sensitive applications, Liquid LFM 2.5 was the fastest model, clocking in at ~359 tokens/sec, followed by Ministral 3B at ~293 tokens/sec.
- Cost Efficiency: Ministral 3B stands out as the most cost-effective option at $0.10 per 1M input tokens. It is ~17x cheaper and ~40% faster than GPT-5.2 Codex, making it a strong value pick against higher-priced models.
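To make the throughput and price figures above more concrete, here is a small sketch that converts them into wall-clock generation time and input cost. The 1,000-token response and 50M-token workload are hypothetical sizes chosen only for illustration, and the helper names are made up for this example; the model speeds and the $0.10 price are the approximate values reported in the benchmark.

```python
# Approximate throughput reported in the benchmark, in tokens per second.
THROUGHPUT_TPS = {
    "Liquid LFM 2.5": 359,
    "Ministral 3B": 293,
    "Kimi-k2-Thinking": 15,
}

# Reported Ministral 3B input price, in USD per 1M input tokens.
MINISTRAL_INPUT_USD_PER_M = 0.10


def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream `output_tokens` at a steady decode rate.

    Hypothetical helper: ignores time-to-first-token and load-dependent slowdowns.
    """
    return output_tokens / tokens_per_sec


def input_cost_usd(input_tokens: int, usd_per_million: float) -> float:
    """Input-token cost of a workload at a per-1M-token price (hypothetical helper)."""
    return input_tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    # Hypothetical 1,000-token response, for illustration only.
    for name, tps in THROUGHPUT_TPS.items():
        print(f"{name}: {generation_seconds(1_000, tps):.1f} s")
    # Hypothetical workload of 50M input tokens.
    cost = input_cost_usd(50_000_000, MINISTRAL_INPUT_USD_PER_M)
    print(f"Ministral 3B, 50M input tokens: ${cost:.2f}")
```

At these rates a 1,000-token answer takes roughly 3 seconds on Liquid LFM 2.5 or Ministral 3B versus over a minute on Kimi-k2-Thinking, which is why the speed-tier models suit latency-sensitive work. Real throughput varies with provider load, as the Kimi k2.5 errors show.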
The recommendation is to avoid mid-range models priced between $0.50 and $1.00 per 1M tokens, as they do not offer competitive performance. Depending on your needs, pay for top-tier models like Opus or GPT-5 when you need intelligence, or opt for cheap, fast models like Liquid LFM 2.5 or Mistral's Ministral 3B when you need speed.
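The price-based rule in the recommendation can be sketched as a simple bucketing function. This is an illustrative assumption, not the benchmark author's method: the $0.50 and $1.00 boundaries come from the post, while the function name `price_tier` and the tier labels are hypothetical. Price is only a proxy here; it says nothing about a model's actual capability on your task.

```python
def price_tier(input_usd_per_million: float) -> str:
    """Bucket a model by input price using the post's $0.50-$1.00 mid-range band.

    Hypothetical helper: the tier labels are illustrative, and price alone
    does not measure capability.
    """
    if input_usd_per_million < 0.50:
        return "flash"  # cheap, fast models such as Ministral 3B
    if input_usd_per_million <= 1.00:
        return "mid-range"  # the band the post recommends avoiding
    return "premium"  # higher-priced models chosen for intelligence


if __name__ == "__main__":
    print(price_tier(0.10))  # Ministral 3B's reported input price
    print(price_tier(0.75))  # a hypothetical mid-range model
```

Treating both boundaries as part of the mid-range band is a judgment call, since the post does not say whether $0.50 or $1.00 themselves fall inside it.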
📖 Read the full source: r/LocalLLaMA
👀 See Also

Tolan's AI-Enabled Engineering Interview Process
Tolan has redesigned its engineering interview to mirror day-to-day work with AI coding agents. Candidates get a few hours to build a feature from a Figma spec or a short specification, using AI tools such as Claude, Codex, Cursor, or Gemini.

Allbirds pivots from footwear to AI infrastructure, shares surge 580%
Shoe brand Allbirds announced a $50 million deal to become an AI compute infrastructure business called NewBird AI, causing its shares to rise 580%. The company plans to buy GPUs and offer on-demand graphics chips and cloud services for AI.

Claude's speech recognition limitations and user workaround with Spokenly and Parakeet TDT
A user reports that Claude's built-in microphone transcription is less accurate than ChatGPT's, creating more work than it saves. As a workaround, they use Spokenly on Mac with NVIDIA's Parakeet TDT model for better transcription.

VS Code to Enable Co-Authored-by Copilot Trailer by Default
Microsoft's VS Code PR #310226 changes the git.addAICoAuthor setting default from 'off' to 'all', automatically adding a Co-authored-by trailer for AI-generated contributions. The PR also reveals a runtime fallback mismatch in repository.ts.