Open-weight models under 100GB can't beat Claude Haiku on coding benchmarks

A recent analysis of open-weight language models reveals a significant performance gap compared to Anthropic's Claude Haiku on coding benchmarks. The comparison was conducted using specific testing parameters and memory requirements.
Benchmark methodology
The evaluation compared models on two coding benchmarks: LiveBench (January 2026) and Arena Code/WebDev. Testing was performed against Claude Haiku 4.5 with thinking capabilities enabled. Models were plotted according to memory requirements for local deployment.
Technical specifications
- Quantization: Q4_K_M
- Context length: 32K
- KV cache: q8_0
- VRAM estimation: Calculated using the author's custom calculator
Key findings
No open-weight model under 100GB of memory comes close to Claude Haiku's performance on either benchmark. The nearest competitor is Minimax M2.5, which requires approximately 136GB of memory and roughly matches Haiku's performance on both benchmarks.
The analysis highlights the current gap between proprietary and open-weight models in the under-100GB category for coding tasks. The author expresses frustration with this limitation and calls for development of smaller models that could at least match Haiku's capabilities.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Kimi $19/m Update: Enhancing OpenClaw with Structured Models
Kimi introduces its latest update priced at $19/month, focusing on enhancing model structuring within OpenClaw. This update promises streamlined operations and improved automation features.

OpenClaw Agent System Broken After Recent Updates
Recent OpenClaw updates have broken core agent functionality, with users reporting that agents can't be reliably created or run. The system previously allowed creating agents, having them appear correctly, running workflows, and using them for real tasks.

Claude Code Source Leak Reveals Anti-Distillation, Undercover Mode, and Frustration Detection
A leaked source code map file from Claude Code's npm package reveals anti-distillation techniques using fake tools, an undercover mode that hides AI authorship, and frustration detection via regex patterns.

OpenClaw users report high API costs from vague prompts, developer advises structured workflows
A Reddit user reports a $300 Anthropic bill from OpenClaw due to vague prompting, with the community noting the orchestrator works best with clear intentions and structured workflows rather than acting as a 'genie' for wishful thinking.