MiMo-V2.5-Pro Open-Source: Coding Model Nears Claude Opus 4.6

Xiaomi has released the MiMo-V2.5 family of open-source models. The Pro variant posts coding benchmark scores competitive with Claude Opus 4.6 and GPT-5.4.

Real-World Tests

V2.5-Pro completed a Peking University compiler project (a SysY compiler written in Rust) in 4.3 hours with a perfect score of 233/233, higher than most students reach after weeks of work. Given a vague prompt like "build a video editor," it autonomously produced an 8,192-line desktop application with a multi-track timeline, clip trimming, crossfades, audio mixing, and an export pipeline, taking 11.5 hours and 1,868 tool calls. In a graduate-level analog circuit design task (a Flipped-Voltage-Follower LDO in TSMC 180nm), it iterated via ngspice simulation and improved line regulation 22× and load regulation 17× over its own initial attempt.

Benchmarks vs. Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro, DeepSeek V4 Pro

SWE-Bench Pro: 57.2 (vs. 57.3 Claude, 57.7 GPT, 54.2 Gemini, 55.4 DeepSeek)
SWE-Bench Verified: 78.9 (vs. 80.8 Claude, n/a GPT, 76.2 Gemini, 80.6 DeepSeek)
Terminal-Bench 2.0: 68.4 (vs. 65.4 Claude, 75.1 GPT, 68.5 Gemini, 67.9 DeepSeek); leads Claude and DeepSeek, 0.1 behind Gemini
Claw-Eval Pass@3: 63.8 (vs. 70.4 Claude, 60.3 GPT, 57.8 Gemini, 59.8 DeepSeek); beats GPT, Gemini, and DeepSeek, trails Claude
HLE with tools: 48.0 (vs. 53.0 Claude, 58.7 GPT, 51.4 Gemini, 48.2 DeepSeek); lags on general reasoning
GDPVal-AA: 1581 (vs. 1606 Claude, 1674 GPT, 1317 Gemini, 1554 DeepSeek); lags GPT and Claude, ahead of Gemini and DeepSeek
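
For readers who want to check the standings themselves, here is a minimal Python sketch that ranks V2.5-Pro on each benchmark using only the scores listed above. It assumes a higher score is better on every benchmark; the short model labels and the rank_of helper are illustrative names, not anything from Xiaomi's release.

```python
# Scores copied from the benchmark list above.
# None marks a result the source reports as n/a.
SCORES = {
    "SWE-Bench Pro":      {"MiMo": 57.2, "Claude": 57.3, "GPT": 57.7, "Gemini": 54.2, "DeepSeek": 55.4},
    "SWE-Bench Verified": {"MiMo": 78.9, "Claude": 80.8, "GPT": None, "Gemini": 76.2, "DeepSeek": 80.6},
    "Terminal-Bench 2.0": {"MiMo": 68.4, "Claude": 65.4, "GPT": 75.1, "Gemini": 68.5, "DeepSeek": 67.9},
    "Claw-Eval Pass@3":   {"MiMo": 63.8, "Claude": 70.4, "GPT": 60.3, "Gemini": 57.8, "DeepSeek": 59.8},
    "HLE with tools":     {"MiMo": 48.0, "Claude": 53.0, "GPT": 58.7, "Gemini": 51.4, "DeepSeek": 48.2},
    "GDPVal-AA":          {"MiMo": 1581, "Claude": 1606, "GPT": 1674, "Gemini": 1317, "DeepSeek": 1554},
}


def rank_of(model, row):
    """Return (rank, models_with_scores) for one benchmark row; rank 1 is best.

    Assumption: higher is better on every benchmark listed here.
    """
    reported = {name: score for name, score in row.items() if score is not None}
    ordered = sorted(reported, key=reported.get, reverse=True)
    return ordered.index(model) + 1, len(ordered)


if __name__ == "__main__":
    for benchmark, row in SCORES.items():
        rank, total = rank_of("MiMo", row)
        # Models that score strictly higher than MiMo on this benchmark.
        ahead = [m for m, s in row.items() if s is not None and s > row["MiMo"]]
        print(f"{benchmark}: rank {rank} of {total}; ahead: {', '.join(ahead) or 'none'}")
```

Margins of a tenth of a point, as on SWE-Bench Pro and Terminal-Bench 2.0, are small enough that the ordering may not be meaningful.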

Xiaomi's token-efficiency chart for Claw-Eval also shows V2.5-Pro (63.8) ahead of Claude Sonnet 4.6. V2.5-Pro supports sustained task execution over 1,000+ tool calls with self-correction; in one example, a refactoring pass that introduced a regression at turn 512 was caught and fixed autonomously.

The weights are now openly available for download and self-hosting.

📖 Read the full source: HN AI Agents