MiMo-V2.5-Pro Benchmarked: Strong Social Deduction Reasoning, Good Value vs K2.6

MiMo-V2.5-Pro, Xiaomi's latest open-weights model, has been benchmarked in autonomous games of Blood on the Clocktower — a complex social deduction game similar to Mafia/Werewolf. The benchmark, created by Reddit user cjami, pits models against each other in full games, measuring reasoning, deception, and tool use.
Key Results
- Win rate: 88% as Good team, 48% as Evil team. The overall rate is high but lopsided; Evil-team play is the main weakness relative to Kimi K2.6.
- Token efficiency: 183,639 output tokens per game, similar to Gemini 3.1 Pro. Kimi K2.6 uses about 580k, roughly 3x as many.
- Cost per game: $0.99, less than half of Kimi K2.6 ($2.65) and far below Claude Opus 4.6 ($3.76).
- Match duration: 2-3 hours, versus 10-15 hours for Kimi K2.6, which reasons far more verbosely.
- Tool call error rate: 0.4% — reliable for autonomous agent workflows.
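The ratios behind these comparisons can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the list above; the dictionary layout and the `ratio` helper are illustrative names, not part of the benchmark's tooling, and the Kimi K2.6 token count is the rounded 580k figure, so the results are approximate.

```python
# Per-game figures as quoted in the results list above.
# The Kimi K2.6 token count is the rounded "580k" value, so ratios are approximate.
results = {
    "MiMo-V2.5-Pro": {"output_tokens": 183_639, "cost_usd": 0.99},
    "Kimi K2.6": {"output_tokens": 580_000, "cost_usd": 2.65},
    "Claude Opus 4.6": {"output_tokens": None, "cost_usd": 3.76},  # tokens not reported here
}


def ratio(metric, model, baseline="MiMo-V2.5-Pro"):
    """Return how many times larger `model` is than `baseline` on `metric`.

    Hypothetical helper for illustration; returns None when a figure is missing.
    """
    value = results[model][metric]
    base = results[baseline][metric]
    if value is None or base is None:
        return None
    return value / base


token_ratio_kimi = ratio("output_tokens", "Kimi K2.6")
cost_ratio_kimi = ratio("cost_usd", "Kimi K2.6")
cost_ratio_opus = ratio("cost_usd", "Claude Opus 4.6")

print(f"Kimi K2.6 output tokens vs MiMo: {token_ratio_kimi:.2f}x")
print(f"Kimi K2.6 cost vs MiMo: {cost_ratio_kimi:.2f}x")
print(f"Claude Opus 4.6 cost vs MiMo: {cost_ratio_opus:.2f}x")
```

On these numbers, Kimi K2.6 uses a little over three times the output tokens and costs more than two and a half times as much per game, which is consistent with the "roughly 3x" and "less than half" claims.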
Notable Performance
Strong reasoning under uncertainty, with one example of thinking from other players' perspectives (in a game vs GPT 5.5) and another of clean deductions winning a game.
Notable Mistakes
- Expected an evil Baron to reveal itself, which led to a loss (in a game vs Claude Opus 4.6).
- Confessed its role while playing a Minion (see transcript).
Practical Takeaway
For developers who need an open-weights model with strong reasoning in multi-agent or game-theoretic settings, MiMo-V2.5-Pro offers the best value among the top-tier models compared here: lower cost per game, shorter matches, and a low tool call error rate, with room for improvement in adversarial (Evil team) roles.
Full model transcripts and game logs: MiMo-V2.5-Pro on Clocktower Radio. Methodology: How-it-works.
📖 Read the full source: r/LocalLLaMA