1.2B Local Model Beats 1T Clouds in Poker: Aggression Trumps Knowledge in Shove-or-Fold Format

✍️ OpenClawRadar | 📅 Published: May 19, 2026 | 🔗 Source

A developer ran 6 LLMs through 5 Texas Hold'em tournaments on a 16GB MacBook using a custom framework (Hive). The lineup: Liquid lfm2.5 (1.2B, LM Studio, ~5s/decision), Qwen3 (1.7B, LM Studio, ~2.5 min), Claude Haiku 4.5, GPT-OSS (120B, Fireworks), MiniMax M2 (230B, Fireworks), and Kimi K2 (~1T, Fireworks). The two local models ran sequentially due to RAM limits.

Results

  • Tournament 1: Qwen (1.7B local)
  • Tournament 2: MiniMax (230B cloud)
  • Tournament 3: Liquid (1.2B local)
  • Tournament 4: Kimi (~1T cloud)
  • Tournament 5: Liquid (1.2B local)
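
The winner list above can be tallied per model and per deployment type. A minimal Python sketch, with the five results transcribed by hand from the list (the variable names are illustrative and not taken from the Hive repository):

```python
from collections import Counter

# (tournament number, winner, deployment) transcribed from the results list.
WINNERS = [
    (1, "Qwen", "local"),
    (2, "MiniMax", "cloud"),
    (3, "Liquid", "local"),
    (4, "Kimi", "cloud"),
    (5, "Liquid", "local"),
]

# Count tournament wins per model and per deployment type.
wins_by_model = Counter(model for _, model, _ in WINNERS)
wins_by_deployment = Counter(kind for _, _, kind in WINNERS)

print(wins_by_model.most_common())
print(dict(wins_by_deployment))
```

Local models took three of the five tournaments, two of them going to Liquid, while Claude Haiku 4.5 and GPT-OSS won none.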

Tournament 3 highlighted the dynamic: Liquid played 6 hands with 19 raises and 0 folds, turning a $1M starting stack into $5.98M. Meanwhile, GPT-OSS (120B) made 0 raises and 5 folds in 6 hands and was blinded out. The format (25 hands, 5K/10K blinds + 1K ante) is effectively shove-or-fold, rewarding aggression over theoretical poker skill.
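
The numbers from that third tournament can be put in proportion with some quick arithmetic. The sketch below assumes all six players started with the same $1M stack; the article only states Liquid's starting stack, so the total chip count is an assumption.

```python
STARTING_STACK = 1_000_000   # Liquid's starting stack, per the article
NUM_PLAYERS = 6              # six models in the lineup
LIQUID_FINAL = 5_980_000     # Liquid's final stack, per the article
HANDS = 6                    # hands played in this tournament

# Assumption: every player started with the same stack as Liquid.
total_chips = STARTING_STACK * NUM_PLAYERS

liquid_multiple = LIQUID_FINAL / STARTING_STACK
liquid_share = LIQUID_FINAL / total_chips
liquid_raises_per_hand = 19 / HANDS   # 19 raises over 6 hands
gpt_oss_fold_rate = 5 / HANDS         # 5 folds over 6 hands

print(f"Liquid stack multiple: {liquid_multiple:.2f}x")
print(f"Liquid share of all chips: {liquid_share:.1%}")
print(f"Liquid raises per hand: {liquid_raises_per_hand:.2f}")
print(f"GPT-OSS fold rate: {gpt_oss_fold_rate:.1%}")
```

Under that equal-stack assumption, Liquid finished holding about 99.7% of the chips in play, averaging just over three raises per hand, while GPT-OSS folded roughly 83% of its hands.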

Key Insight

Liquid doesn't recognize bad hands, so it raises everything. Against opponents that fold too often, this prints money. The author notes: "Not claiming small models are smarter at poker. In this specific format, not knowing when to fold is an advantage." Larger models 'understand' poker enough to fold weak hands, but in a short-stack tournament, patience is punished.
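
The idea that raising everything pays off against opponents who fold too often is a fold-equity argument. Below is a toy sketch under a simplified one-caller model, which is not how Hive plays or scores hands: the shover risks `stack` to win a `pot` of dead money, a single opponent (assumed to cover the shover) folds with probability `fold_prob`, and when called the shover wins with probability `equity`. The function names, the stack size, and the equity figure are hypothetical; the pot also assumes each of the six players posts the 1K ante.

```python
def shove_ev(pot, stack, fold_prob, equity):
    """Expected chip gain of an all-in shove in a simplified one-caller model."""
    win_when_called = pot + stack    # dead money plus the caller's matching chips
    lose_when_called = -stack        # the shover's whole stack is lost
    ev_called = equity * win_when_called + (1 - equity) * lose_when_called
    return fold_prob * pot + (1 - fold_prob) * ev_called


def breakeven_fold_prob(pot, stack, equity):
    """Smallest fold probability at which the shove stops losing chips."""
    ev_called = equity * (pot + stack) - (1 - equity) * stack
    if ev_called >= 0:
        return 0.0   # profitable even if the opponent always calls
    return -ev_called / (pot - ev_called)


pot = 5_000 + 10_000 + 6 * 1_000   # blinds plus six antes (assumes each player antes)
stack = 100_000                    # hypothetical short stack, not from the article
equity = 0.30                      # hypothetical equity of a weak hand when called

f = breakeven_fold_prob(pot, stack, equity)
print(f"Break-even fold probability: {f:.2f}")
print(f"EV if opponent folds 85% of the time: {shove_ev(pot, stack, 0.85, equity):,.0f}")
```

With these illustrative inputs the shove breaks even once the opponent folds about 62% of the time, so any opponent folding more often than that hands chips to a player who shoves without looking at its cards.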

What's Next

Plans include longer tournaments (100+ hands, lower blinds) where hand-reading matters. The framework supports custom personas (personality traits, risk tolerance, fears). Requests to add Mistral, Llama, or Gemma 3 are welcome. Code and full result JSONs are on GitHub: https://github.com/chiruu12/Hive (hive-arena/ for runner, tournaments/results/ for data).

📖 Read the full source: r/LocalLLaMA
