1.2B Local Model Beats 1T Clouds in Poker: Aggression Trumps Knowledge in Shove-or-Fold Format

✍️ OpenClawRadar · 📅 Published: May 19, 2026

A developer ran 6 LLMs through 5 Texas Hold'em tournaments on a 16GB MacBook using a custom framework (Hive). The lineup: Liquid lfm2.5 (1.2B, LM Studio, ~5 s/decision), Qwen3 (1.7B, LM Studio, ~2.5 min/decision), Claude Haiku 4.5, GPT-OSS (120B, Fireworks), MiniMax M2 (230B, Fireworks), and Kimi K2 (~1T, Fireworks). The two local models ran sequentially because of RAM limits.

Results

Winner of each tournament:

Tournament 1: Qwen (1.7B local)
Tournament 2: MiniMax (230B cloud)
Tournament 3: Liquid (1.2B local)
Tournament 4: Kimi (~1T cloud)
Tournament 5: Liquid (1.2B local)

Tournament 3 highlighted the dynamic: in 6 hands, Liquid made 19 raises and 0 folds, turning a $1M starting stack into $5.98M. Meanwhile, GPT-OSS (120B) made 0 raises and 5 folds in 6 hands and was blinded out. The format (25 hands, 5K/10K blinds plus a 1K ante) is effectively shove-or-fold, rewarding aggression over theoretical poker skill.
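The raise and fold counts reported above can be condensed into a simple aggression measure. The following is a minimal sketch, not the framework's own code: the `ActionTally` type and both helper functions are hypothetical names introduced for illustration, and the schema of the actual result JSONs in the repository is not assumed here.

```python
from dataclasses import dataclass


@dataclass
class ActionTally:
    """Hypothetical per-model action counts (not the Hive result schema)."""
    raises: int
    folds: int
    hands: int


def aggression_frequency(t: ActionTally) -> float:
    """Share of raise-or-fold decisions that were raises."""
    decisions = t.raises + t.folds
    return t.raises / decisions if decisions else 0.0


def raises_per_hand(t: ActionTally) -> float:
    """Average number of raises per hand played."""
    return t.raises / t.hands if t.hands else 0.0


# Counts as reported in the article for the two models.
liquid = ActionTally(raises=19, folds=0, hands=6)
gpt_oss = ActionTally(raises=0, folds=5, hands=6)

for name, tally in [("Liquid", liquid), ("GPT-OSS", gpt_oss)]:
    print(name, aggression_frequency(tally), round(raises_per_hand(tally), 2))
```

On these counts Liquid raised every time it either raised or folded, while GPT-OSS never raised, which is the contrast the article describes.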

Key Insight

Liquid doesn't recognize bad hands, so it raises everything. Against opponents that fold too often, this prints money. The author notes: "Not claiming small models are smarter at poker. In this specific format, not knowing when to fold is an advantage." Larger models 'understand' poker enough to fold weak hands, but in a short-stack tournament, patience is punished.
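The claim that raising everything profits against opponents who fold too often is the standard fold-equity argument, and it can be made concrete with a small expected-value calculation. The sketch below is illustrative only and rests on stated simplifications: a single opponent, a caller who matches the raise exactly, and no further betting. The 21K pot assumes all six models sit at one table (5K + 10K blinds plus six 1K antes), and the 30K raise size is a hypothetical number, not taken from the tournament logs.

```python
def raise_ev(pot: float, risk: float, fold_prob: float,
             equity_when_called: float) -> float:
    """Expected chip gain of a raise against one opponent.

    pot: chips already in the middle before the raise
    risk: chips the raiser puts in
    fold_prob: probability the opponent folds
    equity_when_called: raiser's chance of winning if called
    """
    win = pot + risk    # net gain when called and winning
    lose = -risk        # net loss when called and losing
    ev_called = equity_when_called * win + (1 - equity_when_called) * lose
    return fold_prob * pot + (1 - fold_prob) * ev_called


def breakeven_fold_prob(pot: float, risk: float,
                        equity_when_called: float = 0.0) -> float:
    """Fold probability at which the raise has zero expected value."""
    ev_called = equity_when_called * (pot + risk) - (1 - equity_when_called) * risk
    if ev_called >= 0:
        return 0.0  # profitable even if always called
    # Solve f * pot + (1 - f) * ev_called = 0 for f.
    return -ev_called / (pot - ev_called)


pot = 5_000 + 10_000 + 6 * 1_000  # blinds plus antes, assuming six players
risk = 30_000                     # hypothetical raise size

print(round(breakeven_fold_prob(pot, risk), 3))
print(raise_ev(pot, risk, fold_prob=0.8, equity_when_called=0.3))
```

Under these assumptions, a raise with a hand that never wins when called still breaks even once the opponent folds often enough, so a model that cannot tell a bad hand from a good one loses little against a table that folds too much.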

What's Next

Plans include longer tournaments (100+ hands, lower blinds) where hand-reading matters. The framework supports custom personas (personality traits, risk tolerance, fears). Requests to add models such as Mistral, Llama, and Gemma 3 are welcome. Code and full result JSONs are on GitHub: https://github.com/chiruu12/Hive (hive-arena/ for the runner, tournaments/results/ for the data).

📖 Read the full source: r/LocalLLaMA
