Testing Claude Sonnet with a Strategy Board Game: Rule Adherence Challenges

Testing Strategy Games with Claude Sonnet
A developer on r/ClaudeAI tested Claude Sonnet by playing OFMOS® Essential, a patented strategy board game where players manage a product portfolio across a positioning map. The test involved playing the game manually against the model, prompt by prompt.
Implementation Details
The developer designed a structured system prompt containing:
- The full ruleset of OFMOS® Essential
- A text-based board representation
- Action definitions
- Scoring instructions
- Turn management directives
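The source does not publish the prompt itself. As a rough illustration of how the five components listed above could be assembled into one system prompt, here is a minimal sketch. `PromptSections`, `build_system_prompt`, and the `## ` section headers are hypothetical choices, and the bodies are placeholders rather than the actual OFMOS® Essential rules.

```python
from dataclasses import dataclass


@dataclass
class PromptSections:
    """Hypothetical container for the five parts of the system prompt."""
    rules: str
    board: str
    actions: str
    scoring: str
    turn_management: str


def build_system_prompt(sections: PromptSections) -> str:
    """Join the sections under labeled headers so each part is easy to locate."""
    parts = [
        ("RULES", sections.rules),
        ("BOARD", sections.board),
        ("ACTIONS", sections.actions),
        ("SCORING", sections.scoring),
        ("TURN MANAGEMENT", sections.turn_management),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in parts)


if __name__ == "__main__":
    # Placeholder text only; the real ruleset is not reproduced here.
    print(build_system_prompt(PromptSections(
        rules="<full ruleset>",
        board="<text board>",
        actions="<action definitions>",
        scoring="<scoring instructions>",
        turn_management="After every turn, print the updated board and scores.",
    )))
```

Labeled sections are one design choice among many. Their main benefit is that later turns can refer back to a named part ("see SCORING") without restating it.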
After each turn, Claude updated the board state and running scores as instructed by the system prompt.
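The source does not show the developer's board format. As a hypothetical illustration, a text-based board plus running scores could be re-rendered like this at the end of each turn. The sketch assumes a rectangular grid addressed by `(col, row)` and short two-character piece labels. Both assumptions are illustrative, and `render_board` and `render_state` are invented names.

```python
from __future__ import annotations


def render_board(width: int, height: int, pieces: dict[tuple[int, int], str]) -> str:
    """Render a hypothetical grid-shaped positioning map as plain text.

    pieces maps (col, row) to a label such as "A1" (player A, product 1).
    Empty cells are shown as "..". Labels are padded or cut to two characters.
    """
    header = "    " + " ".join(f"{c:>2}" for c in range(width))
    lines = [header]
    for r in range(height):
        cells = [pieces.get((c, r), "..").ljust(2)[:2] for c in range(width)]
        lines.append(f"{r:>2}  " + " ".join(cells))
    return "\n".join(lines)


def render_state(board_text: str, scores: dict[str, int]) -> str:
    """Append a running-score line, with players sorted by name."""
    score_line = ", ".join(f"{p}: {s}" for p, s in sorted(scores.items()))
    return f"{board_text}\nScores: {score_line}"


if __name__ == "__main__":
    board = render_board(3, 2, {(0, 0): "A1", (2, 1): "B1"})
    print(render_state(board, {"A": 5, "B": 2}))
```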
Performance Assessment
Claude Sonnet demonstrated several capabilities:
- Understood the game rules correctly
- Articulated strategic reasoning during gameplay
- Tracked scores consistently throughout the game
However, the model frequently made illegal moves. The developer said this was expected: the setup had no constrained move-generation layer, so the model had to enforce the rules on its own, and that is where it often broke down.
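The source does not describe what such a layer would look like. A common pattern is to let code, not the model, decide legality: enumerate the legal moves, show them to the model, and reject any reply outside that set. The sketch below uses a placeholder rule (a product may be placed on any empty cell of a rectangular map). `Move`, `legal_moves`, `validate_reply`, and the `PLACE <product> <col>,<row>` reply format are all hypothetical and do not reflect OFMOS® Essential's actual rules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """Hypothetical action: place a product at (col, row) on the map."""
    product: str
    col: int
    row: int


def legal_moves(width: int, height: int, occupied: set[tuple[int, int]],
                products: list[str]) -> list[Move]:
    """Enumerate every legal move under the placeholder rule: the cell must be empty."""
    return [
        Move(p, c, r)
        for p in products
        for r in range(height)
        for c in range(width)
        if (c, r) not in occupied
    ]


# Hypothetical reply format, e.g. "PLACE A1 2,3".
MOVE_PATTERN = re.compile(r"PLACE\s+(\w+)\s+(\d+)\s*,\s*(\d+)", re.IGNORECASE)


def validate_reply(reply: str, legal: list[Move]) -> tuple[Move | None, str]:
    """Parse the model's reply and accept it only if it is in the legal set.

    Returns (move, "") on success, or (None, feedback) so the caller can re-prompt.
    Product names must match exactly, including case.
    """
    m = MOVE_PATTERN.search(reply)
    if m is None:
        return None, "No move found. Reply in the form: PLACE <product> <col>,<row>"
    move = Move(m.group(1), int(m.group(2)), int(m.group(3)))
    if move not in legal:
        return None, f"Illegal move {move}. Choose one of the listed legal moves."
    return move, ""
```

A driver loop would include the `legal` list in each turn's prompt, pass the model's reply to `validate_reply`, and re-prompt with the returned feedback until a legal move arrives or a retry limit is reached. Because the check lives in code, an illegal move cannot silently enter the game state.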
Developer Questions
The developer is seeking community input on similar experiments with board or strategy games, specifically asking about:
- Experiences with rule adherence in different models
- Observations about strategic depth in AI gameplay
- Which models performed best in similar scenarios
Experiments like this help developers working with AI coding agents understand the practical limits of language models in rule-based environments that require precise constraint enforcement.
Source: r/ClaudeAI