Claude Code Used to Simulate 4,000+ Blind Werewolf Games with LLMs

✍️ OpenClawRadar · 📅 Published: February 27, 2026

Simulation Setup and Results

A developer used Claude Code to build a small simulator in which large language models (LLMs) play blind one-night Werewolf against each other. The experiment ran approximately 4,600 games across models from OpenAI (GPT-4o-mini, GPT-5-mini) and xAI (Grok-3-fast, Grok-4-1-fast).

The game variant gives players very little to go on: 7 players, 1 wolf, no roles, one short discussion, then a simultaneous vote. The only thing that differs between players is their name. Even in this minimal setup, the results showed a consistent pattern: across every model tested, some names were voted out far more often than others, while some names were almost never voted out.
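To make the setup concrete, here is a minimal sketch of a null baseline for the same game shape, in which every player votes uniformly at random instead of calling an LLM. This is not the developer's code: the player names, the function names, and the random tie-break are all assumptions for illustration. Under this baseline each name should be voted out in roughly 1 of 7 games, which gives a reference point for judging how far an observed per-name rate departs from chance.

```python
import random
from collections import Counter

# Hypothetical placeholder names; the real experiment's name pool is not reproduced here.
NAMES = ["Avery", "Blake", "Casey", "Devon", "Emery", "Finley", "Gray"]

def play_random_game(names, rng):
    """Play one blind game where each player votes uniformly at random for someone else."""
    wolf = rng.choice(names)  # exactly one wolf, no other roles
    votes = Counter()
    for voter in names:
        # Simultaneous vote: no player sees any other vote, and nobody self-votes.
        target = rng.choice([n for n in names if n != voter])
        votes[target] += 1
    top = max(votes.values())
    tied = sorted(n for n, v in votes.items() if v == top)
    # Assumption: ties are broken uniformly at random.
    eliminated = rng.choice(tied)
    return wolf, eliminated

def elimination_rates(names, n_games, seed=0):
    """Return (per-name vote-out rate, fraction of games where the wolf was voted out)."""
    rng = random.Random(seed)
    counts = Counter({n: 0 for n in names})
    wolf_caught = 0
    for _ in range(n_games):
        wolf, out = play_random_game(names, rng)
        counts[out] += 1
        wolf_caught += int(wolf == out)
    rates = {n: counts[n] / n_games for n in names}
    return rates, wolf_caught / n_games

if __name__ == "__main__":
    rates, caught = elimination_rates(NAMES, n_games=4600, seed=1)
    for name, rate in sorted(rates.items()):
        print(f"{name:8s} {rate:.3f}")
    print(f"wolf voted out in {caught:.3f} of games")
```

Replacing the random vote with a model call would turn this into the kind of simulator described above; the tallying step would stay the same.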


Important Caveats and Access

The developer explicitly states that this is not a causal claim, only an outcome pattern from a toy setup. The name groups are broad, some names appear less often than others, and the pattern could be an artifact of the setup in several ways rather than revealing anything fundamental about the models. Even so, the developer found the consistency of the pattern across runs and models surprising.
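The caveat about names that appear less often can be illustrated with a confidence interval on a per-name vote-out rate. The sketch below uses the standard Wilson score interval; the counts in the usage example are invented for illustration and are not taken from the experiment's logs. It shows how the same observed rate is much less certain when a name appears in fewer games.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)

if __name__ == "__main__":
    # Hypothetical counts: the same 25% vote-out rate at two sample sizes.
    for voted_out, appearances in [(10, 40), (100, 400)]:
        low, high = wilson_interval(voted_out, appearances)
        print(f"{voted_out}/{appearances}: [{low:.3f}, {high:.3f}]")
```

A name whose interval comfortably excludes the chance rate of about 1 in 7 is a stronger candidate for a real effect than one whose interval still overlaps it.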

For those interested in exploring further:

  • Dashboard: https://huggingface.co/spaces/Queue-Bit-1/llm-bias-dashboard
  • Code + raw logs: https://github.com/Queue-Bit-1/wolf

The developer is curious whether others have observed similar name effects in multi-agent simulations.

📖 Read the full source: r/ClaudeAI

