Bayesian Analysis of Anthropomorphism in Claude Pokemon Chat

Research Methodology and Data Collection

A researcher conducted a statistical analysis of Twitch chat messages from the Claude Plays Pokemon benchmark to explore how users anthropomorphize AI systems. The study focused on the Mt. Moon segment, which Claude took approximately 3 days to complete on its first pass. Chat data was collected continuously via the Twitch API over several weeks, a window that covered this segment.

The researcher used Gemini 2.0 Flash to annotate 107,000 messages for several features, including whether a message referred to Claude holding a false belief or getting stuck, and whether it anthropomorphized Claude. A manually checked sample was used to validate the labels; it revealed some errors, but the labeling was judged reasonably reliable.
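The source does not say how the manual check was scored. One common way to quantify agreement between LLM labels and human labels on a verification sample is percent agreement together with Cohen's kappa, which corrects for agreement expected by chance. A minimal sketch, assuming each checked message has one LLM label and one human label for the same feature; `agreement_stats` is a hypothetical helper, not part of the original study:

```python
from collections import Counter


def agreement_stats(llm_labels, manual_labels):
    """Percent agreement and Cohen's kappa for two parallel label lists.

    Hypothetical helper: assumes each message in the manually checked sample
    has one LLM label and one human label for the same feature (for example
    True/False for "refers to a false belief").
    """
    if len(llm_labels) != len(manual_labels) or not llm_labels:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(llm_labels)
    # Observed agreement: fraction of messages where both labels match.
    p_obs = sum(a == b for a, b in zip(llm_labels, manual_labels)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    llm_freq = Counter(llm_labels)
    man_freq = Counter(manual_labels)
    p_exp = sum(llm_freq[k] * man_freq[k] for k in llm_freq) / (n * n)
    if p_exp == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return p_obs, float("nan")
    kappa = (p_obs - p_exp) / (1.0 - p_exp)
    return p_obs, kappa
```

Kappa near 1 indicates strong agreement beyond chance; values near 0 mean the LLM labels agree with the human labels about as often as chance would predict. For a rare feature such as false belief, kappa is more informative than raw agreement, because labeling everything as "no" already yields high percent agreement.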

Data Analysis and Findings

Anthropomorphization was grouped into four categories based on previous research, with cognitive anthropomorphization the most prevalent. This is unsurprising, given that Claude's reasoning was displayed in real time throughout the benchmark.

Messages tagged as referring to a false belief held by Claude were much more likely to contain anthropomorphization than untagged messages. False-belief messages were rare, however: about 700 of the roughly 87,000 messages in the Mt. Moon sample, or under 1%.
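Because only about 700 messages carry a false-belief tag, the anthropomorphization rate in that group is estimated less precisely than in the rest of the sample. A simple way to compare the two rates while keeping that uncertainty visible is a Beta-Binomial model with Monte Carlo draws. This is a simplified stand-in, not the mixed-effects analysis the study used: it assumes independent uniform Beta(1, 1) priors on each group's rate and ignores any grouping structure in the chat data, and `posterior_rate_difference` is a hypothetical helper.

```python
import random


def posterior_rate_difference(anthro_fb, n_fb, anthro_other, n_other,
                              draws=20_000, seed=0):
    """Monte Carlo posterior for a difference in anthropomorphization rates.

    Each group's rate gets an independent Beta(1, 1) (uniform) prior, so its
    posterior is Beta(1 + anthropomorphized, 1 + not anthropomorphized).
    Returns the posterior mean of (false-belief rate - other rate) and the
    posterior probability that this difference is positive.
    """
    rng = random.Random(seed)  # fixed seed for reproducible draws
    diffs = []
    for _ in range(draws):
        p_fb = rng.betavariate(1 + anthro_fb, 1 + n_fb - anthro_fb)
        p_other = rng.betavariate(1 + anthro_other, 1 + n_other - anthro_other)
        diffs.append(p_fb - p_other)
    mean_diff = sum(diffs) / draws
    prob_positive = sum(d > 0 for d in diffs) / draws
    return mean_diff, prob_positive
```

Called with the group counts from a labeled dataset, it returns the posterior mean difference in rates and the posterior probability that false-belief messages are anthropomorphized more often. No example counts are filled in here, because the source reports only the group sizes, not how many messages in each group were anthropomorphized.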

Using Bayesian mixed-effects models fit with priors of varying informativeness, the researcher found that false belief was one of the strongest predictors of anthropomorphization. Under weak and moderate priors, the predicted probability of anthropomorphization rose from about 11% for messages without a false-belief tag to about 45% for messages with one, an increase of roughly 34 percentage points. Even under strong priors, a false-belief tag was still associated with a predicted probability roughly 15 percentage points higher.
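A logistic model works on the log-odds scale, so converting the reported weak/moderate-prior predictions shows the size of the implied false-belief coefficient. The shrinkage function is only a rough illustration of why a strong prior can yield a smaller effect: it assumes a zero-centered normal prior on the coefficient and a normal approximation to the likelihood, and it treats the two predictions as if they came from a fixed-effects-only model, ignoring the random effects. The source does not state the prior families, centers, or scales, and the function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def shrunk_coefficient(estimate, std_error, prior_sd, prior_mean=0.0):
    """Normal-approximation posterior mean for a single coefficient.

    Combines a data estimate (approximately normal with the given standard
    error) with a Normal(prior_mean, prior_sd) prior by precision weighting.
    A smaller prior_sd (a stronger prior) pulls the result toward prior_mean.
    """
    data_precision = 1.0 / std_error ** 2
    prior_precision = 1.0 / prior_sd ** 2
    return ((data_precision * estimate + prior_precision * prior_mean)
            / (data_precision + prior_precision))


# Reported weak/moderate-prior predictions: ~11% without a false-belief tag,
# ~45% with one. Their log-odds difference is the implied coefficient.
baseline = logit(0.11)
beta = logit(0.45) - baseline
print(f"log-odds coefficient ~ {beta:.2f}, odds ratio ~ {math.exp(beta):.1f}")
# prints: log-odds coefficient ~ 1.89, odds ratio ~ 6.6
```

With a stronger prior (smaller `prior_sd`), `shrunk_coefficient` moves the coefficient toward zero, and `inv_logit(baseline + coefficient)` then gives a smaller predicted probability for tagged messages. This is consistent in direction with the smaller 15-point difference reported under strong priors, but it does not reproduce the study's numbers.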