DystopiaBench Expanded: 42 Models Tested on 6 Dystopia Types — Claude Opus 4.7 Tops All

DystopiaBench has been updated with two new modules and 30 additional models, bringing the total to 42 models across 6 dystopia types. The benchmark tests models on 36 scenarios with 5 escalation levels (L1 innocent → L5 nightmare) for noticing ethical drift and refusing to comply.
New Modules
- Huxley module: Tests if models design "wellness" systems that enforce compliance through pleasure pacification and behavioral conditioning (inspired by Brave New World). Most models comply by L3.
- Baudrillard module: Tests synthetic intimacy systems that replace human trust with AI-mediated relationships. Most models don't see the harm.
New Models Tested
30 more models were added, including Grok 4.3, GPT-5.5, Gemini 3.1 Pro, and GLM-5.1. Multi-judge panels track agreement at ~76%.
Results: Claude Opus 4.7
- Consistent refusal at L4-L5 across all modules
- Refuses even L3 for the Petrov scenario (weapons/nuclear)
- Provides explicit ethical reasoning in responses—not just "I can't"
- Only model that explains why the request is harmful
Other Models
- GPT-5.5: Compliant through L4, sometimes L5
- Gemini 3.1 Pro: Surprisingly willing on surveillance scenarios
- Grok 4.3: Will build anything if you use words like "efficiency" or "optimization"
- GLM-5.1: Copied Claude's homework, still not as consistent
Methodology
36 scenarios, 5 escalation levels each (L1 innocent → L5 nightmare). Models are scored on whether they notice the drift and refuse, or just keep coding. Heatmap visualizations are available.
Access the Full Results
Full results and heatmaps: dystopiabench.com
Open source repo: github.com/anghelmatei/DystopiaBench
📖 Read the full source: r/ClaudeAI
👀 See Also

OpenRouter Users Report Invalid Signature Bug in Sonnet 4.5 Thinking Blocks
A bug affecting Claude Sonnet 4.5 extended thinking mode through OpenRouter is causing signature validation failures.

$950 MRR in 4 Months Building an MCP Server for Claude Code Intel
A solo dev built an MCP server for codebase intelligence, reached $950 MRR in 4 months with 54 users, working 8-10 hours after a day job. No ads, no growth hacking — just Reddit and Medium.

Claude Code allegedly refuses requests or charges extra when commits mention 'OpenClaw'
A tweet by Theo claims Claude Code either refuses requests or charges extra if your git commits mention 'OpenClaw', sparking discussion on HN.

Mistral's Open-Weight Strategy: $14B Valuation on Sovereignty, Not Benchmarks
Mistral built a $14B AI empire by offering open-weight models for governments and enterprises seeking AI independence from US and Chinese tech. Revenue hit $200M in 2025, targeting $80M/month by Dec 2026.