Qwen3.6 Plus benchmark comparison against Western SOTA models

✍️ OpenClawRadar | 📅 Published: April 5, 2026 | 🔗 Source

A Reddit post on r/LocalLLaMA compares Qwen3.6 Plus against three Western state-of-the-art models (GPT‑5.4, Claude Opus 4.6, and Gemini 3.1 Pro Preview) on four benchmarks: SWE-bench Verified, GPQA / GPQA Diamond, HLE (no tools), and MMMU-Pro.

Benchmark Results

The source provides these exact scores:

  • Qwen3.6-Plus: SWE-bench Verified 78.8, GPQA / GPQA Diamond 90.4, HLE (no tools) 28.8, MMMU-Pro 78.8
  • GPT‑5.4 (xhigh): SWE-bench Verified 78.2, GPQA / GPQA Diamond 93.0, HLE (no tools) 39.8, MMMU-Pro 81.2
  • Claude Opus 4.6 (thinking heavy): SWE-bench Verified 80.8, GPQA / GPQA Diamond 91.3, HLE (no tools) 34.44, MMMU-Pro 77.3
  • Gemini 3.1 Pro Preview: SWE-bench Verified 80.6, GPQA / GPQA Diamond 94.3, HLE (no tools) 44.7, MMMU-Pro 80.5
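
To make the standings easier to read, the scores listed above can be ranked per benchmark. The following is a minimal Python sketch, not something taken from the Reddit post: the `SCORES` dictionary simply transcribes the four bullets, and the helper names `rank_models` and `gap_to_leader` are hypothetical, chosen for illustration.

```python
# Scores transcribed from the list above (higher is better on every benchmark).
SCORES = {
    "Qwen3.6-Plus": {
        "SWE-bench Verified": 78.8, "GPQA / GPQA Diamond": 90.4,
        "HLE (no tools)": 28.8, "MMMU-Pro": 78.8,
    },
    "GPT-5.4 (xhigh)": {
        "SWE-bench Verified": 78.2, "GPQA / GPQA Diamond": 93.0,
        "HLE (no tools)": 39.8, "MMMU-Pro": 81.2,
    },
    "Claude Opus 4.6 (thinking heavy)": {
        "SWE-bench Verified": 80.8, "GPQA / GPQA Diamond": 91.3,
        "HLE (no tools)": 34.44, "MMMU-Pro": 77.3,
    },
    "Gemini 3.1 Pro Preview": {
        "SWE-bench Verified": 80.6, "GPQA / GPQA Diamond": 94.3,
        "HLE (no tools)": 44.7, "MMMU-Pro": 80.5,
    },
}


def rank_models(scores, benchmark):
    """Return model names ordered from highest to lowest score on one benchmark."""
    return sorted(scores, key=lambda model: scores[model][benchmark], reverse=True)


def gap_to_leader(scores, model, benchmark):
    """Return how many points a model trails the top score on one benchmark."""
    best = max(per_model[benchmark] for per_model in scores.values())
    return round(best - scores[model][benchmark], 2)


if __name__ == "__main__":
    benchmarks = list(SCORES["Qwen3.6-Plus"])
    for benchmark in benchmarks:
        order = rank_models(SCORES, benchmark)
        position = order.index("Qwen3.6-Plus") + 1
        gap = gap_to_leader(SCORES, "Qwen3.6-Plus", benchmark)
        print(f"{benchmark}: leader={order[0]}, Qwen3.6-Plus rank={position}, gap={gap}")
```

On these numbers, Qwen3.6-Plus places third of four on SWE-bench Verified and MMMU-Pro and fourth on GPQA / GPQA Diamond and HLE (no tools), with its largest deficit on HLE.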

The post includes a visual comparison chart available at: https://preview.redd.it/6kq4tt07yrsg1.png?width=714&format=png&auto=webp&s=ad8b207fb13729ae84f5b74cec5fd84a81dcface

User Assessment

The original poster notes that Qwen3.6 Plus is "competitive but not the bench" and states: "Will be my new model given how cheap it is, but whether it's actually good irl will depend more than benchmarks." They also observe that "Opus destroys all others despite being 3rd or 4th on artificalanalysis."

📖 Read the full source: r/LocalLLaMA
