Qwen3.6 Plus benchmark comparison against Western SOTA models

OpenClawRadar | Published: April 5, 2026

A Reddit post on r/LocalLLaMA compares Qwen3.6 Plus against three Western state-of-the-art models (GPT‑5.4, Claude Opus 4.6, and Gemini 3.1 Pro Preview) on four benchmarks: SWE-bench Verified, GPQA / GPQA Diamond, HLE (no tools), and MMMU-Pro.

Benchmark Results

The source provides these exact scores:

Model | SWE-bench Verified | GPQA / GPQA Diamond | HLE (no tools) | MMMU-Pro
Qwen3.6-Plus | 78.8 | 90.4 | 28.8 | 78.8
GPT‑5.4 (xhigh) | 78.2 | 93.0 | 39.8 | 81.2
Claude Opus 4.6 (thinking heavy) | 80.8 | 91.3 | 34.44 | 77.3
Gemini 3.1 Pro Preview | 80.6 | 94.3 | 44.7 | 80.5
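
To make the gaps easier to read, the scores above can be turned into a per-benchmark rank and a distance to the leader. The following is a minimal sketch, not part of the original post: the names BENCHMARKS, SCORES, and rank_and_gap are hypothetical and chosen only for illustration, and the numbers are copied verbatim from the scores listed above.

```python
# Scores copied from the article; list order matches BENCHMARKS.
BENCHMARKS = ["SWE-bench Verified", "GPQA / GPQA Diamond", "HLE (no tools)", "MMMU-Pro"]
SCORES = {
    "Qwen3.6-Plus": [78.8, 90.4, 28.8, 78.8],
    "GPT-5.4 (xhigh)": [78.2, 93.0, 39.8, 81.2],
    "Claude Opus 4.6 (thinking heavy)": [80.8, 91.3, 34.44, 77.3],
    "Gemini 3.1 Pro Preview": [80.6, 94.3, 44.7, 80.5],
}


def rank_and_gap(model):
    """Return {benchmark: (rank, gap_to_leader)} for one model (hypothetical helper)."""
    result = {}
    for i, bench in enumerate(BENCHMARKS):
        # All models' scores on this benchmark, best first.
        column = sorted((scores[i] for scores in SCORES.values()), reverse=True)
        score = SCORES[model][i]
        rank = column.index(score) + 1       # 1 = best of the four models
        gap = round(column[0] - score, 2)    # points behind the top score
        result[bench] = (rank, gap)
    return result


for bench, (rank, gap) in rank_and_gap("Qwen3.6-Plus").items():
    print(f"{bench}: rank {rank} of {len(SCORES)}, {gap} points behind the leader")
```

On these figures, Qwen3.6-Plus places third of four on SWE-bench Verified and MMMU-Pro and fourth on GPQA and HLE, with the largest gap on HLE (no tools).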

The post includes a visual comparison chart available at: https://preview.redd.it/6kq4tt07yrsg1.png?width=714&format=png&auto=webp&s=ad8b207fb13729ae84f5b74cec5fd84a81dcface

User Assessment

The original poster notes that Qwen3.6 Plus is "competitive but not the bench" and states: "Will be my new model given how cheap it is, but whether it's actually good irl will depend more than benchmarks." They also observe that "Opus destroys all others despite being 3rd or 4th on artificalanalysis."

📖 Read the full source: r/LocalLLaMA
