M5 Max vs M3 Max Inference Benchmarks for Qwen Models on oMLX

✍️ OpenClawRadar | 📅 Published: March 28, 2026

Reddit user /u/onil_gova ran inference benchmarks comparing two 16-inch MacBook Pros, one with an M5 Max and one with an M3 Max, each configured with a 40-core GPU and 128GB of unified memory. The tests used oMLX v0.2.23 and three Qwen 3.5 models: the 122B-A10B MoE, the 35B-A3B MoE, and the 27B dense model.

Benchmark Results

At pp1024/tg128 (a 1,024-token prompt followed by 128 generated tokens), the M5 Max generated tokens substantially faster than the M3 Max (figures are M5 Max vs M3 Max):

35B-A3B MoE: 134.5 vs 80.3 tg tok/s (1.7x faster)
122B-A10B MoE: 65.3 vs 46.1 tg tok/s (1.4x faster)
27B dense: 32.8 vs 23.0 tg tok/s (1.4x faster)
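
Each speedup figure is simply the ratio of the two chips' token-generation rates. A minimal check using only the numbers in the list above:

```python
# Token-generation rates (tok/s) at pp1024/tg128, as reported in the post.
RESULTS = {
    "35B-A3B MoE": (134.5, 80.3),   # (M5 Max, M3 Max)
    "122B-A10B MoE": (65.3, 46.1),
    "27B dense": (32.8, 23.0),
}

def speedup(m5_rate: float, m3_rate: float) -> float:
    """Return how many times faster the M5 Max generates than the M3 Max."""
    return m5_rate / m3_rate

if __name__ == "__main__":
    for model, (m5, m3) in RESULTS.items():
        print(f"{model}: {speedup(m5, m3):.1f}x")
```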

The performance gap widens with longer contexts. At a 65K-token context, the 27B dense model slowed to 6.8 tg tok/s on the M3 Max versus 19.6 tg tok/s on the M5 Max, a 2.9x difference.
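
The post gives 65K-context figures only for the 27B dense model. Comparing them with that model's pp1024/tg128 rates shows how much of its short-context speed each chip keeps, assuming the two test configurations are otherwise comparable:

```python
# 27B dense token-generation rates (tok/s) reported in the post.
SHORT_CTX = {"M5 Max": 32.8, "M3 Max": 23.0}  # pp1024/tg128
LONG_CTX = {"M5 Max": 19.6, "M3 Max": 6.8}    # 65K context

def retained(short_rate: float, long_rate: float) -> float:
    """Fraction of the short-context generation speed kept at long context."""
    return long_rate / short_rate

# How far ahead the M5 Max is at 65K context.
gap = LONG_CTX["M5 Max"] / LONG_CTX["M3 Max"]

if __name__ == "__main__":
    for chip in SHORT_CTX:
        share = retained(SHORT_CTX[chip], LONG_CTX[chip])
        print(f"{chip}: keeps {share:.0%} of its short-context speed at 65K")
    print(f"M5 Max lead at 65K: {gap:.1f}x")
```

By this measure the M5 Max keeps roughly 60% of its short-context generation speed at 65K, while the M3 Max keeps roughly 30%.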

Prefill and Batching Performance

The prefill (prompt-processing) advantage was even larger, reaching up to 4x at long context lengths, which the post attributes to the Neural Accelerators in the M5 Max's GPU.

Batching behavior differed in ways that matter for agentic workloads:

M5 Max: batching raised throughput to 2.54x at a batch size of 4 on the 35B-A3B model
M3 Max: batching reduced throughput, to 0.80x at a batch size of 2 on the 122B model
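
Batched throughput figures are easy to misread. Assuming the multipliers above are the batch's total throughput relative to a single request (this reading is an assumption; the post does not define them), per-request speed is the multiplier divided by the batch size. The function name below is illustrative:

```python
def per_request_share(aggregate_multiplier: float, batch_size: int) -> float:
    """Per-request generation speed relative to running one request alone.

    ASSUMPTION: aggregate_multiplier is the batch's total throughput divided
    by single-request throughput; the post does not define its figures.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return aggregate_multiplier / batch_size

# Batching multipliers reported in the post.
m5_share = per_request_share(2.54, 4)  # M5 Max, 35B-A3B, batch size 4
m3_share = per_request_share(0.80, 2)  # M3 Max, 122B, batch size 2

if __name__ == "__main__":
    print(f"M5 Max: each of 4 requests at {m5_share:.1%} of solo speed")
    print(f"M3 Max: each of 2 requests at {m3_share:.1%} of solo speed")
```

Under that reading, each of four concurrent requests on the M5 Max runs at about 63.5% of its solo speed while total output still rises, whereas on the M3 Max two concurrent requests each run at 40% and the machine produces less in total than it would serving them one at a time.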

The memory bandwidth difference (614 GB/s on the M5 Max vs 400 GB/s on the M3 Max) is significant for multi-step agent loops and parallel tool calls.
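
Single-stream token generation is usually limited by memory bandwidth rather than compute, so the bandwidth ratio gives a rough yardstick for the generation speedups reported earlier. The sketch below compares the two using only figures from the post; treating bandwidth as the sole factor is a simplifying assumption:

```python
# Memory bandwidth figures (GB/s) quoted in the post.
M5_BANDWIDTH = 614.0
M3_BANDWIDTH = 400.0

# Token-generation rates (tok/s) at pp1024/tg128: (M5 Max, M3 Max).
RATES = {
    "35B-A3B MoE": (134.5, 80.3),
    "122B-A10B MoE": (65.3, 46.1),
    "27B dense": (32.8, 23.0),
}

# If generation were purely bandwidth-bound, speedups would sit near this ratio.
bw_ratio = M5_BANDWIDTH / M3_BANDWIDTH

# Measured M5 Max / M3 Max speedup for each model.
observed = {model: m5 / m3 for model, (m5, m3) in RATES.items()}

if __name__ == "__main__":
    print(f"Bandwidth ratio: {bw_ratio:.2f}x")
    for model, ratio in observed.items():
        print(f"{model}: observed {ratio:.2f}x")
```

The bandwidth ratio is about 1.54x. The 122B-A10B and 27B results (about 1.4x) fall below it, while the 35B-A3B result (about 1.67x) exceeds it, which suggests that bandwidth alone does not account for the whole gap.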

MoE Efficiency Insights

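The benchmarks also showed that the 122B-A10B model, with 10B active parameters, generates tokens faster than the 27B dense model on both machines (65.3 vs 32.8 tok/s on the M5 Max and 46.1 vs 23.0 tok/s on the M3 Max). This suggests that generation speed tracks active parameter count rather than total model size, although total size still determines how much memory the model needs.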
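
One way to see why: during decoding, each generated token reads only the weights of the parameters that are active for it, so the bytes moved per token scale with active rather than total parameters. The sketch below turns that into a bandwidth-bound ceiling. It assumes roughly 4-bit weights (the post does not state the quantization) and ignores KV-cache reads and compute limits, so its numbers are upper bounds, not predictions; the function name is illustrative:

```python
# ASSUMPTION: weights stored at roughly 4 bits (0.5 bytes) per parameter.
# The post does not state the quantization used; change this to match.
BYTES_PER_PARAM = 0.5

def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_b: float,
                         bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Rough upper bound on single-stream generation speed.

    Assumes each generated token streams every active weight from memory
    once and ignores KV-cache traffic, activations and compute limits,
    so measured speeds should come in below this number.
    """
    gb_per_token = active_params_b * bytes_per_param  # billions of params * bytes = GB
    return bandwidth_gb_s / gb_per_token

# Active parameters (billions) from the model names; observed M5 Max tok/s from the post.
MODELS = {
    "35B-A3B MoE": (3.0, 134.5),
    "122B-A10B MoE": (10.0, 65.3),
    "27B dense": (27.0, 32.8),
}

if __name__ == "__main__":
    for name, (active_b, observed) in MODELS.items():
        ceiling = decode_ceiling_tok_s(614.0, active_b)  # M5 Max bandwidth from the post
        print(f"{name}: ceiling ~{ceiling:.0f} tok/s, observed {observed} tok/s")
```

The ratio between two ceilings does not depend on the quantization assumption: the 122B-A10B model's ceiling is 2.7x the 27B dense model's, while the measured ratio is about 2.0x on both machines, so active parameter count explains the ordering but not the full size of the gap.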

The full interactive breakdown with all charts and data is available at: https://claude.ai/public/artifacts/c9fba245-e734-4b3b-be44-a6cabdec6f8f

📖 Read the full source: r/LocalLLaMA
