JANG Quantization Method Improves MLX Performance for Large Models

OpenClawRadar | Published: April 18, 2026

Performance Gap Between MLX and GGUF Quantizations

The source reports that standard MLX quantization can severely degrade accuracy on some large language models. On a 200-question MMLU run, MiniMax-M2.5 quantized to 4-bit for MLX scored only 26.5% (53/200), while the same model quantized with the JANG_2S method scored 74% (148/200). The JANG quantization beat every MLX quantization level tested (2-bit, 3-bit, and 4-bit), all of which scored near random chance at approximately 25%.
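
The "near random chance" claim can be checked with a small calculation. The sketch below is not from the source; it assumes four answer choices per MMLU question (so a 25% guessing rate) and computes how likely a pure guesser would be to reach each reported score on 200 questions.

```python
from math import comb

def binom_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

N_QUESTIONS = 200
CHANCE = 0.25  # assumption: four answer choices per question

# Correct-answer counts as reported in the article.
scores = {"MLX 4-bit": 53, "JANG_2S": 148}

for name, correct in scores.items():
    accuracy = correct / N_QUESTIONS
    p_guess = binom_tail(correct, N_QUESTIONS, CHANCE)
    print(f"{name}: {accuracy:.1%} correct, "
          f"P(at least this many by guessing) = {p_guess:.3g}")
```

Under that assumption, 53/200 is well within what guessing alone would produce, while 148/200 is far outside it.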

Specific Benchmark Results

The per-subject MMLU breakdown (20 questions per subject) shows JANG_2L ahead of MLX 4-bit in every listed subject:

  • Abstract Algebra: JANG_2L 10/20 vs MLX 4-bit 3/20
  • Astronomy: JANG_2L 20/20 vs MLX 4-bit 7/20
  • College CS: JANG_2L 13/20 vs MLX 4-bit 4/20
  • HS Biology: JANG_2L 18/20 vs MLX 4-bit 4/20
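
To see how the four listed subjects add up, the figures above can be tallied as follows. This is only an illustrative sketch: the variable names are made up here, and the numbers are copied from the list.

```python
# (JANG_2L correct, MLX 4-bit correct) out of 20 questions per subject,
# as listed in the article.
QUESTIONS_PER_SUBJECT = 20
subject_scores = {
    "Abstract Algebra": (10, 3),
    "Astronomy": (20, 7),
    "College CS": (13, 4),
    "HS Biology": (18, 4),
}

total_questions = QUESTIONS_PER_SUBJECT * len(subject_scores)
jang_total = sum(jang for jang, _ in subject_scores.values())
mlx_total = sum(mlx for _, mlx in subject_scores.values())

for subject, (jang, mlx) in subject_scores.items():
    print(f"{subject:17s} JANG_2L {jang / QUESTIONS_PER_SUBJECT:4.0%}   "
          f"MLX 4-bit {mlx / QUESTIONS_PER_SUBJECT:4.0%}")

print(f"Total: JANG_2L {jang_total}/{total_questions}, "
      f"MLX 4-bit {mlx_total}/{total_questions}")
```

Across these four subjects the totals are 61/80 for JANG_2L and 18/80 for MLX 4-bit.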

The source identifies the root cause of the poor MLX scores as an output problem: "MLX generates meta-commentary instead of direct answers on this model."

Model Size and Performance Comparisons

For the Qwen 3.5 122B model:

  • JANG_4K: 86% MMLU score, 69 GB size
  • MLX 4-bit: 85% MMLU score, 64 GB size
  • JANG_2S: 79% MMLU score, 38 GB size
  • MLX 2-bit: 56.5% MMLU score, 36 GB size
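
The size figures can be turned into a rough average bits-per-weight estimate, which makes the variants easier to compare. The sketch below is a back-of-the-envelope calculation, not something from the source. It assumes "122B" means about 122e9 parameters, that 1 GB is 1e9 bytes, and that the whole file is weights.

```python
# Figures as reported in the article: (MMLU %, size in GB).
results = {
    "JANG_4K": (86.0, 69),
    "MLX 4-bit": (85.0, 64),
    "JANG_2S": (79.0, 38),
    "MLX 2-bit": (56.5, 36),
}

# Assumption: "122B" means roughly 122e9 parameters, and 1 GB = 1e9 bytes.
PARAMS = 122e9

def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter, treating the whole file as weights."""
    return size_gb * 1e9 * 8 / params

for name, (mmlu, size_gb) in results.items():
    print(f"{name:10s} {mmlu:5.1f}% MMLU  {size_gb:3d} GB  "
          f"~{bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions JANG_2S works out to roughly 2.5 bits per weight and MLX 2-bit to roughly 2.4, so the 22.5-point MMLU gap between them comes at a cost of about 2 GB.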

The author notes that "People trade the M chip speed for coherency, with no GGUF equivalent on MLX" and that "Qwen 3.5 on Macs when using GGUF is also 1/3rd slower than MLX."

MiniMax-M2.5 Code Generation Issue

From referenced benchmarks: "MiniMax-M2.5 can't code — 10% on HumanEval+ despite 87% tool calling and 80% reasoning. Something is off with its code generation format. Great for reasoning though."

Availability and Implementation

The method is currently available through:

  • MLX Studio: https://mlx.studio/ (includes a native JANG_Q inference engine)
  • Repository: for self-installation and model quantization

The method allows running models like MiniMax-M2.5 at "2bit MLX equivalent while getting test results that just wasn't possible before on MLX."

📖 Read the full source: r/LocalLLaMA
