Qwen3.5-122B-A10B-MINT-MLX runs smoothly on M5 Pro with 64GB RAM

✍️ OpenClawRadar | 📅 Published: April 20, 2026

Local LLM Performance on Apple Silicon

A Reddit user has shared their experience running the Qwen3.5-122B-A10B-MINT-MLX model locally on an M5 Pro with 64GB RAM. The setup demonstrates that large language models can run effectively on consumer hardware with proper configuration.

Configuration Details

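The user reported smooth performance after raising the GPU memory ("VRAM") allocation from the terminal. The first command below reads a current setting; the second sets the GPU wired-memory limit to 61440 MB (60 GB of the machine's 64GB):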

sysctl iogpu.unified_memory_limit_percentage
sudo sysctl iogpu.wired_limit_mb=61440

In LM Studio, they set the context window to 16384 tokens. With this configuration, the system maintained stable performance while running Safari with multiple tabs, Messages, and Activity Monitor simultaneously.
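
The 61440 figure is simply 60 GB expressed in MB. As a minimal sketch of that arithmetic (the helper name `wired_limit_mb` and the 4 GB headroom are illustrative assumptions derived from the numbers above, not something the user described):

```python
# Sketch: derive the value for iogpu.wired_limit_mb from total RAM and headroom.
# Assumption: 1 GB is treated as 1024 MB, which matches 61440 MB == 60 GB.

def wired_limit_mb(total_ram_gb: int, headroom_gb: int) -> int:
    """Return the wired limit in MB, leaving headroom_gb for the OS and apps."""
    if not 0 <= headroom_gb < total_ram_gb:
        raise ValueError("headroom must be non-negative and smaller than total RAM")
    return (total_ram_gb - headroom_gb) * 1024

# 64 GB machine with a hypothetical 4 GB reserved for everything else.
limit = wired_limit_mb(total_ram_gb=64, headroom_gb=4)
print(f"sudo sysctl iogpu.wired_limit_mb={limit}")
```

A larger headroom value gives the rest of the system more room at the cost of space for the model and its context.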

Performance Benchmarks

The Qwen3.5-122B-A10B-MINT-MLX model delivered:

  • Time to First Token: 0.86 seconds
  • Token Generation Speed: 39.58 tokens/second
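
To put these two figures in practical terms, the sketch below estimates how long a reply of a given length might take. It assumes generation speed stays constant for the whole reply, which real runs may not match; the function name `estimate_reply_seconds` is illustrative.

```python
# Sketch: estimate wall-clock time for a reply from the two reported figures.
# Assumption: generation speed is constant for the whole reply.

TTFT_SECONDS = 0.86        # time to first token, from the post
TOKENS_PER_SECOND = 39.58  # generation speed, from the post

def estimate_reply_seconds(num_tokens: int) -> float:
    """Time to first token plus steady-state generation time."""
    return TTFT_SECONDS + num_tokens / TOKENS_PER_SECOND

for n in (100, 500, 2000):
    print(f"{n:>5} tokens: about {estimate_reply_seconds(n):.1f} s")
```

Under this simple model, a 500-token reply would take roughly 13.5 seconds.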

The user noted the model "solved a bunch of riddles correctly and did a bit of vibe coding" and had no complaints about the 3-bit MINT quantization. The only issue occurred when the context window filled up and VRAM usage approached 59GB, at which point the system locked up.
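
A back-of-envelope calculation helps show why memory gets tight. The sketch below assumes exactly 3 bits per weight across all 122B parameters and ignores quantization scales, any higher-precision layers, the context cache, and runtime overhead, so the real footprint will differ.

```python
# Sketch: rough weight footprint of a 122B-parameter model at 3 bits per weight.
# Assumptions: uniform 3-bit weights; no scales, metadata, or KV cache counted.

PARAMS = 122e9        # parameter count implied by the model name
BITS_PER_WEIGHT = 3   # the 3-bit quantization mentioned in the post

def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

weights_gb = weight_footprint_gb(PARAMS, BITS_PER_WEIGHT)
print(f"Approximate weights: {weights_gb:.2f} GB")
```

Under these assumptions the weights alone take roughly 46 GB, so the context cache and everything else must fit in what remains. That would be consistent with usage climbing toward 59GB as the context filled.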

Comparison with Other Models

The user also tested "Qwen3.5 40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking Mxfp8," which they found to be more accurate than the 122B model but significantly slower:

  • Token Generation Speed: 6.93 tokens/second
  • Prompt processing remained fast despite slower generation

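This illustrates the trade-off between model size, quantization, and inference speed that developers face when choosing local LLM configurations. Notably, the larger model was the faster one here; its "A10B" suffix suggests a mixture-of-experts design with roughly 10B parameters active per token, which would explain the higher generation speed.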
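
The gap between the two reported generation speeds can be quantified with a short sketch. It compares generation time only, since the post gives no time-to-first-token figure for the 40B model; the function name `generation_seconds` is illustrative.

```python
# Sketch: compare generation time for the two models at their reported speeds.
# Assumption: constant tokens/second; prompt processing time is not included.

FAST_TPS = 39.58  # Qwen3.5-122B-A10B-MINT-MLX, from the post
SLOW_TPS = 6.93   # the 40B Mxfp8 model, from the post

def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a steady rate."""
    return num_tokens / tokens_per_second

speedup = FAST_TPS / SLOW_TPS
print(f"The 122B model generates about {speedup:.1f}x faster")
for n in (100, 500):
    fast = generation_seconds(n, FAST_TPS)
    slow = generation_seconds(n, SLOW_TPS)
    print(f"{n} tokens: {fast:.1f} s vs {slow:.1f} s")
```

At these rates a 500-token reply takes about 13 seconds of generation on the 122B model and over a minute on the 40B model.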

📖 Read the full source: r/LocalLLaMA
