Qwen3.5-122B-A10B-MINT-MLX runs smoothly on M5 Pro with 64GB RAM

Local LLM Performance on Apple Silicon
A Reddit user has shared their experience running the Qwen3.5-122B-A10B-MINT-MLX model locally on an M5 Pro with 64GB RAM. The report suggests that a 122B-parameter model can run at usable speeds on consumer hardware, provided the memory limits are configured to fit it.
Configuration Details
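The user reported smooth performance after running the following terminal commands for VRAM allocation. On Apple Silicon the GPU shares unified memory with the CPU, so the second command raises the GPU wired-memory limit to 61440 MB (60 × 1024), which leaves about 4GB of the 64GB for everything else: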
sysctl iogpu.unified_memory_limit_percentage
sudo sysctl iogpu.wired_limit_mb=61440
In LM Studio, they set the context window to 16384 tokens. With this configuration, the system maintained stable performance while running Safari with multiple tabs, Messages, and Activity Monitor simultaneously.
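The 61440 figure above can be reproduced with a little arithmetic. The sketch below is only an illustration: the helper name `wired_limit_mb` and the 60 GiB target are assumptions inferred from 61440 / 1024, not something stated in the post.

```python
def wired_limit_mb(limit_gib: int) -> int:
    """Convert a limit in GiB to the MB value passed to iogpu.wired_limit_mb."""
    return limit_gib * 1024


TOTAL_RAM_GIB = 64  # the machine described in the post
LIMIT_GIB = 60      # assumed target, inferred from 61440 / 1024

limit_mb = wired_limit_mb(LIMIT_GIB)
headroom_gib = TOTAL_RAM_GIB - LIMIT_GIB

# Print the command rather than executing it; running it requires sudo.
print(f"sudo sysctl iogpu.wired_limit_mb={limit_mb}")
print(f"Left for macOS and other apps: {headroom_gib} GiB")
```

A lower limit trades model capacity for headroom, which may matter if other applications stay open during inference.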
Performance Benchmarks
The Qwen3.5-122B-A10B-MINT-MLX model delivered:
- Time to First Token: 0.86 seconds
- Token Generation Speed: 39.58 tokens/second
The user noted that the model "solved a bunch of riddles correctly and did a bit of vibe coding" and reported no complaints about the 3-bit MINT quantization. The only issue occurred when the context window filled up and VRAM usage approached 59GB, at which point the system locked up.
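A rough back-of-the-envelope estimate helps show why a 122B model fits at all. The sketch below assumes every weight is stored at exactly 3 bits, which is a simplification: the post does not describe the MINT format in detail, and real quantization schemes often keep some tensors at higher precision. The function name is hypothetical, and the estimate excludes the KV cache and activations, which grow with context length.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


# Assumption: uniform 3-bit weights for all 122B parameters.
weights_gb = weight_size_gb(122, 3)
print(f"Approximate weight size: {weights_gb:.2f} GB")
```

Under that assumption the weights alone take about 45.75 GB, so the remaining memory under the 60GB limit has to cover the context cache and everything else. That would be consistent with the lockup the user saw as the context filled.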
Comparison with Other Models
The user also tested "Qwen3.5 40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking Mxfp8," which they found to be more accurate than the 122B model but significantly slower:
- Token Generation Speed: 6.93 tokens/second
- Prompt processing remained fast despite slower generation
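This demonstrates the trade-off between model size, quantization, and inference speed that developers face when choosing local LLM configurations. At the reported speeds, the 40B model generates roughly 5.7 times slower than the 122B model. By Qwen's naming convention, the "A10B" suffix indicates a mixture-of-experts model with about 10B parameters active per token, which likely helps explain why the larger model generates faster.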
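To make the speed gap concrete, the reported figures can be turned into an estimated wait for a reply of a given length. This is a minimal sketch: the 500-token reply length and the function name are illustrative assumptions, and because the post gives no time to first token for the 40B model, that term is left at zero for it.

```python
def response_seconds(n_tokens: int, tokens_per_second: float, ttft: float = 0.0) -> float:
    """Estimate total response time: time to first token plus generation time."""
    return ttft + n_tokens / tokens_per_second


REPLY_TOKENS = 500  # hypothetical reply length, not from the post

t_122b = response_seconds(REPLY_TOKENS, 39.58, ttft=0.86)
t_40b = response_seconds(REPLY_TOKENS, 6.93)  # TTFT not reported; assumed 0
speed_ratio = 39.58 / 6.93

print(f"122B (3-bit MINT): {t_122b:.1f} s")
print(f"40B (MXFP8):       {t_40b:.1f} s")
print(f"Generation speed ratio: {speed_ratio:.1f}x")
```

With these numbers the 122B model finishes a 500-token reply in roughly 13 seconds, while the 40B model needs over a minute.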
📖 Read the full source: r/LocalLLaMA