Lightning MLX: Fast Local AI Engine for Apple Silicon Agentic Use Delivers 220 tok/s on Qwen 35B-A3B

✍️ OpenClawRadar | 📅 Published: May 8, 2026

A new open-source inference engine for Apple Silicon called Lightning MLX claims to be the fastest local AI engine, specifically optimized for agentic workflows — coding agents, tool calling, and short-turn tasks. The project is available on GitHub at samuelfaj/lightning-mlx.

Benchmark Results

The author tested on a MacBook with an M5 Max chip and 128GB RAM and reported the following token generation speeds:

  • Qwen3.6-27B: 40.67 tok/s
  • Qwen3.6-35B-A3B: 220.86 tok/s

These results suggest the engine is particularly efficient with the mixture-of-experts architecture of Qwen3.6-35B-A3B, which activates only a subset of its parameters per token (roughly 3B of 35B, going by the A3B suffix in the model name).
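
To put the two reported figures in perspective, here is a minimal arithmetic sketch. The 500-token reply length is an assumption chosen for illustration, not a number from the author, and the calculation ignores prompt processing time.

```python
# Throughput figures reported in the article (tokens per second).
reported_tok_s = {
    "Qwen3.6-27B": 40.67,
    "Qwen3.6-35B-A3B": 220.86,
}


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Generation time for `tokens` output tokens at a steady rate."""
    return tokens / tok_per_s


# Hypothetical reply length for one short agent turn (an assumption).
TURN_TOKENS = 500

# Ratio between the two reported generation speeds.
speedup = reported_tok_s["Qwen3.6-35B-A3B"] / reported_tok_s["Qwen3.6-27B"]
print(f"Throughput ratio (35B-A3B vs 27B): {speedup:.2f}x")

for name, rate in reported_tok_s.items():
    secs = seconds_for(TURN_TOKENS, rate)
    print(f"{name}: {secs:.1f} s for {TURN_TOKENS} tokens")
```

At these rates the 35B-A3B model would finish a reply of that length in a few seconds, while the 27B model would take several times longer. That gap matters in agent loops that chain many short turns.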

Key Features

  • Optimized for short-turn agentic use cases — code generation, tool calls, and rapid inference loops
  • Includes a preset configuration called MTPLX (custom sampling defaults); the author is seeking feedback on whether these defaults make sense for production use
  • Open source on GitHub, reportedly under the MIT license (not confirmed in the source post)

Feedback Requests

The creator is actively asking the community for:

  • Better benchmark designs for local coding agents
  • Opinions on the MTPLX preset defaults
  • Test results on other Apple Silicon configurations (e.g., M1, M2, M3, M4, different RAM sizes)
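
For readers who want to contribute numbers from other machines, a short-turn throughput benchmark could look something like the sketch below. It is only an illustration: `fake_generate` is a hypothetical stand-in, not Lightning MLX's API, and would need to be replaced with a real call into whichever engine is being tested. The timing covers the whole call, so it includes prompt processing as well as token generation.

```python
import statistics
import time
from typing import Callable, Dict, List


def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for an engine call; returns generated tokens."""
    time.sleep(0.001)  # simulate a little work
    return prompt.split() * 4


def bench_short_turns(
    generate: Callable[[str], List[str]],
    prompts: List[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Time many short turns and summarize tokens per second."""
    rates = []
    for prompt in prompts:
        for _ in range(repeats):
            start = time.perf_counter()
            tokens = generate(prompt)
            elapsed = time.perf_counter() - start
            rates.append(len(tokens) / elapsed)
    return {
        "median_tok_s": statistics.median(rates),
        "min_tok_s": min(rates),
        "runs": len(rates),
    }


result = bench_short_turns(
    fake_generate,
    ["write a unit test for parse_args", "rename this variable"],
    repeats=2,
)
print(result)
```

Reporting the median and the minimum across repeated short prompts, rather than a single long generation, is closer to how a coding agent actually uses the engine.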

Who It's For

Lightning MLX targets developers who run local LLMs on Apple Silicon for agentic coding workflows and need maximum inference speed.

📖 Read the full source: r/LocalLLaMA
