Lightning MLX: Fast Local AI Engine for Agentic Use on Apple Silicon Reports 220 tok/s on Qwen3.6-35B-A3B

✍️ OpenClawRadar | 📅 Published: May 8, 2026

A new open-source inference engine for Apple Silicon called Lightning MLX claims to be the fastest local AI engine, specifically optimized for agentic workflows — coding agents, tool calling, and short-turn tasks. The project is available on GitHub at samuelfaj/lightning-mlx.

Benchmark Results

The author tested on a MacBook with an M5 Max chip and 128GB of RAM and reported the following token generation speeds:

Qwen3.6-27B: 40.67 tok/s
Qwen3.6-35B-A3B: 220.86 tok/s

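The 35B-A3B model generated tokens roughly 5.4 times faster than the 27B model. That gap is consistent with the mixture-of-experts architecture of Qwen3.6-35B-A3B, which activates only a subset of its parameters per token (the A3B suffix indicates about 3B active parameters), and it suggests the engine handles this architecture efficiently.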
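To put the two reported figures in per-token terms, here is a small Python sketch of the arithmetic. The throughput numbers are the ones quoted above; the 200-token turn length is an illustrative assumption, not something the author measured.

```python
# Throughput figures as reported in the article (tokens per second).
reported = {
    "Qwen3.6-27B": 40.67,
    "Qwen3.6-35B-A3B": 220.86,
}

# Assumed length of one short agentic turn (hypothetical, for illustration).
TURN_TOKENS = 200


def ms_per_token(tok_per_s: float) -> float:
    """Convert throughput into average milliseconds per generated token."""
    return 1000.0 / tok_per_s


def seconds_for_turn(tok_per_s: float, n_tokens: int = TURN_TOKENS) -> float:
    """Generation time for a turn of n_tokens, ignoring prompt processing."""
    return n_tokens / tok_per_s


ratio = reported["Qwen3.6-35B-A3B"] / reported["Qwen3.6-27B"]

for name, tps in reported.items():
    print(
        f"{name}: {ms_per_token(tps):.2f} ms/token, "
        f"{seconds_for_turn(tps):.2f} s per {TURN_TOKENS}-token turn"
    )
print(f"Throughput ratio: {ratio:.2f}x")
```

For an agent that makes many short calls in a loop, the per-turn time is the number that compounds, which is why a throughput gap of this size matters more for agentic use than for a single long chat response.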

Key Features

Optimized for short-turn agentic use cases — code generation, tool calls, and rapid inference loops
Includes a preset configuration called MTPLX (custom sampling defaults); the author is seeking feedback on whether these defaults make sense for production use
Open source on GitHub, likely under the MIT license (the license is not confirmed in the source post)

Feedback Requests

The creator is actively asking the community for:

Better benchmark designs for local coding agents
Opinions on the MTPLX preset defaults
Test results on other Apple Silicon configurations (e.g., M1, M2, M3, M4, different RAM sizes)
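For readers who want to contribute numbers, a minimal throughput harness might look like the sketch below. It is not Lightning MLX's own benchmark and does not use its API: `generate` is a hypothetical callable that yields one token at a time, and `fake_generate` is a stand-in so the script runs anywhere. Swap in a real streaming call from whichever engine you are testing.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List

# Hypothetical interface: takes a prompt and yields generated tokens one by one.
Generate = Callable[[str], Iterable[str]]


def measure_tok_per_s(
    generate: Generate,
    prompt: str,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time one generation and return tokens per second.

    The elapsed time includes prompt processing, so short prompts
    give a number closer to pure generation speed.
    """
    start = clock()
    n_tokens = sum(1 for _ in generate(prompt))
    elapsed = clock() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")


def run_benchmark(
    generate: Generate, prompts: List[str], repeats: int = 3
) -> Dict[str, float]:
    """Run every prompt `repeats` times and summarize the samples."""
    samples = [
        measure_tok_per_s(generate, prompt)
        for prompt in prompts
        for _ in range(repeats)
    ]
    return {
        "runs": len(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }


def fake_generate(prompt: str) -> Iterable[str]:
    """Stand-in generator: echoes the prompt's words back as 'tokens'."""
    for word in prompt.split():
        yield word


if __name__ == "__main__":
    short_turn_prompts = [
        "Write a function that reverses a string",
        "Call the search tool with the query weather in Lisbon",
    ]
    print(run_benchmark(fake_generate, short_turn_prompts))
```

Reporting the median alongside the minimum and maximum makes results from different machines easier to compare, since a single run can be skewed by thermal throttling or background load.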

Who It's For

Developers running local LLMs on Apple Silicon for agentic coding workflows who need maximum inference speed.

📖 Read the full source: r/LocalLLaMA
