Lightning MLX: Fast Local AI Engine for Apple Silicon Agentic Use Delivers 220 tok/s on Qwen 35B-A3B

Lightning MLX is a new open-source inference engine for Apple Silicon. Its author claims it is the fastest local AI engine for agentic workflows: coding agents, tool calling, and short-turn tasks. The project is available on GitHub at samuelfaj/lightning-mlx.
Benchmark Results
The author tested on a MacBook with an M5 Max chip and 128GB RAM and reported the following token generation speeds:
- Qwen3.6-27B: 40.67 tok/s
- Qwen3.6-35B-A3B: 220.86 tok/s
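These results suggest that the engine is particularly efficient for the mixture-of-experts architecture used in the Qwen3.6-35B-A3B model, which activates only a subset of its parameters per token (the A3B suffix denotes roughly 3B active parameters out of 35B total). Part of the gap is therefore expected from the architecture itself, since far fewer weights are read for each generated token than in a model that uses all of its parameters.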
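To put the two reported figures side by side, the arithmetic can be sketched in a few lines of Python. The speeds are the ones quoted above; the 200-token reply length is an assumption chosen only to illustrate a short agent turn and does not come from the post.

```python
# Reported decode speeds from the post, in tokens per second.
reported_tok_s = {
    "Qwen3.6-27B": 40.67,
    "Qwen3.6-35B-A3B": 220.86,
}

def ms_per_token(tok_s: float) -> float:
    """Average time spent per generated token, in milliseconds."""
    return 1000.0 / tok_s

def seconds_for_reply(tok_s: float, n_tokens: int) -> float:
    """Decode time for a reply of n_tokens, ignoring prompt processing."""
    return n_tokens / tok_s

# Ratio between the two reported speeds.
speedup = reported_tok_s["Qwen3.6-35B-A3B"] / reported_tok_s["Qwen3.6-27B"]

REPLY_TOKENS = 200  # assumed short agent turn; not a figure from the post

for name, tok_s in reported_tok_s.items():
    print(
        f"{name}: {ms_per_token(tok_s):.2f} ms/token, "
        f"{seconds_for_reply(tok_s, REPLY_TOKENS):.2f} s for {REPLY_TOKENS} tokens"
    )
print(f"Speed ratio: {speedup:.2f}x")
```

These numbers cover decoding only. Prompt processing time, which matters a great deal when an agent sends long contexts, is not part of the reported figures.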
Key Features
- Optimized for short-turn agentic use cases — code generation, tool calls, and rapid inference loops
- Includes a preset configuration called MTPLX (custom sampling defaults); the author is seeking feedback on whether these defaults make sense for production use
- Open source on GitHub; the license is likely MIT, though this is not confirmed here
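For readers unfamiliar with what a sampling preset controls, here is a generic sketch of temperature plus top-p (nucleus) sampling driven by a small preset object. Everything in it is an assumption for illustration: `SamplingPreset`, `EXAMPLE_PRESET`, and the values 0.7 and 0.9 are hypothetical and are not the MTPLX defaults, which the post does not list.

```python
import math
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SamplingPreset:
    temperature: float  # 0 means greedy decoding
    top_p: float        # cumulative probability mass to keep

# Hypothetical values for illustration only; NOT the MTPLX defaults.
EXAMPLE_PRESET = SamplingPreset(temperature=0.7, top_p=0.9)

def sample(logits: list[float], preset: SamplingPreset, rng: random.Random) -> int:
    """Pick a token index from raw logits using temperature and top-p."""
    if preset.temperature <= 0:
        # Greedy: always take the highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / preset.temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filter: keep the most likely tokens until their
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cumulative = [], 0.0
    for idx in order:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= preset.top_p:
            break

    # Draw from the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

token = sample([0.1, 2.0, -1.0], EXAMPLE_PRESET, random.Random(0))
print("sampled token index:", token)
```

Lower temperature and lower top-p make output more deterministic, which tends to suit tool calls and code edits. Whether a given set of defaults is right for production is exactly the question the author is asking.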
Feedback Requests
The creator is actively asking the community for:
- Better benchmark designs for local coding agents
- Opinions on the MTPLX preset defaults
- Test results on other Apple Silicon configurations (e.g., M1, M2, M3, M4, different RAM sizes)
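As one possible starting point for the benchmark question, a short-turn harness could record time to first token and decode speed separately for each turn, then report medians over many short prompts. The sketch below is an assumption about how such a harness might look. It does not use Lightning MLX's API: `fake_stream` is a hypothetical stand-in that would be replaced by whatever streaming call the engine under test provides.

```python
import time
from dataclasses import dataclass
from statistics import median
from typing import Callable, Iterator

@dataclass
class TurnResult:
    ttft_s: float        # time to first token, in seconds
    decode_tok_s: float  # tokens per second after the first token
    n_tokens: int

def measure_turn(stream: Callable[[str], Iterator[str]], prompt: str) -> TurnResult:
    """Time a single streamed reply, splitting prefill from decode."""
    start = time.perf_counter()
    first = None
    n_tokens = 0
    for _ in stream(prompt):
        if first is None:
            first = time.perf_counter()
        n_tokens += 1
    end = time.perf_counter()
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = end - first
    # The first token is attributed to prefill, so decode covers n_tokens - 1.
    rate = (n_tokens - 1) / decode_time if n_tokens > 1 and decode_time > 0 else 0.0
    return TurnResult(first - start, rate, n_tokens)

def summarize(results: list[TurnResult]) -> dict:
    """Medians are less sensitive to a single slow turn than means."""
    return {
        "turns": len(results),
        "median_ttft_s": median(r.ttft_s for r in results),
        "median_decode_tok_s": median(r.decode_tok_s for r in results),
    }

def fake_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a real engine; sleeps simulate work."""
    time.sleep(0.01)           # pretend prompt processing
    for i in range(20):
        time.sleep(0.001)      # pretend per-token decode
        yield f"tok{i}"

prompts = ["list files in src/", "call the search tool", "fix the failing test"]
results = [measure_turn(fake_stream, p) for p in prompts]
print(summarize(results))
```

Reporting time to first token alongside tok/s matters for agents because each tool call starts a new turn, so prompt processing is paid repeatedly rather than once.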
Who It's For
Developers running local LLMs on Apple Silicon for agentic coding workflows who need maximum inference speed.
📖 Read the full source: r/LocalLLaMA