Echo-TTS Ported to Apple Silicon with MLX for Native TTS with Voice Cloning

Echo-TTS, a 2.4B-parameter diffusion transformer (DiT) model for text-to-speech with voice cloning, has been ported from CUDA to run natively on Apple M-series silicon using MLX. Given text and a short reference clip of someone talking, the ported model generates speech in that speaker's voice.
Performance and Benchmarks
On a base 16GB M4 Mac mini, the model generates a short 5-second voice clone in about 10 seconds. Clones up to 30 seconds take approximately 60 seconds to generate. In both cases generation takes roughly twice as long as the audio it produces.
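The two timings above can be compared with a real-time factor (generation time divided by audio duration). The short sketch below only restates the reported numbers; the function name is a hypothetical helper and is not part of the echo-tts-mlx repository.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Approximate figures reported for a base 16GB M4 Mac mini.
short_clip = real_time_factor(10, 5)   # 5-second clone in about 10 seconds
long_clip = real_time_factor(60, 30)   # 30-second clone in about 60 seconds
print(short_clip, long_clip)
```

A value above 1.0 means generation is slower than playback, which matters for the streaming use case: audio would need to be buffered before playback starts.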
Key Features
- 8-bit quantization: Reduces memory usage from approximately 6 GB to about 4 GB and runs faster, with negligible quality loss.
- Blockwise generation: Enables streaming and audio continuations.
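To illustrate the general idea behind 8-bit quantization, here is a minimal pure-Python sketch of affine quantization over one group of weights. It is an assumption-laden illustration of the technique, not the port's code: the function names are hypothetical, and the actual port presumably relies on MLX's own quantization routines.

```python
def quantize_group(values, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] with a scale and bias."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a constant group, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from integer codes."""
    return [c * scale + bias for c in codes]


# Hypothetical weights for illustration only.
weights = [-0.42, 0.0, 0.13, 0.37, 0.91]
codes, scale, bias = quantize_group(weights)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

Each weight is stored as a single byte plus a shared scale and bias per group, so the rounding error per weight is bounded by about half the scale. That bounded error is one plausible reason the quality loss is reported as negligible.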
Development Details
This was an AI-assisted port. Claude Opus 4.6 handled specification and validation, GPT-5.3-Codex performed the implementation, and the developer steered the project through OpenClaw.
The repository is available at github.com/mznoj/echo-tts-mlx.
📖 Read the full source: r/LocalLLaMA