Local Voice Control Setup for AI Agents on Apple Silicon

This write-up describes a local voice control setup for AI agents that uses Parakeet STT and Kokoro TTS on Apple Silicon, tested specifically on a Mac Mini M4. The goal was a fast, fully local voice interaction layer with no dependency on cloud services.

Key Details
- Hardware: Mac Mini M4 running OpenClaw + Claude as the AI agent.
- Software Setup: Parakeet handles speech-to-text (STT), transcribing voice input in approximately 240ms, and Kokoro handles text-to-speech (TTS), producing nearly instant responses.
- Benefits: Switching from typing to voice commands makes the workflow more flexible, because the agent can be used away from the desk, for example from the balcony or while walking a dog.
- Challenges: The STT occasionally struggles with accented speech, which humorously leads the AI agent to correct the user's pronunciation.
- Enhancements: A browser extension with a 3D avatar named Mimora adds visual interaction, showing expressions such as listening, thinking, and happy during agent responses.
This configuration is ideal for those seeking cloud-independent, fast voice interaction with AI agents, particularly using Apple Silicon hardware.
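
The pipeline described above (voice in, STT, agent, TTS, voice out) can be sketched as a single turn function that times each stage. This is only a rough illustration and not the original poster's code: the stage signatures, `run_turn`, `TurnResult`, and the `fake_*` placeholders are all hypothetical names, and real Parakeet, Kokoro, or agent bindings would need to be wrapped to fit them.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real Parakeet/Kokoro bindings will differ
# and would need thin wrappers to match these shapes.
Transcriber = Callable[[bytes], str]   # audio in, text out (STT)
Agent = Callable[[str], str]           # user text in, reply text out
Synthesizer = Callable[[str], bytes]   # reply text in, audio out (TTS)


@dataclass
class TurnResult:
    transcript: str
    reply: str
    audio: bytes
    timings_ms: dict[str, float]


def run_turn(audio_in: bytes, stt: Transcriber, agent: Agent,
             tts: Synthesizer) -> TurnResult:
    """Run one voice turn (STT -> agent -> TTS) and time each stage."""
    timings: dict[str, float] = {}

    def timed(name, fn, arg):
        # Measure wall-clock time of a single stage in milliseconds.
        start = time.perf_counter()
        out = fn(arg)
        timings[name] = (time.perf_counter() - start) * 1000.0
        return out

    transcript = timed("stt", stt, audio_in)
    reply = timed("agent", agent, transcript)
    audio_out = timed("tts", tts, reply)
    return TurnResult(transcript, reply, audio_out, timings)


# Placeholder stages so the sketch runs without any model installed.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    result = run_turn(b"open the calendar", fake_stt, fake_agent, fake_tts)
    print(result.reply)
    print({name: round(ms, 2) for name, ms in result.timings_ms.items()})
```

Keeping the three stages as plain callables makes it easy to swap a placeholder for a real local model and to check per-stage latency against figures such as the roughly 240ms transcription time reported above.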
📖 Read the full source: r/LocalLLaMA