Building a voice-controlled multi-agent system on top of Claude Code

A developer on r/ClaudeAI built a weekend project that adds voice control to Claude Code on macOS, complete with a wake word, WebRTC voice loop, and a multi-agent orchestration system. What started as a convenience hack turned into a system where a lead agent decomposes tasks, recruits sub-agents, and runs them in parallel with auto-triggered QA passes.
How it works
- Wake word: "Yabby" triggers the voice loop. The developer chose a custom wake word to avoid conflicts with Siri or other assistants.
- Voice loop: WebRTC handles real-time audio streaming. The system uses OpenAI's Realtime API for speech-to-text and text-to-speech. Target latency is under 300ms, but the API sometimes adds delays.
- Lead agent: Receives the voice request, performs a discovery phase, creates a project plan, and recruits a small team (manager + 2-3 sub-agents) to execute steps.
- Parallel execution: Sub-agents run in parallel where possible, sequentially otherwise. Each agent gets its own Claude Code CLI session with a separate thread, so conversations don't bleed into one another.
- Auto-QA: When a sub-agent finishes, a review pass is triggered with a 5-second debounce to prevent pile-ups. During testing, one agent caught a bug written by another agent, an emergent behavior the developer didn't expect.
- Plan approval modal: Before any agent executes, a modal pops up for the user to vet the plan. This prevents the system from running unverified actions.
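
The parallel execution and debounced auto-QA described above could be sketched roughly as follows. This is a minimal illustration in Python with asyncio, not the developer's code: `DebouncedReview`, `run_agent`, `demo`, and the agent names are hypothetical, and each sub-agent is simulated with a sleep rather than a real Claude Code CLI session.

```python
import asyncio


class DebouncedReview:
    """Run one review after `delay` seconds with no further agent completions."""

    def __init__(self, review, delay=5.0):
        self._review = review   # hypothetical async callback that runs the QA pass
        self._delay = delay     # the write-up mentions a 5-second debounce
        self._pending = set()   # agents that finished since the last review
        self._timer = None      # task currently waiting out the quiet period
        self._tasks = set()     # strong references so tasks are not garbage-collected

    def agent_finished(self, agent_name):
        """Record a completion and restart the quiet-period timer."""
        self._pending.add(agent_name)
        if self._timer is not None:
            self._timer.cancel()  # only ever cancels a task that is still sleeping
        self._timer = asyncio.create_task(self._fire_later())
        self._tasks.add(self._timer)
        self._timer.add_done_callback(self._tasks.discard)

    async def _fire_later(self):
        await asyncio.sleep(self._delay)
        # The quiet period is over: detach so later completions start a new
        # timer instead of cancelling a review that is already running.
        self._timer = None
        batch, self._pending = sorted(self._pending), set()
        await self._review(batch)


async def run_agent(name, seconds, debouncer):
    """Stand-in for a sub-agent: sleep, then report completion."""
    await asyncio.sleep(seconds)
    debouncer.agent_finished(name)


async def demo(delay=0.2):
    """Run three simulated agents in parallel and collect the review batches."""
    reviewed = []

    async def review(batch):
        reviewed.append(batch)

    debouncer = DebouncedReview(review, delay=delay)
    await asyncio.gather(
        run_agent("frontend", 0.01, debouncer),
        run_agent("backend", 0.02, debouncer),
        run_agent("tests", 0.03, debouncer),
    )
    await asyncio.sleep(delay * 3)  # give the debounced review time to fire
    return reviewed
```

Calling `asyncio.run(demo())` runs the three simulated agents concurrently. Because they all finish within one quiet period, the review callback fires once with all three names instead of three times.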
Pain points
- Speaker verification: Uses cosine similarity on speaker embeddings. The threshold is hard to tune: too tight and it rejects the user when they have a cold; too loose and anyone in the room can trigger commands.
- Locale issues: French is hard-coded as the default locale, a leftover from how the code was originally written. The developer is slowly fixing it.
- Background task lifecycle: When the parent Claude Code CLI process exits, background tasks die silently. The developer wrote an OS-level PID watcher with a bookkeeper shell script to track which long-lived servers have crashed.
- Over-planning: The lead agent sometimes produces a four-phase project plan for trivial requests like renaming a file.
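
The speaker-verification trade-off can be made concrete with a small sketch. It assumes embeddings are plain lists of floats; the function names, the toy three-dimensional vectors, and the 0.75 threshold are all hypothetical and not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_enrolled_speaker(enrolled, candidate, threshold=0.75):
    """Accept the candidate if it is close enough to the enrolled voice.

    The default threshold is an illustrative value, not a recommendation.
    """
    return cosine_similarity(enrolled, candidate) >= threshold


# Toy vectors: real speaker embeddings have hundreds of dimensions.
enrolled = [1.0, 0.0, 0.0]
same_user_with_cold = [0.8, 0.6, 0.0]   # similarity of about 0.8
someone_else = [0.0, 1.0, 0.0]          # similarity of 0.0
```

With these toy vectors, a threshold of 0.75 accepts the user with a cold, while tightening it to 0.9 rejects them. The other speaker is rejected in both cases.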
Open questions
The developer is still figuring out how to reduce verbosity in the QA phase, whether to let sub-agents recruit their own sub-agents (recursive delegation), and how to keep voice latency under 300ms when the Realtime API gets cranky. They're also curious how Anthropic's official voice mode (rolled out to 5% of users) will handle multi-agent coordination.
📖 Read the full source: r/ClaudeAI