Building a voice-controlled multi-agent system on top of Claude Code

✍️ OpenClawRadar · 📅 Published: May 25, 2026

A developer on r/ClaudeAI built a weekend project that adds voice control to Claude Code on macOS, complete with a wake word, a WebRTC voice loop, and multi-agent orchestration. What started as a convenience hack turned into a system in which a lead agent decomposes tasks, recruits sub-agents, and runs them in parallel with auto-triggered QA passes.

How it works

  • Wake word: "Yabby" triggers the voice loop. The developer chose a custom wake word to avoid conflicts with Siri or other assistants.
  • Voice loop: WebRTC handles real-time audio streaming. The system uses OpenAI's Realtime API for speech-to-text and text-to-speech; the target latency is under 300 ms, but the API sometimes introduces delays.
  • Lead agent: Receives the voice request, performs a discovery phase, creates a project plan, and recruits a small team (manager + 2-3 sub-agents) to execute steps.
  • Parallel execution: Sub-agents run in parallel where possible, sequentially otherwise. Each agent gets its own Claude Code CLI session with a separate thread — conversations don't bleed.
  • Auto-QA: When a sub-agent finishes, a review pass is triggered with a 5-second debounce to prevent pile-ups. During testing, one agent caught a bug written by another agent — an emergent behavior the developer didn't expect.
  • Plan approval modal: Before any agent executes, a modal pops up for the user to vet the plan. This prevents the system from running unverified actions.
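
The parallel execution and debounced Auto-QA steps above can be sketched roughly as follows. This is a minimal illustration, not the developer's code: the class `DebouncedReview`, the functions `run_sub_agent` and `run_team`, and the agent names are all hypothetical, and `asyncio.sleep` stands in for a real Claude Code CLI session. The idea is that every "agent finished" event restarts a quiet-period timer, so a burst of completions produces a single review pass.

```python
import asyncio
from typing import List, Optional


class DebouncedReview:
    """Collapse bursts of 'agent finished' events into one review pass."""

    def __init__(self, delay_s: float = 5.0) -> None:
        self.delay_s = delay_s              # quiet period (5 s in the article)
        self.pending: List[str] = []        # agents finished since last review
        self.reviews: List[List[str]] = []  # one entry per review pass
        self._timer: Optional[asyncio.Task] = None

    def notify(self, agent_name: str) -> None:
        """Record a finished agent and restart the quiet-period timer."""
        self.pending.append(agent_name)
        if self._timer is not None and not self._timer.done():
            self._timer.cancel()
        self._timer = asyncio.create_task(self._fire_after_quiet_period())

    async def _fire_after_quiet_period(self) -> None:
        await asyncio.sleep(self.delay_s)
        batch, self.pending = self.pending, []
        self.reviews.append(batch)  # placeholder for the real QA pass

    async def drain(self) -> None:
        """Wait for the most recently scheduled review, if there is one."""
        if self._timer is not None:
            await self._timer


async def run_sub_agent(name: str, work_s: float, review: DebouncedReview) -> str:
    # Hypothetical stand-in for one isolated Claude Code CLI session.
    await asyncio.sleep(work_s)
    review.notify(name)
    return name


async def run_team(delay_s: float = 5.0) -> List[List[str]]:
    review = DebouncedReview(delay_s)
    # Independent steps run concurrently; dependent steps would be awaited in order.
    await asyncio.gather(
        run_sub_agent("frontend", 0.01, review),
        run_sub_agent("backend", 0.02, review),
        run_sub_agent("tests", 0.03, review),
    )
    await review.drain()
    return review.reviews


if __name__ == "__main__":
    # A short delay keeps the demo fast; the article describes 5 seconds.
    print(asyncio.run(run_team(delay_s=0.2)))
```

Because all three simulated agents finish within the quiet period, the timer is restarted twice and only one review pass runs, covering all of them. A longer gap between completions would produce separate passes.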

Pain points

  • Speaker verification: Uses cosine-similarity on speaker embeddings. The threshold is hard to tune — too tight rejects the user when they have a cold; too loose allows anyone in the room to trigger commands.
  • Locale issues: French is hard-coded as the default locale, and the developer is gradually fixing this.
  • Background task lifecycle: When the parent Claude Code CLI process exits, background tasks die silently. The developer wrote an OS-level PID watcher with a bookkeeper shell script to track which long-lived servers have crashed.
  • Over-planning: The lead agent sometimes produces a four-phase project plan for trivial requests like renaming a file.
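
The speaker-verification trade-off can be made concrete with a small sketch. Everything here is assumed for illustration: the three-dimensional vectors are toy data (real speaker embeddings have hundreds of dimensions), and the function names and the 0.75 default threshold are hypothetical rather than taken from the project.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def is_enrolled_speaker(
    enrolled: Sequence[float],
    candidate: Sequence[float],
    threshold: float = 0.75,  # hypothetical value; must be tuned per system
) -> bool:
    """Accept the candidate only if it is similar enough to the enrolled voice."""
    return cosine_similarity(enrolled, candidate) >= threshold


# Toy embeddings, invented for this example.
ENROLLED = [0.9, 0.1, 0.4]   # the user's enrolled voice
WITH_COLD = [0.7, 0.3, 0.5]  # same user, slightly shifted voice
STRANGER = [0.1, 0.9, 0.2]   # someone else in the room

if __name__ == "__main__":
    for threshold in (0.2, 0.75, 0.97):
        print(
            threshold,
            is_enrolled_speaker(ENROLLED, WITH_COLD, threshold),
            is_enrolled_speaker(ENROLLED, STRANGER, threshold),
        )
```

With this toy data, a very strict threshold (0.97) rejects the user with a cold, a very loose one (0.2) accepts the stranger, and a middle value separates the two. Real embeddings rarely separate this cleanly, which is why the threshold is hard to tune.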
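
The PID-watcher idea can be sketched as well. The source describes a shell script; the Python version below is a swapped-in illustration of the same technique, assuming a POSIX system such as macOS. The function names and the registry layout (server name mapped to PID) are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, List


def is_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    # Note: an unreaped zombie child still counts as alive here.
    return True


def find_crashed(registry: Dict[str, int]) -> List[str]:
    """Return the names of registered servers whose PID no longer exists."""
    return sorted(name for name, pid in registry.items() if not is_alive(pid))


if __name__ == "__main__":
    # Simulate a background task that has already exited and been reaped.
    finished = subprocess.Popen([sys.executable, "-c", "pass"])
    finished.wait()
    registry = {"watcher-itself": os.getpid(), "dev-server": finished.pid}
    print(find_crashed(registry))
```

A real watcher would run this check on a timer and persist the registry to disk, so that it survives the exit of the parent CLI process.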

Open questions

The developer is still figuring out how to reduce verbosity in the QA phase, whether to let sub-agents recruit their own sub-agents (recursive delegation), and how to keep voice latency under 300ms when the Realtime API gets cranky. They're also curious how Anthropic's official voice mode (rolled out to 5% of users) will handle multi-agent coordination.

📖 Read the full source: r/ClaudeAI
