Voxray-AI: Production Go Backend for Real-Time Voice Agent Pipelines

Production Voice Agent Pipeline in Go
Voxray-AI provides a complete streaming pipeline in Go: it accepts client audio over WebSocket or WebRTC, runs it through STT → LLM → TTS, and streams the synthesized audio back to the client. The system is designed for production servers and high-concurrency voice workloads.
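The STT → LLM → TTS flow can be pictured as a chain of streaming stages connected by channels. The following is a minimal sketch under stated assumptions, not Voxray-AI's actual code: the `Stage` type, `mapStage`, `chain`, and the three `fake*` functions are hypothetical names, and strings stand in for audio frames and text.

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is a hypothetical pipeline step: it consumes items from in and
// emits results on the returned channel.
type Stage func(in <-chan string) <-chan string

// mapStage turns a per-item function into a streaming Stage that runs in
// its own goroutine and closes its output when the input is exhausted.
func mapStage(f func(string) string) Stage {
	return func(in <-chan string) <-chan string {
		out := make(chan string)
		go func() {
			defer close(out)
			for v := range in {
				out <- f(v)
			}
		}()
		return out
	}
}

// chain wires the stages together in order.
func chain(in <-chan string, stages ...Stage) <-chan string {
	for _, s := range stages {
		in = s(in)
	}
	return in
}

// Stand-ins for real providers (assumptions for illustration only).
func fakeSTT(audio string) string { return "text(" + audio + ")" }
func fakeLLM(text string) string  { return "reply(" + text + ")" }
func fakeTTS(reply string) string { return "audio(" + reply + ")" }

// runPipeline feeds frames through STT, LLM, and TTS and collects the output.
func runPipeline(frames []string) []string {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, f := range frames {
			in <- f
		}
	}()
	var out []string
	for v := range chain(in, mapStage(fakeSTT), mapStage(fakeLLM), mapStage(fakeTTS)) {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(strings.Join(runPipeline([]string{"frame1", "frame2"}), "\n"))
}
```

Because each stage runs in its own goroutine, a later frame can be transcribed while an earlier one is still being synthesized, which is the property a streaming voice pipeline depends on.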
Transport Options
The system supports multiple transport mechanisms:
- WebSocket at /ws with RTVI serializer (?rtvi=1) and Protobuf (?format=protobuf) support
- WebRTC at /webrtc/offer with full SDP offer/answer, configurable STUN/TURN, and Opus encoding (requires CGO build)
- Telephony runner transports: Twilio, Telnyx, Plivo, Exotel, LiveKit, Daily.co
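As a small illustration of the WebSocket options listed above, a client might build its connection URL as follows. This is a sketch only: the `wsURL` helper, the host names, and the port are hypothetical, and whether the two query parameters can be combined is not stated in the source, so they are shown separately.

```go
package main

import (
	"fmt"
	"net/url"
)

// wsURL builds a URL for the /ws endpoint with the given query parameters.
// The helper itself is hypothetical; only the path and the parameter names
// (rtvi, format) come from the article.
func wsURL(host string, secure bool, query map[string]string) string {
	scheme := "ws"
	if secure {
		scheme = "wss"
	}
	q := url.Values{}
	for k, v := range query {
		q.Set(k, v)
	}
	u := url.URL{Scheme: scheme, Host: host, Path: "/ws", RawQuery: q.Encode()}
	return u.String()
}

func main() {
	// Host and port below are placeholders, not documented defaults.
	fmt.Println(wsURL("localhost:8080", false, map[string]string{"rtvi": "1"}))
	fmt.Println(wsURL("voice.example.com", true, map[string]string{"format": "protobuf"}))
}
```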
Pluggable Providers
All components are swappable via configuration:
- STT providers: OpenAI, Groq, Sarvam, Google, AWS
- LLM providers: OpenAI, Anthropic, Groq, others
- TTS providers: OpenAI, Google, AWS Polly, Sarvam
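One common way to make providers swappable via configuration is a registry that maps the `provider` string to a constructor. The sketch below assumes that approach; it is not taken from the Voxray-AI source. The `Config` and `ProviderConfig` structs mirror the keys in the article's sample configuration, while the `LLM` interface, `echoLLM`, `llmFactories`, `newLLM`, and `loadConfig` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProviderConfig mirrors one provider entry from the sample configuration.
type ProviderConfig struct {
	Provider string `json:"provider"`
	Model    string `json:"model,omitempty"`
	Voice    string `json:"voice,omitempty"`
}

// Config covers only the keys used in the minimal configuration example.
type Config struct {
	Transport string         `json:"transport"`
	STT       ProviderConfig `json:"stt"`
	LLM       ProviderConfig `json:"llm"`
	TTS       ProviderConfig `json:"tts"`
}

// LLM is a hypothetical interface every LLM provider would satisfy.
type LLM interface {
	Complete(prompt string) (string, error)
}

// echoLLM is a stand-in that tags the prompt instead of calling a real API.
type echoLLM struct{ name, model string }

func (e echoLLM) Complete(prompt string) (string, error) {
	return fmt.Sprintf("[%s/%s] %s", e.name, e.model, prompt), nil
}

// llmFactories maps a provider name from the config to a constructor.
var llmFactories = map[string]func(ProviderConfig) LLM{
	"openai":    func(c ProviderConfig) LLM { return echoLLM{"openai", c.Model} },
	"anthropic": func(c ProviderConfig) LLM { return echoLLM{"anthropic", c.Model} },
	"groq":      func(c ProviderConfig) LLM { return echoLLM{"groq", c.Model} },
}

// newLLM selects a provider by name and fails loudly on unknown names.
func newLLM(c ProviderConfig) (LLM, error) {
	f, ok := llmFactories[c.Provider]
	if !ok {
		return nil, fmt.Errorf("unknown llm provider %q", c.Provider)
	}
	return f(c), nil
}

// loadConfig decodes a JSON configuration document.
func loadConfig(raw []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"transport": "both",
		"stt": {"provider": "groq", "model": "whisper-large-v3"},
		"llm": {"provider": "anthropic", "model": "claude-3-5-haiku"},
		"tts": {"provider": "google", "voice": "en-US-Neural2-F"}}`)
	cfg, err := loadConfig(raw)
	if err != nil {
		panic(err)
	}
	llm, err := newLLM(cfg.LLM)
	if err != nil {
		panic(err)
	}
	reply, _ := llm.Complete("hello")
	fmt.Println(reply)
}
```

The same pattern would apply to STT and TTS: one interface per stage, one registry per interface, and a clear error when the configured name is not registered.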
Configuration Examples
Minimal configuration example:
{
  "transport": "both",
  "stt": { "provider": "groq", "model": "whisper-large-v3" },
  "llm": { "provider": "anthropic", "model": "claude-3-5-haiku" },
  "tts": { "provider": "google", "voice": "en-US-Neural2-F" }
}
Turn-taking and voice activity detection configuration:
{
  "turn_detection": "silence",
  "vad_type": "silero",
  "vad_confidence": 0.7,
  "vad_start_secs_vad": 0.2,
  "vad_stop_secs": 0.8,
  "turn_max_duration_secs": 30,
  "user_idle_timeout_secs": 60
}
Observability & Storage
- /metrics endpoint for Prometheus (request counts, latency histograms, active connection gauges)
- Recording: Full session audio to S3 with configurable worker pool and format
- Transcripts: Per-message storage to Postgres or MySQL with configurable table
- /health and /ready endpoints with optional Redis session store check on /ready
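The split between liveness and readiness can be sketched with the standard library alone. This is an illustrative example, not the project's handlers: `SessionStore`, `fakeStore`, `newMux`, and `status` are hypothetical, and the fake store stands in for a real Redis client. Passing a nil store models the case where the optional check is disabled.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// SessionStore is a hypothetical abstraction over the session store.
type SessionStore interface {
	Ping() error
}

// fakeStore simulates a store that is either reachable or not.
type fakeStore struct{ up bool }

func (s fakeStore) Ping() error {
	if s.up {
		return nil
	}
	return errors.New("session store unreachable")
}

// newMux registers /health (process is alive) and /ready (dependencies are
// reachable). A nil store skips the dependency check.
func newMux(store SessionStore) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if store != nil {
			if err := store.Ping(); err != nil {
				http.Error(w, err.Error(), http.StatusServiceUnavailable)
				return
			}
		}
		fmt.Fprintln(w, "ready")
	})
	return mux
}

// status issues an in-memory GET request and returns the response code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(status(newMux(fakeStore{up: true}), "/ready"))   // 200
	fmt.Println(status(newMux(fakeStore{up: false}), "/ready"))  // 503
	fmt.Println(status(newMux(fakeStore{up: false}), "/health")) // 200
}
```

Keeping /health independent of external dependencies lets an orchestrator distinguish a crashed process from one that is merely waiting on its session store.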
Security Features
- server_api_key gates /ws, /webrtc/offer, /start, /sessions/* via Authorization: Bearer or X-API-Key
- CORS allowlist configuration
- TLS cert/key configuration
- 12-factor style: JSON config + environment variable overrides
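The API key gate described in the list above could be implemented as ordinary `net/http` middleware. The following is a minimal sketch under assumptions: `requireAPIKey`, `okHandler`, and `call` are hypothetical names, treating an empty key as "authentication disabled" is a guess at sensible behavior, and the precedence of the Bearer header over X-API-Key is arbitrary.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireAPIKey rejects requests that do not present the configured key via
// "Authorization: Bearer <key>" or "X-API-Key: <key>".
// Assumption: an empty key means the gate is disabled.
func requireAPIKey(key string, next http.Handler) http.Handler {
	if key == "" {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		if auth := r.Header.Get("Authorization"); strings.HasPrefix(auth, "Bearer ") {
			got = strings.TrimPrefix(auth, "Bearer ")
		}
		// Constant-time comparison avoids leaking key contents through timing.
		if subtle.ConstantTimeCompare([]byte(got), []byte(key)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for a protected endpoint such as /start.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusNoContent)
})

// call sends an in-memory request with one optional header and returns the
// response status code.
func call(h http.Handler, header, value string) int {
	req := httptest.NewRequest(http.MethodGet, "/start", nil)
	if header != "" {
		req.Header.Set(header, value)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireAPIKey("s3cret", okHandler)
	fmt.Println(call(h, "Authorization", "Bearer s3cret")) // 204
	fmt.Println(call(h, "X-API-Key", "s3cret"))            // 204
	fmt.Println(call(h, "", ""))                           // 401
}
```

In a real deployment the key would come from the JSON config or an environment variable override rather than a literal, in line with the 12-factor approach noted above.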
This type of backend is useful for developers building real-time voice applications that need to integrate multiple AI services with production-ready infrastructure.
📖 Read the full source: r/LocalLLaMA