Microsoft VibeVoice: 60-Min ASR and 90-Min TTS Models Open-Sourced

Microsoft has open-sourced VibeVoice, a family of frontier voice AI models covering both automatic speech recognition (ASR) and text-to-speech (TTS). The ASR model (VibeVoice-ASR-7B) processes up to 60 minutes of long-form audio in a single pass within a 64K-token window and outputs structured transcriptions containing speaker ID, timestamps, and text. It supports over 50 languages and accepts user-customized hotwords for domain-specific terms. The TTS model (VibeVoice-TTS-1.5B) can synthesize up to 90 minutes of multi-speaker speech with up to 4 speakers. A real-time variant (VibeVoice-Realtime-0.5B) supports streaming text input and long-form generation, with multilingual voices in 9 languages and 11 English style voices.
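To make the "speaker ID, timestamps, and text" output concrete, here is a minimal Python sketch of how such a transcript could be represented and summarized downstream. The `Segment` class, its field names, and the sample rows are illustrative assumptions; the actual VibeVoice-ASR output schema is not described in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical record for one transcript segment (not the real schema)."""
    speaker: str    # who
    start_s: float  # when: segment start, in seconds
    end_s: float    # when: segment end, in seconds
    text: str       # what


def speaking_time(segments):
    """Sum the duration of all segments per speaker, in seconds."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end_s - seg.start_s
    return dict(totals)


# Made-up sample data, only to exercise the helper.
segments = [
    Segment("Speaker 1", 0.0, 4.5, "Welcome back to the show."),
    Segment("Speaker 2", 4.5, 9.0, "Thanks for having me."),
    Segment("Speaker 1", 9.0, 12.0, "Let's get started."),
]

print(speaking_time(segments))
```

Keeping who, when, and what in a single record is what makes per-speaker summaries like this a simple loop rather than a join across separate ASR and diarization outputs.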
Key Technical Details
- Core innovation: Continuous speech tokenizers (Acoustic and Semantic) at an ultra-low frame rate of 7.5 Hz, preserving audio fidelity while boosting computational efficiency for long sequences.
- Architecture: Next-token diffusion framework — an LLM handles textual context and dialogue flow, a diffusion head generates high-fidelity acoustic details.
- ASR capabilities: Single-pass 60-minute audio, joint ASR + diarization + timestamping (Who, When, What), customizable hotwords.
- TTS capabilities: 90-minute long-form synthesis with up to 4 distinct speakers; real-time streaming via VibeVoice-Realtime-0.5B.
- Inference speedup: vLLM inference supported (see vllm-asr).
- Finetuning: ASR finetuning code is available.
- Hugging Face integration: VibeVoice-ASR is now part of the Transformers release (2026-03-06).
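The 7.5 Hz frame rate is what makes hour-long audio fit in a single context window. The arithmetic can be sketched in Python as follows. Two assumptions are made here that the article does not state: that each tokenizer frame occupies one position in the sequence, and that "64K" means 65,536 tokens.

```python
FRAME_RATE_HZ = 7.5          # tokenizer frame rate quoted in the list above
CONTEXT_WINDOW = 64 * 1024   # assumption: "64K" read as 65,536 tokens


def speech_frames(minutes, frame_rate_hz=FRAME_RATE_HZ):
    """Number of tokenizer frames produced for `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


asr_frames = speech_frames(60)  # 60-minute ASR input
tts_frames = speech_frames(90)  # 90-minute TTS output

# Positions left for text (transcript, hotwords, prompt) in the ASR case,
# assuming one sequence position per frame.
asr_headroom = CONTEXT_WINDOW - asr_frames

print(asr_frames, tts_frames, asr_headroom)
```

Under these assumptions, an hour of audio comes to 27,000 frames, well under half of the 64K window, which leaves the remainder for the transcript text itself.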
Quick links:
- ASR model: HF Link | Playground
- TTS model: HF Link (code disabled)
- Realtime TTS: HF Link | Colab
Note: The VibeVoice-TTS code was removed from the repo (2025-09-05) due to misuse concerns, but ASR and realtime TTS code remain active.
📖 Read the full source: HN AI Agents