Mistral Voxtral Realtime 4B in Pure C for Speech-to-Text

voxtral.c, a repository by antirez, implements the inference pipeline for Mistral's Voxtral Realtime 4B speech-to-text model in pure C. It relies exclusively on the C standard library, providing a dependency-free alternative that needs no Python runtime, CUDA toolkit, or other external library at inference time.

Key Features

Pure C Implementation: Nothing beyond the C standard library is required, which suits environments where a minimal dependency footprint matters.
Platform-Specific Backends: Two make targets are provided. make mps targets Apple Silicon and is the faster option; make blas targets Intel Macs and Linux systems with OpenBLAS, and is slower because of the bf16-to-fp32 conversion it requires.
Audio Processing: A chunked encoder with overlapping windows bounds memory usage regardless of input length. Audio can also be read from stdin or, on macOS, from the microphone, so the same tool handles both live and file-based transcription.
Streaming C API: The vox_stream_t API accepts audio incrementally and returns token strings as they are generated.
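
To make the bf16-to-fp32 conversion mentioned for the BLAS backend concrete, here is a minimal sketch of how such a widening step generally works. It is not taken from voxtral.c: the function names bf16_to_f32 and bf16_to_f32_array are hypothetical, and the only fact relied on is that a bfloat16 value is the upper 16 bits of an IEEE 754 32-bit float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen one bfloat16 value (held as raw 16 bits) to a 32-bit float.
 * bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
 * of a binary32, so shifting left by 16 restores the float bit pattern. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret bits without aliasing issues */
    return f;
}

/* Hypothetical helper: widen a whole buffer of n bf16 values into dst. */
static void bf16_to_f32_array(const uint16_t *src, float *dst, size_t n) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = bf16_to_f32(src[i]);
    }
}
```

A conversion like this touches every value and doubles the storage per value, which is consistent with the BLAS path being described as the slower of the two.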

Usage

Download the model (~8.9GB) using ./download_model.sh.
For audio transcription from a file: ./voxtral -d voxtral-model -i audio.wav.
Live transcription from a mic on macOS: ./voxtral -d voxtral-model --from-mic.
Transcoding and transcription with ffmpeg: ffmpeg -i audio.mp3 -f s16le -ar 16000 -ac 1 - 2> /dev/null | ./voxtral -d voxtral-model --stdin.
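
The ffmpeg flags above produce raw signed 16-bit little-endian samples, mono, at 16 kHz. As a rough sketch of what a consumer of that stream has to do, the hypothetical helper below decodes such bytes into floats. The name s16le_to_f32 and the scaling by 32768 into the range [-1, 1) are assumptions for illustration, not the repository's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n_samples of raw s16le audio (2 bytes per sample, low byte first)
 * into floats in [-1, 1). The bytes are assembled explicitly, so the result
 * does not depend on the endianness of the host machine. */
static void s16le_to_f32(const unsigned char *bytes, size_t n_samples, float *out) {
    assert(bytes != NULL && out != NULL);
    for (size_t i = 0; i < n_samples; i++) {
        int32_t v = (int32_t)bytes[2 * i] | ((int32_t)bytes[2 * i + 1] << 8);
        if (v >= 32768) {
            v -= 65536;            /* restore the sign of the 16-bit value */
        }
        out[i] = (float)v / 32768.0f;
    }
}
```

In a streaming setting, a reader would call fread on stdin in fixed-size blocks and pass each block through a decoder like this before handing the samples on.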

The project needs further testing, since it has so far been validated on only a limited set of samples. Production use may require more work, particularly testing long transcriptions that exercise the KV cache's circular buffer.
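
For readers unfamiliar with the idea, a circular buffer keeps a fixed number of entries and overwrites the oldest once it is full, which is why long inputs are the interesting test case. The sketch below shows the general technique only. The ring_cache type and its functions are hypothetical and say nothing about how voxtral.c lays out its KV cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-capacity ring of vectors: once full, each push overwrites the oldest. */
typedef struct {
    float *data;      /* capacity * dim floats */
    size_t capacity;  /* maximum number of cached positions */
    size_t dim;       /* floats per position */
    size_t head;      /* slot that the next push writes to */
    size_t count;     /* number of valid entries, at most capacity */
} ring_cache;

/* Returns 0 on success, -1 if allocation fails. */
static int ring_init(ring_cache *c, size_t capacity, size_t dim) {
    assert(c != NULL && capacity > 0 && dim > 0);
    c->data = calloc(capacity * dim, sizeof(float));
    c->capacity = capacity;
    c->dim = dim;
    c->head = 0;
    c->count = 0;
    return c->data ? 0 : -1;
}

/* Store one vector of dim floats, wrapping around at the end of the buffer. */
static void ring_push(ring_cache *c, const float *vec) {
    memcpy(c->data + c->head * c->dim, vec, c->dim * sizeof(float));
    c->head = (c->head + 1) % c->capacity;
    if (c->count < c->capacity) {
        c->count++;
    }
}

/* Entry i in age order: i = 0 is the oldest entry still retained. */
static const float *ring_get(const ring_cache *c, size_t i) {
    assert(i < c->count);
    size_t start = (c->head + c->capacity - c->count) % c->capacity;
    return c->data + ((start + i) % c->capacity) * c->dim;
}

static void ring_free(ring_cache *c) {
    free(c->data);
    c->data = NULL;
}
```

The wrap-around path only runs after more than capacity pushes, so short test clips never reach it. That matches the concern about long transcriptions.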

📖 Read the full source: HN AI Agents
