Exploring Mistral Voxtral Realtime 4B in Pure C for Speech-to-Text

voxtral.c, a repository by antirez, is a pure C inference implementation of Mistral's Voxtral Realtime 4B speech-to-text model. It relies exclusively on the C standard library, providing a dependency-free alternative: the inference pipeline runs without a Python runtime, the CUDA toolkit, or any other external library at inference time.
Key Features
- Pure C Implementation: No external dependencies beyond the C standard library are required, making it suitable for environments where minimal dependency is critical.
- Platform Specific Backends: Offers two make targets: `make mps` for Apple Silicon, which provides faster processing, and `make blas` for Intel Mac or Linux systems equipped with OpenBLAS, albeit with slower performance due to the need to convert from bf16 to fp32.
- Audio Processing: Utilizes a chunked encoder with overlapping windows to bound memory usage, irrespective of input length. It also allows audio input through stdin or microphone on macOS, enhancing its versatility for live or file-based transcription tasks.
- Streaming C API: The API, `vox_stream_t`, permits incremental audio feeding and outputs token strings as they are generated.
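The bf16 to fp32 conversion mentioned for the BLAS backend can be illustrated with a minimal sketch. It relies on the fact that bfloat16 is the upper 16 bits of an IEEE 754 32-bit float; the function names `bf16_to_f32` and `bf16_buf_to_f32` are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen one bfloat16 value (given as its raw 16 bits) to a 32-bit float.
 * bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of
 * a binary32, so widening is a 16-bit left shift with zero fill. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f); /* bit copy, avoids aliasing problems */
    return f;
}

/* Convert a whole buffer of bf16 values (hypothetical helper). */
static void bf16_buf_to_f32(const uint16_t *src, float *dst, size_t n) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = bf16_to_f32(src[i]);
    }
}
```

A pass like this costs extra time and memory bandwidth whenever bf16 data has to be handed to an fp32 routine, which is one plausible reading of why the BLAS path is described as slower.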
Usage
- Download the model (~8.9GB) using `./download_model.sh`.
- For audio transcription from a file: `./voxtral -d voxtral-model -i audio.wav`.
- Live transcription from a mic on macOS: `./voxtral -d voxtral-model --from-mic`.
- Transcoding and transcription with `ffmpeg`: `ffmpeg -i audio.mp3 -f s16le -ar 16000 -ac 1 - 2> /dev/null | ./voxtral -d voxtral-model --stdin`.
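The ffmpeg flags in the last command produce raw signed 16-bit little-endian PCM, mono, at 16 kHz. As a rough illustration of what a consumer of that stream has to do, the sketch below decodes such bytes into floats in the range [-1.0, 1.0). The function name `s16le_to_f32` and the scaling by 32768 are assumptions for illustration, not the repository's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n samples of raw s16le PCM (2 bytes each, low byte first)
 * into floats in [-1.0, 1.0). Hypothetical helper for illustration. */
static void s16le_to_f32(const unsigned char *raw, float *out, size_t n) {
    assert(raw != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Assemble the 16-bit value independent of host endianness. */
        int32_t v = (int32_t)raw[2 * i] | ((int32_t)raw[2 * i + 1] << 8);
        /* Reinterpret as two's complement without implementation-defined casts. */
        if (v >= 32768) {
            v -= 65536;
        }
        out[i] = (float)v / 32768.0f;
    }
}
```

Assembling each sample byte by byte keeps the decoding correct on both little-endian and big-endian hosts.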
The project is open to further testing, as it has so far been validated on only a limited number of samples. Full production readiness might require more work, particularly testing long transcriptions that exercise the KV cache's circular buffer.
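To see why long inputs matter here, consider a generic circular buffer: the wraparound path only runs once more positions have been written than the buffer can hold, so short samples never touch it. The sketch below is a toy ring buffer, not the repository's KV cache layout; the type `kv_ring_t`, its functions, and the sizes `RING_CAP` and `RING_DIM` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_CAP 4 /* assumed window size in positions (tiny, for illustration) */
#define RING_DIM 2 /* assumed number of floats stored per position */

typedef struct {
    float data[RING_CAP][RING_DIM];
    size_t count; /* total number of positions ever written */
} kv_ring_t;

static void kv_ring_init(kv_ring_t *r) {
    memset(r, 0, sizeof *r);
}

/* Append one position; once the buffer is full this overwrites the oldest. */
static void kv_ring_push(kv_ring_t *r, const float *vec) {
    assert(r != NULL && vec != NULL);
    memcpy(r->data[r->count % RING_CAP], vec, sizeof r->data[0]);
    r->count++;
}

/* Number of positions currently retained. */
static size_t kv_ring_len(const kv_ring_t *r) {
    return r->count < RING_CAP ? r->count : RING_CAP;
}

/* Return the i-th retained position, where 0 is the oldest. */
static const float *kv_ring_at(const kv_ring_t *r, size_t i) {
    assert(i < kv_ring_len(r));
    size_t oldest = r->count - kv_ring_len(r);
    return r->data[(oldest + i) % RING_CAP];
}
```

With `RING_CAP` set to 4, pushing six positions leaves positions 2 through 5 in the buffer, and reading them back in order crosses the physical end of the array. That crossing is the code path a long transcription would exercise.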
📖 Read the full source: HN AI Agents