mistral.rs Adds Support for Gemma 4 12B: Multimodal, Agentic, and MTP

mistral.rs now supports Gemma 4 12B with multimodal, agentic, and Multi-Turn Prediction (MTP) features. This release includes web search and sandboxed code execution for building agentic apps, plus audio, image, and video input.
Installation
Single-line install for Linux/macOS and Windows:
# Linux/macOS
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | sh
Windows
irm https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.ps1 | iex
Running with Agent & Quantization
Launch an OpenAI- and Anthropic-compatible HTTP server with a built-in web UI at localhost:1234/ui:
mistralrs run --agent -m google/gemma-4-12B-it --quant 4Enabling MTP (Multi-Turn Prediction)
To use MTP, add the --mtp-model flag with the assistant model:
mistralrs run --agent -m google/gemma-4-12B-it --quant 4 --mtp-model google/gemma-4-12B-it-assistantKey Features
- Full multimodal support: audio, image, and video
- Web search and sandboxed code execution for agentic workflows
- OpenAI and Anthropic-compatible HTTP server
- Built-in web chat UI at
localhost:1234/ui
For more details: GitHub | Documentation
📖 Read the full source: r/LocalLLaMA
👀 See Also

MarkView: Open-source tool renders and manages AI-generated Markdown files
MarkView is a private-first rendering engine that displays Markdown files with Mermaid diagrams and KaTeX math, available as a web app, native macOS app, and MCP server for Claude Desktop and Cursor integration.

Engramx v3.4: MCP Server + SQLite Knowledge Graph Cuts Claude Code Token Usage by 89%
Engramx v3.4 intercepts file reads for Claude Code agents, returning structural summaries instead of raw content. Benchmarks show 89.1% aggregate token reduction across an 87-file codebase.

How AI assistants fetch web pages: Nginx log analysis of ChatGPT, Claude, Gemini and others
A developer tested five major AI assistants by prompting them with unique URLs and monitoring Nginx logs, revealing distinct retrieval patterns: ChatGPT, Claude, and Perplexity use dedicated user-agents while Gemini answered from its index without fetching.

ToolLoop: Open-Source Framework for Claude-Style Tools with Any LLM
ToolLoop is an open-source Python framework with 11 tools for file operations, code search, shell access, and sub-agents that works with any LLM through LiteLLM. The 2,700-line framework allows switching models mid-conversation while maintaining shared context.