Weekly Multimodal AI Roundup: Holotron-12B, Nemotron Omni, GlyphPrinter, and More

Open Multimodal AI Developments
Here are the key open-source multimodal AI releases and projects from the past week, curated from r/LocalLLaMA.
Holotron-12B
Holotron-12B is an open computer-use agent model available on Hugging Face. It's optimized for throughput and long multi-image contexts, serving as an open alternative for the computer-use agent ecosystem beyond closed APIs.
NVIDIA Nemotron Omni + Isaac GR00T N1.7
NVIDIA released open Nemotron 3 omni models that integrate language, vision, and voice in one stack. The GR00T N1.7 is a vision-language-action model specifically designed for robotics applications.
GlyphPrinter
GlyphPrinter addresses text rendering accuracy in AI image generators using Region-Grouped Direct Preference Optimization. It balances artistic styling with accurate text rendering and provides open weights. The approach fixes localized spelling errors in generated images.
SparkVSR
Google's video super-resolution model enhances video quality and clarity. This project focuses on improving video resolution through AI processing.
SegviGen
SegviGen enables 3D object segmentation via colorization by repurposing 3D image generators. The method frames segmentation as a colorization task and reportedly uses less than 1% of the training data required by older methods. The project includes open code and a demo.
OpenMAIC
OpenMAIC (Multi-Agent Interactive Classroom) turns any topic or document into an interactive classroom with AI teachers and classmates. It uses multi-agent orchestration to generate slides, quizzes, simulations, and discussions.
SkillNet
SkillNet provides open infrastructure for creating, evaluating, and organizing AI agent skills at scale. The system enables agents to transition from transient experience to durable mastery.
📖 Read the full source: r/LocalLLaMA
👀 See Also

OpenClaw 2026.3.13 regression causes false unreachable status reports
OpenClaw version 2026.3.13 introduced a diagnostic regression where status commands falsely report unreachable gateways despite RPC probes working correctly. Rolling back to 2026.3.12 resolves the issue.

Merlin Research releases Qwen3.5-4B-Safety-Thinking model for structured reasoning
Merlin Research has released Qwen3.5-4B-Safety-Thinking, a 4 billion parameter safety-aligned reasoning model built on Qwen3.5. The model is designed for structured 'thinking' and safety in real-world scenarios including agent systems.

Google Chrome Silently Downloads 4GB Gemini Nano Model Without Consent
Chrome automatically downloads a 4GB Gemini Nano model (weights.bin) to user devices without consent or opt-out, and re-downloads it if deleted. This raises legal (ePrivacy/GDPR) and environmental concerns at Chrome's billion-device scale.

MLX Inference Performance Update: April 2026 Benchmarks and Features
MLX inference performance has improved significantly, with Qwen3.5-35B-A3B reaching 71.8 tokens/second at 4K context and new features like Multi-Token Prediction and SpecPrefill providing 2.3x-5.5x speedups for large models.