NVIDIA Releases Nemotron-3-Ultra-550B: 55B Active Parameters, 1M Context, LatentMoE Hybrid

NVIDIA released Nemotron-3-Ultra-550B-A55B-BF16, a frontier-scale LLM with 550B total parameters and 55B active. The model uses a hybrid Latent Mixture-of-Experts (LatentMoE) architecture that interleaves Mamba-2, MoE, and attention layers, plus Multi-Token Prediction (MTP) for faster generation. Context length reaches up to 1M tokens.
Key Specs
- Architecture: LatentMoE hybrid – Mamba-2 + MoE + Attention + MTP
- Parameters: 550B total / 55B active
- Context: Up to 1M tokens
- Min GPU: 8x GB200/B200/GB300/B300, 16x H100, 8x H200
- Languages: English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Brazilian Portuguese, Chinese
- Reasoning: Configurable on/off via chat template (
enable_thinking=True/False) - License: OpenMDW License Agreement v1.1
The model is built for frontier reasoning, complex agentic workflows, long-context analysis, tool use, multilingual reasoning, and high-stakes RAG. It's trained with NVFP4 pre-training recipe for compute efficiency. Open weights, training data, and recipes are included under the OpenMDW license. For local inference, you'll need at least 8x H200 or equivalent.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Local LLM Benchmark: Backend Generation by Function Calling – GLM, Qwen, DeepSeek Compared
A rigorous benchmark of local and frontier LLMs for backend code generation via function calling, with scoring rubric. Key findings: qwen3.5-35b-a3b matches gpt-5.4 on DB/API design, and dense Qwen 27B beats 397B MoE. Frontier models dropped due to cost.

Analysis of 100M tokens in Claude Code reveals 99.4% input usage
Analysis of 1,289 requests across extended coding sessions shows Claude Code used 100.3M input tokens (99.4%) versus only 616K output tokens (0.6%), with 84.2M tokens cached due to repeated context re-sending.

OpenClaw v3.22 Update Causes Dashboard and WhatsApp Issues
OpenClaw v3.22 has broken dashboard functionality and WhatsApp integration, with two GitHub issues (#52808 and #52813) documenting the problems. Users are advised not to update to this version.

Anthropic Allows Subscription Usage for Claude via OpenClaw Starting June
Anthropic will allow subscription-based usage of Claude through OpenClaw starting in June, as announced by the OpenClaw Dev Twitter account.