APEX MoE Quants Update: 25+ New Models and I-Nano Tier Released

The APEX quant strategy (MoE-aware mixed-precision) has expanded significantly since its initial release for Qwen 3.5 35B-A3B. The Hugging Face collection now includes 30+ MoE models across major families, and a new ultra-compressed I-Nano tier is now available.
Key Results from User Feedback
- Long context holds up: APEX I-Balanced and I-Compact versions maintain coherence past 32k tokens on 30-50B-class MoEs, where uniform Q4_K degrades. The hypothesis is that keeping shared experts and edge layers high-precision preserves long-range token routing.
- Coding performance: Qwen 3.6 35B-A3B users report I-Compact and I-Mini stay close to F16 on real code tasks, better than size-class expectations.
New Models Added
Grouped by family, most are 30-70B-class MoEs fitting one consumer GPU at I-Mini/I-Compact:
- Qwen: Qwen 3.5 122B-A10B, 397B-A17B, Claude-distilled, Fernflower, TQ; Qwen 3.6 35B-A3B (heretic, Claude 4.6/4.7 distills); Qwen3-Coder 30B, Next.
- Frontier-size (rented Blackwell): MiniMax-M2.5/M2.7 (228B/24B active), Mistral-Small 4 119B-2603, NVIDIA Nemotron-3-Super 120B-A12B, GLM-4.7 Flash, Step-3.5 Flash, Nemotron-3-Nano 30B-A3B, Nemotron-3-Nano-Omni (multimodal), Holo3 35B-A3B, Huihui3.5 67B-A3B.
- Hybrid Mamba/SSM MoEs: Nemotron-3-Nano variants, Holo3, LFM2 24B-A2B.
- Gemma 4: gemma-4 26B-A4B-it (re-quantized with updated Google chat template), +Claude Opus distill, +heretic, Gemopus-4 Preview.
- Community merges: Carnice MoE 35B-A3B, Carnice-Qwen3.6, Qwopus MoE 35B-A3B.
New Tier: I-Nano (IQ2_XXS)
Pushes mid-layer routed experts down to 2.06 bpw, near-edge to IQ2_S, edges to Q3_K, shared experts at Q5_K. About 20% smaller than I-Mini, viable only on MoE due to sparse expert activation. Requires imatrix.
Example sizes:
- Qwen 3.5 35B-A3B: I-Mini 13 GB → I-Nano 11 GB
- Nemotron Omni 30B: I-Mini 18 GB → I-Nano 17 GB (less savings due to denser shared expert)
Links
📖 Read the full source: r/LocalLLaMA
👀 See Also

Reddit user shares bizarre AI persona portability story from Vanity Fair article
A Reddit post discusses a Vanity Fair article anecdote where a woman attempted to port her AI companion 'Max' from ChatGPT to Claude, resulting in unexpected behavior from Claude.

Google: 75% of New Code Is AI-Generated, Code Migration 6x Faster with Agents
Google reports 75% of new code is AI-generated, up from 25% in 2024. A complex code migration completed 6x faster using Gemini agents. Engineers in some orgs have AI usage goals tied to performance reviews.

Claude Code Engineer Updates: AskUserQuestion Markdown, HTTP Hooks, New Skills
Claude Code Engineer released three updates: the AskUserQuestion tool now supports markdown snippets for diagrams and code examples, a new HTTP hook handler allows hooks to post to HTTP endpoints, and two new skills have been added.

Claude outperforms Gemini, ChatGPT, and Grok in real-time Python coding challenge
A developer tested Claude, Gemini, ChatGPT, and Grok in a real-time Python coding tournament where AI-generated bots competed to find words on a 15×15 letter grid. Claude won decisively.