Nemotron 3 4B Underperforms Qwen 3.5 4B in Demanding Benchmarks

Benchmark Results: Qwen 3.5 4B Outperforms Nemotron 3 4B
A benchmark comparing Qwen 3.5 4B and Nemotron 3 4B, both at Q8 quantization, shows a clear gap between the two models on mathematical reasoning and structured output.
Test Methodology
The benchmark consisted of five demanding sub-tasks covering mathematical proofs, modular arithmetic, algorithm design, and multilingual text generation, with every answer returned inside valid JSON. The exact prompt asked for:
- Definition and evaluation of S(n) = Σ_{k=0}^{n} (-1)^k C(n,k)/(k+1)^2, with a closed form in terms of the harmonic number H_{n+1}, its value at n=2026, and an 8-line proof using integrals
- Computation of T = Σ_{k=1}^{2026} [floor((17k+8)/29) - floor((17k-4)/29)], with a modular justification
- A Möbius + inclusion-exclusion algorithm for counting coprime pairs in a dynamic array, with pseudocode in exactly 14 lines (variable names ≤8 characters)
- Computation of C(4052, 2026) mod 7 using Lucas's theorem with base-7 conversion
- A Portuguese paragraph of exactly 47 words containing "Möbius", "inclusão-exclusão" and "Lucas", ending with "fim."
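For readers who want to check the Lucas sub-task themselves, here is a minimal Python sketch of Lucas's theorem with the base-7 conversion. It is an independent illustration, not either model's answer (the source does not quote one for this sub-task), and the helper names `to_base` and `lucas_binom_mod` are made up for this example.

```python
from math import comb

def to_base(n: int, p: int) -> list[int]:
    """Digits of n in base p, least significant digit first."""
    digits = []
    while n:
        n, r = divmod(n, p)
        digits.append(r)
    return digits or [0]

def lucas_binom_mod(n: int, k: int, p: int) -> int:
    """C(n, k) mod p for a prime p, multiplying digit-wise binomials (Lucas)."""
    result = 1
    while n or k:
        n, nd = divmod(n, p)
        k, kd = divmod(k, p)
        if kd > nd:
            # A digit of k exceeds the matching digit of n, so the product is 0.
            return 0
        result = result * comb(nd, kd) % p
    return result

print(to_base(4052, 7)[::-1])          # [1, 4, 5, 4, 6]
print(to_base(2026, 7)[::-1])          # [5, 6, 2, 3]
print(lucas_binom_mod(4052, 2026, 7))  # 0
```

The result is 0 because the 49s digit of 2026 (6) is larger than the 49s digit of 4052 (5), which zeroes the whole product.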
Results Breakdown
Qwen 3.5 4B Q8 (correct): Produced valid JSON with all sub-tasks correctly completed. The mathematical proofs were accurate; its answers to the first two sub-tasks are excerpted here:
{
  "A": {
    "closed_form": "H_{n+1}/(n+1)",
    "value_2026": "H_{2027}/2027",
    "proof_8_lines": "S(n) = sum (-1)^k C(n,k) int_0^1 x^k(-ln x)dx\nSwap sum and integral: -int_0^1 ln(x)(1-x)^n dx\nLet u=1-x: S(n) = -int_0^1 u^n ln(1-u) du\nUsing identity int_0^1 x^n ln(1-x)dx = -H_{n+1}/(n+1)\nWe find S(n) = H_{n+1}/(n+1) for all n>=0\nFor n=2026, value is H_{2027}/2027 exactly.\nQED."
  },
  "B": {
    "value": 838,
    "justification": "Difference is 1 iff (17k-4) mod 29 >= 17. gcd(17,29)=1 so residues cycle every 29 terms. Per period, 12 values qualify. 2026 = 69*29 + 25. Full periods: 69*12=828. Remainder k=1..25: residues for k=26..29 include two >= 17, so remainder adds 10. Total: 838."
  }
}
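The two numeric claims in this excerpt can be spot-checked by brute force. The sketch below uses exact rational arithmetic to compare the direct sum S(n) against H_{n+1}/(n+1) for small n, and evaluates T term by term. It checks only the numbers, not the prompt's formatting constraints, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def s_direct(n: int) -> Fraction:
    """S(n) = sum_{k=0}^{n} (-1)^k C(n,k) / (k+1)^2, computed exactly."""
    return sum(Fraction((-1) ** k * comb(n, k), (k + 1) ** 2) for k in range(n + 1))

def s_closed(n: int) -> Fraction:
    """Claimed closed form H_{n+1} / (n+1)."""
    harmonic = sum(Fraction(1, j) for j in range(1, n + 2))
    return harmonic / (n + 1)

def t_sum(upper: int) -> int:
    """T = sum_{k=1}^{upper} floor((17k+8)/29) - floor((17k-4)/29)."""
    return sum((17 * k + 8) // 29 - (17 * k - 4) // 29 for k in range(1, upper + 1))

# The closed form agrees with the direct sum on every small n tried here.
print(all(s_direct(n) == s_closed(n) for n in range(30)))  # True
print(t_sum(29))    # 12 qualifying terms per full period
print(t_sum(2026))  # 838
```

Agreement on small n does not replace the proof, but it is a quick way to catch a wrong closed form.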
Nemotron 3 4B Q8 (incorrect): Failed to produce correct reasoning and structured output. Issues included:
- Mangled integral steps with wrong substitutions in the mathematical proof
- Incomplete pseudocode padded with placeholder "#" entries instead of the required 14 complete lines
- Truncated Portuguese paragraph ending mid-sentence
Its output for the algorithm sub-task:
{
  "C": {
    "pseudocodigo": [
      "read n",
      "a = []",
      "mu = [0]*(max+1)",
      "for i in range(n): a.append(int(input()))",
      "for x in a:",
      " for d in range(1,int(sqrt(x))):",
      " if x%d==0",
      " mu[d]+=1",
      " mu[x//d]-=1",
      "#",
      "#",
      "#",
      "#",
      "#",
      "#",
      "#"
    ],
    "complexidade": "O(n√max)"
  }
}
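For contrast, here is one way the Möbius + inclusion-exclusion idea can be made to work, as a minimal Python sketch. It is not the 14-line pseudocode the prompt required and not either model's answer. The class name `CoprimePairs`, the `limit` bound on values, and the add/remove interface are assumptions made for illustration. It relies on the identity that the number of stored elements coprime to x equals the sum over divisors d of x of mu(d) times the count of stored multiples of d.

```python
def mobius_sieve(limit: int) -> list[int]:
    """Return mu[0..limit] (Möbius function) using a linear sieve."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_comp = [False] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not is_comp[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            is_comp[i * p] = True
            if i % p == 0:
                mu[i * p] = 0  # i * p has a squared prime factor
                break
            mu[i * p] = -mu[i]
    return mu

class CoprimePairs:
    """Maintains the number of coprime pairs in a multiset of values <= limit."""

    def __init__(self, limit: int):
        self.mu = mobius_sieve(limit)
        # For each value, keep only its squarefree divisors (mu != 0).
        self.divs = [[] for _ in range(limit + 1)]
        for d in range(1, limit + 1):
            if self.mu[d]:
                for m in range(d, limit + 1, d):
                    self.divs[m].append(d)
        self.cnt = [0] * (limit + 1)  # cnt[d] = stored elements divisible by d
        self.pairs = 0

    def add(self, x: int) -> None:
        # Count existing elements coprime to x, then register x.
        self.pairs += sum(self.mu[d] * self.cnt[d] for d in self.divs[x])
        for d in self.divs[x]:
            self.cnt[d] += 1

    def remove(self, x: int) -> None:
        # Caller must ensure x is currently stored.
        for d in self.divs[x]:
            self.cnt[d] -= 1
        self.pairs -= sum(self.mu[d] * self.cnt[d] for d in self.divs[x])

cp = CoprimePairs(100)
for v in [6, 10, 15, 7]:
    cp.add(v)
print(cp.pairs)  # 3: (6,7), (10,7), (15,7)
cp.remove(7)
print(cp.pairs)  # 0
```

Each update touches only the squarefree divisors of the changed value, which is what makes the dynamic version cheaper than recounting all pairs.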
Key Finding
In this test, Nemotron 3 4B's architectural advantage, support for larger context windows, did not translate into better reasoning within that context. It failed at the complex mathematical reasoning and structured output generation that Qwen 3.5 4B handled correctly.
📖 Read the full source: r/LocalLLaMA