Nemotron 3 4B Underperforms Qwen 3.5 4B in Demanding Benchmarks

✍️ OpenClawRadar · 📅 Published: March 19, 2026 · 🔗 Source

Benchmark Results: Qwen 3.5 4B Outperforms Nemotron 3 4B

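A head-to-head comparison of Qwen 3.5 4B and Nemotron 3 4B, both at Q8 quantization, on a single five-part prompt shows a clear gap in mathematical reasoning and structured output: Qwen returned correct answers in valid JSON, while Nemotron's proof, pseudocode, and paragraph all fell short.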

Test Methodology

The benchmark consisted of five demanding sub-tasks requiring mathematical proofs, modular arithmetic, algorithm design, and multilingual text generation, all wrapped in valid JSON format. The exact prompt asked for:

  • Definition and evaluation of S(n) = Σ_{k=0}^{n} (-1)^k C(n,k)/(k+1)^2, with a closed form in terms of H_{n+1}, its value at n=2026, and an 8-line proof using integrals
  • Computation of T = Σ[floor((17k+8)/29) - floor((17k-4)/29)] from k=1 to 2026 with modular justification
  • Möbius + inclusion-exclusion algorithm for counting coprime pairs in a dynamic array with pseudocode in exactly 14 lines (variable names ≤8 characters)
  • Computation of C(4052, 2026) mod 7 using Lucas's theorem, with the base-7 conversions shown
  • Portuguese paragraph of exactly 47 words containing "Möbius", "inclusão-exclusão" and "Lucas", ending with "fim."
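The Lucas sub-task is easy to check mechanically. The sketch below is an illustration and was not part of the original benchmark; the helper names `to_base` and `binom_mod_lucas` are made up for this example. It writes both numbers in base 7 and multiplies the digit-wise binomials, which is what Lucas's theorem prescribes for a prime modulus.

```python
from math import comb


def to_base(n, p):
    """Return the digits of n in base p, least significant digit first."""
    digits = []
    while n:
        n, r = divmod(n, p)
        digits.append(r)
    return digits or [0]


def binom_mod_lucas(n, k, p):
    """Compute C(n, k) mod p for a prime p using Lucas's theorem."""
    result = 1
    while n or k:
        n, ni = divmod(n, p)  # next base-p digit of n
        k, ki = divmod(k, p)  # next base-p digit of k
        if ki > ni:
            # A digit of k exceeds the matching digit of n, so C(ni, ki) = 0.
            return 0
        result = result * comb(ni, ki) % p
    return result


print(to_base(4052, 7)[::-1])          # [1, 4, 5, 4, 6]
print(to_base(2026, 7)[::-1])          # [5, 6, 2, 3]
print(binom_mod_lucas(4052, 2026, 7))  # 0
```

Since 4052 = 14546 in base 7 and 2026 = 05623 in base 7, the second digit pair is (4, 5). C(4, 5) = 0, so the whole product vanishes and C(4052, 2026) is divisible by 7.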

Results Breakdown

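Qwen 3.5 4B Q8 (correct): Produced valid JSON with all five sub-tasks completed. The answers to the first two sub-tasks, excerpted below, are mathematically correct, although the proof as quoted runs to seven lines rather than the requested eight: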

{
  "A": {
    "closed_form": "H_{n+1}/(n+1)",
    "value_2026": "H_{2027}/2027",
    "proof_8_lines": "S(n) = sum (-1)^k C(n,k) int_0^1 x^k(-ln x)dx\nSwap sum and integral: -int_0^1 ln(x)(1-x)^n dx\nLet u=1-x: S(n) = -int_0^1 u^n ln(1-u) du\nUsing identity int_0^1 x^n ln(1-x)dx = -H_{n+1}/(n+1)\nWe find S(n) = H_{n+1}/(n+1) for all n>=0\nFor n=2026, value is H_{2027}/2027 exactly.\nQED."
  },
  "B": {
    "value": 838,
    "justification": "Difference is 1 iff (17k-4) mod 29 >= 17. gcd(17,29)=1 so residues cycle every 29 terms. Per period, 12 values qualify. 2026 = 69*29 + 25. Full periods: 69*12=828. Remainder k=1..25: residues for k=26..29 include two >= 17, so remainder adds 10. Total: 838."
  }
}
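Both answers in the excerpt can be confirmed independently. The following is a minimal sketch using exact rational arithmetic. It is not taken from the source, and the function names `S` and `harmonic` are illustrative. It compares the closed form against the defining sum for small n and evaluates T by brute force.

```python
from fractions import Fraction
from math import comb


def S(n):
    """Defining sum: sum over k = 0..n of (-1)^k * C(n, k) / (k + 1)^2."""
    return sum(Fraction((-1) ** k * comb(n, k), (k + 1) ** 2) for k in range(n + 1))


def harmonic(m):
    """Harmonic number H_m = 1 + 1/2 + ... + 1/m as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, m + 1))


# Closed form from sub-task A: S(n) = H_{n+1} / (n + 1), checked exactly for n < 30.
for n in range(30):
    assert S(n) == harmonic(n + 1) / (n + 1)

# Sub-task B: direct evaluation of the floor sum for k = 1..2026.
T = sum((17 * k + 8) // 29 - (17 * k - 4) // 29 for k in range(1, 2027))
print(T)  # 838
```

The brute-force total matches the period argument in the justification: 69 full periods contribute 12 each, and the partial period k = 1..25 contributes 10.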

Nemotron 3 4B Q8 (incorrect): Failed to produce correct reasoning or the required structured output. Issues included:

  • Mangled integral steps with wrong substitutions in the mathematical proof
  • Incomplete pseudocode: 16 entries instead of the required 14, seven of them bare "#" placeholders
  • Truncated Portuguese paragraph ending mid-sentence

Its answer to the pseudocode sub-task:

{
  "C": {
    "pseudocodigo": [
      "read n",
      "a = []",
      "mu = [0]*(max+1)",
      "for i in range(n): a.append(int(input()))",
      "for x in a:",
      " for d in range(1,int(sqrt(x))):",
      " if x%d==0",
      " mu[d]+=1",
      " mu[x//d]-=1",
      "#",
      "#",
      "#",
      "#",
      "#",
      "#",
      "#"
    ],
    "complexidade": "O(n√max)"
  }
}
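For contrast, here is one way a working Möbius plus inclusion-exclusion counter can look. This is a hedged sketch, not the prompt's reference answer, and it ignores the 14-line and 8-character limits. The names `mobius_sieve`, `CoprimePairs`, and `brute_force`, and the `limit` parameter, are assumptions made for illustration. Beyond the placeholders, the quoted output stops its divisor loop one short of sqrt(x), omits the colon after `if`, and treats `mu` as a counter and not as the Möbius function.

```python
from math import gcd, isqrt


def mobius_sieve(limit):
    """Return mu[0..limit] (Möbius function) computed with a linear sieve."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if is_prime[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            is_prime[i * p] = False
            if i % p == 0:
                mu[i * p] = 0  # p^2 divides i * p
                break
            mu[i * p] = -mu[i]
    return mu


class CoprimePairs:
    """Maintain the number of coprime pairs in a multiset of values in 1..limit."""

    def __init__(self, limit):
        self.mu = mobius_sieve(limit)
        self.cnt = [0] * (limit + 1)  # cnt[d] = stored elements divisible by d
        self.pairs = 0

    @staticmethod
    def _divisors(x):
        divs = []
        for d in range(1, isqrt(x) + 1):  # inclusive of floor(sqrt(x))
            if x % d == 0:
                divs.append(d)
                if d != x // d:
                    divs.append(x // d)
        return divs

    def add(self, x):
        divs = self._divisors(x)
        # Elements coprime to x = sum over d | x of mu(d) * cnt[d].
        self.pairs += sum(self.mu[d] * self.cnt[d] for d in divs)
        for d in divs:
            self.cnt[d] += 1

    def remove(self, x):
        divs = self._divisors(x)
        for d in divs:
            self.cnt[d] -= 1
        self.pairs -= sum(self.mu[d] * self.cnt[d] for d in divs)


def brute_force(values):
    """Reference count of coprime pairs, quadratic in len(values)."""
    return sum(
        1
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if gcd(values[i], values[j]) == 1
    )


cp = CoprimePairs(limit=100)
for v in [1, 2, 3, 4, 6]:
    cp.add(v)
print(cp.pairs)  # 6
cp.remove(1)
print(cp.pairs)  # 2
```

Each update costs one pass over the divisors of x after an O(√x) enumeration, so the sketch has the same O(n√max) overall bound that Nemotron reported in its "complexidade" field.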

Key Finding

In this test, Nemotron 3 4B's architectural advantage, support for larger context windows, did not translate into better reasoning within that context. It failed at the mathematical reasoning and structured output generation that Qwen 3.5 4B handled correctly on the same prompt.

📖 Read the full source: r/LocalLLaMA
