RTX 5060 Ti 16GB Local LLM Benchmarks: 30B Models Still Lead for Coding

OpenClawRadar | Published: April 19, 2026

RTX 5060 Ti 16GB Local LLM Performance Findings

Testing used an RTX 5060 Ti 16GB with 32GB of DDR4 RAM and llama-server b8373 (46dba9fce) from llama.cpp, aimed at local LLM coding workflows. The shared "fast path" launch settings were flash attention on (fa=on), automatic GPU layer offload (ngl=auto), 8 CPU threads (threads=8), and a q8_0-quantized KV cache (-ctk q8_0 -ctv q8_0).
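
As a rough illustration of those shared settings, the sketch below assembles them into a llama-server command line. The model path is hypothetical (the source does not name the file on disk), and the short flag spellings (-m, -fa, -ngl, -t) are the usual llama.cpp ones rather than something the source spells out, so check them against `llama-server --help` for your build.

```python
import shlex

# Hypothetical path: the source does not say where the GGUF file lives.
MODEL_PATH = "models/Qwen3-Coder-30B-UD-Q3_K_XL.gguf"


def base_server_args(model_path: str, threads: int = 8) -> list[str]:
    """Build an argv list for the shared 'fast path' settings."""
    return [
        "llama-server",
        "-m", model_path,
        "-fa", "on",         # flash attention on (fa=on)
        "-ngl", "auto",      # GPU layer offload as reported (ngl=auto)
        "-t", str(threads),  # CPU threads (threads=8)
        "-ctk", "q8_0",      # K cache quantized to q8_0
        "-ctv", "q8_0",      # V cache quantized to q8_0
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command instead of launching the server.
    print(shlex.join(base_server_args(MODEL_PATH)))
```

Keeping the arguments as a list makes it easy to append model-specific flags before handing the result to `subprocess.run`.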

Model Performance Results

The benchmark compared multiple quantized models with these key findings:

  • Best default coding model: Unsloth Qwen3-Coder-30B UD-Q3_K_XL
  • Best higher-context coding option: Same Unsloth 30B model at 96k context
  • Best fast 35B coding option: Unsloth Qwen3.5-35B UD-Q2_K_XL

Performance Metrics

Token generation speeds from local testing (the 30B and 35B figures match the Ubuntu runs in the cross-platform comparison below):

  • Jackrong Qwen3.5-4B Q5_K_M: 88 tok/s
  • LuffyTheFox Qwen3.5-9B Q4_K_M: 64 tok/s
  • Jackrong Qwen3.5-27B Q3_K_S: ~20 tok/s
  • Unsloth Qwen3-Coder-30B UD-Q3_K_XL: 76.3 tok/s
  • Unsloth Qwen3.5-35B UD-Q2_K_XL: 80.1 tok/s

Cross-Platform Comparison

Matched tests with 20 questions, 32k context, and max_tokens=800 showed:

  • Unsloth Qwen3-Coder-30B UD-Q3_K_XL: Windows: 79.5 tok/s, quality 7.94 | Ubuntu: 76.3 tok/s, quality 8.14
  • Unsloth Qwen3.5-35B UD-Q2_K_XL: Windows: 72.3 tok/s, quality 7.40 | Ubuntu: 80.1 tok/s, quality 7.39
  • Jackrong Qwen3.5-27B Claude-Opus Distilled Q3_K_S: Windows: 19.9 tok/s, quality 8.85 | Ubuntu: ~20.0 tok/s, quality 8.21
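
To make the cross-platform gaps easier to read, the sketch below turns the figures above into relative differences and an approximate time per full 800-token reply. The numbers are copied from the list; the function and field names are illustrative only, and the Ubuntu 27B speed is the approximate ~20.0 value.

```python
# (tok/s, quality score) per platform, copied from the matched runs above.
RESULTS = {
    "Qwen3-Coder-30B UD-Q3_K_XL": {"windows": (79.5, 7.94), "ubuntu": (76.3, 8.14)},
    "Qwen3.5-35B UD-Q2_K_XL": {"windows": (72.3, 7.40), "ubuntu": (80.1, 7.39)},
    # Ubuntu speed for the 27B model is reported as approximately 20.0.
    "Qwen3.5-27B Distilled Q3_K_S": {"windows": (19.9, 8.85), "ubuntu": (20.0, 8.21)},
}
MAX_TOKENS = 800  # generation cap used in the matched tests


def summarize(results, max_tokens=MAX_TOKENS):
    """Compare Ubuntu against Windows for each model."""
    rows = []
    for model, runs in results.items():
        win_tps, win_quality = runs["windows"]
        ubu_tps, ubu_quality = runs["ubuntu"]
        rows.append({
            "model": model,
            # Positive means Ubuntu generated tokens faster than Windows.
            "speed_delta_pct": 100.0 * (ubu_tps - win_tps) / win_tps,
            "quality_delta": ubu_quality - win_quality,
            # Seconds to emit a full max_tokens reply at the Ubuntu speed.
            "ubuntu_secs_per_reply": max_tokens / ubu_tps,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print(f"{row['model']}: {row['speed_delta_pct']:+.1f}% speed, "
              f"{row['quality_delta']:+.2f} quality, "
              f"{row['ubuntu_secs_per_reply']:.1f} s per reply")
```

On these figures the 30B model is about 4% slower on Ubuntu, the 35B model about 11% faster, and the 27B model essentially unchanged. A full 800-token reply takes roughly 10 seconds on the 30B and 35B models versus about 40 seconds on the 27B model, which helps explain why the 27B model is not the default pick despite its higher quality scores.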

Configuration Notes

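The 30B coder path ran with jinja, reasoning-budget 0, and reasoning-format none, that is, with the Jinja chat template enabled and reasoning output turned off. The 35B UD path ran with c=262144 and n-cpu-moe=8: a 262,144-token context with the MoE expert weights of the first 8 layers kept on the CPU. For the 35B Q4_K_M stable tune, the settings were: -ngl 26 -c 131072 --fit on --fit-ctx 131072 --fit-target 512M.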
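
One way to keep these three setups apart is a small profile table, sketched below. The profile keys and the model path are hypothetical. The double-dash spellings for the first two profiles are the usual llama-server forms of the settings named above (an assumption, since the source lists them without dashes), and the third profile is copied as written. Whether the shared fast-path flags were combined with these extras is not stated, so this sketch leaves them out.

```python
# Per-model extra flags, taken from the configuration notes above.
# Profile names are illustrative labels, not names used by the source.
PROFILES = {
    "30b-coder": ["--jinja", "--reasoning-budget", "0",
                  "--reasoning-format", "none"],
    "35b-ud-q2": ["-c", "262144", "--n-cpu-moe", "8"],
    "35b-q4km-stable": ["-ngl", "26", "-c", "131072", "--fit", "on",
                        "--fit-ctx", "131072", "--fit-target", "512M"],
}


def server_args(profile: str, model_path: str) -> list[str]:
    """Return a llama-server argv list for one named profile."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile!r}")
    return ["llama-server", "-m", model_path, *PROFILES[profile]]


if __name__ == "__main__":
    # Hypothetical model path, for illustration only.
    print(" ".join(server_args("35b-ud-q2", "models/qwen3.5-35b-ud-q2_k_xl.gguf")))
```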

Notably, the 35B Q4_K_M model needed specific tuning to run stably on this card, yet it still did not outperform the older UD-Q2_K_XL path in practical use. Contrary to the author's expectations, neither the smaller 9B route nor the heavier 35B Q4_K_M experiment turned out to be the strongest real-world pick.

Ubuntu Performance Testing

Additional focused testing on Ubuntu with the Jackrong 27B model showed minimal variation:

  • -fa on, auto parallel: 19.95 tok/s
  • -fa auto, auto parallel: 19.56 tok/s
  • -fa on, --parallel 1: 19.26 tok/s

On this model, the flash-attention and parallel settings made little difference: the three runs fall within 0.69 tok/s of each other, about 3.5% of the fastest result.
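
The size of that variation can be checked with a few lines of arithmetic. The sketch below uses the three figures listed above; the helper name is illustrative.

```python
# tok/s for the three Ubuntu runs of the 27B model, as listed above.
RUNS = {
    "-fa on, auto parallel": 19.95,
    "-fa auto, auto parallel": 19.56,
    "-fa on, --parallel 1": 19.26,
}


def spread_pct(speeds) -> float:
    """Gap between fastest and slowest run, as a percent of the fastest."""
    values = list(speeds)
    fastest, slowest = max(values), min(values)
    return 100.0 * (fastest - slowest) / fastest


if __name__ == "__main__":
    print(f"spread: {spread_pct(RUNS.values()):.2f}% of the fastest run")
```

With only one run per setting reported, a gap this small may be indistinguishable from run-to-run noise.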

📖 Read the full source: r/LocalLLaMA
