Reddit user reports 18.8 tok/s CPU inference with Qwen 3 30B Q4 on Zen 4

✍️ OpenClawRadar · 📅 Published: April 15, 2026 · 🔗 Source

A Reddit user shared their experience testing local LLM inference on CPU instead of investing in expensive GPU hardware.

Key Details

The user was considering purchasing GPU hardware for local LLM inference, including:

  • P40 GPUs
  • V100 GPUs (they almost bought an SXM2 version, which uses a server mezzanine socket rather than a PCIe slot and so doesn't plug into normal motherboards)
  • RTX 3090s (priced at $800+ due to AI demand)

After being advised to try CPU inference first, they tested:

  • Model: Qwen 3 30B Q4
  • Hardware: Zen 4 processor with DDR5 memory
  • Performance: 18.8 tokens per second on CPU
  • Expectation vs Reality: Expected 3-5 tok/s, got nearly 19 tok/s
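
A tokens-per-second figure like the one above is simply generated tokens divided by wall-clock time. The post does not say which tool produced the number, so the following is only a minimal sketch of such a measurement. `tokens_per_second` and `fake_stream` are hypothetical names, and `fake_stream` stands in for whatever streaming output a real inference runtime provides.

```python
import time
from typing import Callable, Iterable, Iterator


def tokens_per_second(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Consume a token stream and return tokens generated per second.

    Timing starts before the first token is requested, so with a lazy
    stream this lumps prompt processing and generation together.
    """
    start = clock()
    count = 0
    for _ in stream:
        count += 1
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return count / elapsed


def fake_stream(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output (assumption)."""
    for i in range(n_tokens):
        time.sleep(delay_s)  # simulate per-token generation latency
        yield f"tok{i}"


if __name__ == "__main__":
    rate = tokens_per_second(fake_stream(20, 0.01))
    print(f"{rate:.1f} tok/s")
```

Real inference tools usually report prompt processing and generation speed separately, so a single combined number from a harness like this is not directly comparable to theirs.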

The user noted that "Zen 4 + DDR5 is cracked for inference."
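
One way to see why the result beat expectations is a memory-bandwidth estimate: during generation, each token requires reading the active weights from RAM, so bandwidth divided by bytes read per token gives a rough ceiling on speed. The sketch below rests on assumptions the post does not state: dual-channel DDR5-5600, about 4.5 bits per weight for a Q4 quantization, and a mixture-of-experts model that activates roughly 3B of its 30B parameters per token. All function names are hypothetical.

```python
def peak_bandwidth_gb_s(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    transfers per second * bytes per transfer * number of channels.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def decode_ceiling_tok_s(
    bandwidth_gb_s: float,
    active_params_billion: float,
    bits_per_weight: float,
) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    bw = peak_bandwidth_gb_s(5600)  # assumed DDR5-5600, dual channel
    # Assumed ~4.5 bits per weight for a Q4 quantization.
    dense = decode_ceiling_tok_s(bw, active_params_billion=30, bits_per_weight=4.5)
    # Assumed mixture-of-experts with ~3B active parameters per token.
    sparse = decode_ceiling_tok_s(bw, active_params_billion=3, bits_per_weight=4.5)
    print(f"bandwidth: {bw:.1f} GB/s")
    print(f"dense 30B ceiling: {dense:.1f} tok/s")
    print(f"3B-active ceiling: {sparse:.1f} tok/s")
```

Under these assumptions, a dense 30B model would top out at around 5 tok/s, in line with the user's 3-5 tok/s expectation, while a model with about 3B active parameters has a ceiling of around 53 tok/s, which leaves room for the reported 18.8 tok/s. Real throughput falls below these ceilings because of compute limits, cache behavior, and sustained bandwidth being lower than the theoretical peak.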

Practical Testing Results

The user compared two models on a real coding task:

  • An 8B model "confidently wrote completely wrong code"
  • The 30B model "nailed it first try"
  • They described the 30B model's performance as "basically GPT-4o level for $0"

For this user's coding task, at least, a Q4-quantized 30B model running on a modern CPU produced results they judged comparable to a cloud-hosted model, without the GPU purchase usually assumed necessary for local LLM inference.

📖 Read the full source: r/LocalLLaMA
