Reddit user reports 18.8 tok/s CPU inference with Qwen 3 30B Q4 on Zen 4

A Reddit user shared their experience testing local LLM inference on CPU instead of investing in expensive GPU hardware.
Key Details
The user was considering purchasing GPU hardware for local LLM inference, including:
- P40 GPUs
- V100 GPUs (almost bought an SXM2 version that doesn't plug into normal motherboards)
- RTX 3090s (priced at $800+ due to AI demand)
After being advised to try CPU inference first, they tested:
- Model: Qwen 3 30B Q4
- Hardware: Zen 4 processor with DDR5 memory
- Performance: 18.8 tokens per second on CPU
- Expectation vs Reality: Expected 3-5 tok/s, got nearly 19 tok/s
The user noted that "Zen 4 + DDR5 is cracked for inference."
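A rough memory-bandwidth estimate helps explain the gap between the expected and measured speeds. Token generation on CPU is usually limited by how fast weights can be streamed from RAM, so an upper bound is bandwidth divided by bytes read per token. The sketch below rests on assumptions not stated in the post: dual-channel DDR5-5600, roughly 4.5 bits per weight for a Q4 quantization, and that the model is the mixture-of-experts release Qwen3-30B-A3B, which activates about 3.3B of its 30.5B parameters per token. The function names are illustrative only.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (transfers/s x channels x bytes per transfer)."""
    return mt_per_s * 1e6 * channels * bus_bytes / 1e9


def max_tokens_per_s(params_read_billion: float, bits_per_weight: float,
                     bandwidth_gbs: float) -> float:
    """Upper bound on decode speed if each token streams the read weights from RAM once."""
    gb_per_token = params_read_billion * bits_per_weight / 8
    return bandwidth_gbs / gb_per_token


# Assumed hardware: dual-channel DDR5-5600 (not specified in the post).
bw = peak_bandwidth_gbs(5600)

# Assumed quantization density: ~4.5 bits per weight for Q4.
dense = max_tokens_per_s(30.5, 4.5, bw)  # if all 30.5B weights were read per token
moe = max_tokens_per_s(3.3, 4.5, bw)     # if only ~3.3B active weights are read per token

print(f"peak bandwidth: {bw:.1f} GB/s")          # peak bandwidth: 89.6 GB/s
print(f"dense 30B ceiling: {dense:.1f} tok/s")   # dense 30B ceiling: 5.2 tok/s
print(f"MoE (3.3B active) ceiling: {moe:.1f} tok/s")  # MoE (3.3B active) ceiling: 48.3 tok/s
```

Under these assumptions, a dense 30B model would top out near the 3-5 tok/s the user expected, while the reported 18.8 tok/s sits below the mixture-of-experts ceiling, which is plausible because real systems rarely reach theoretical peak bandwidth.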
Practical Testing Results
The user conducted a real coding task comparison:
- An 8B model "confidently wrote completely wrong code"
- The 30B model "nailed it first try"
- They described the 30B model's performance as "basically GPT-4o level for $0"
Taken together, the report suggests that for some coding tasks a 4-bit quantized 30B model on a recent desktop CPU can produce usable results without a dedicated GPU. The GPT-4o comparison is the user's impression from a single coding task, not a benchmark result.
📖 Read the full source: r/LocalLLaMA