Reddit user reports 18.8 tok/s CPU inference with Qwen 3 30B Q4 on Zen 4

A Reddit user shared their experience testing local LLM inference on CPU instead of investing in expensive GPU hardware.
Key Details
The user was considering purchasing GPU hardware for local LLM inference, including:
- P40 GPUs
- V100 GPUs (they almost bought an SXM2 module, which mounts in a dedicated SXM2 socket on server boards rather than a standard PCIe slot)
- RTX 3090s (priced at $800+ due to AI demand)
After being advised to try CPU inference first, they tested:
- Model: Qwen 3 30B Q4
- Hardware: Zen 4 processor with DDR5 memory
- Performance: 18.8 tokens per second on CPU
- Expectation vs Reality: Expected 3-5 tok/s, got nearly 19 tok/s
The user noted that "Zen 4 + DDR5 is cracked for inference."
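The gap between the expected 3-5 tok/s and the measured 18.8 tok/s can be reasoned about with a back-of-envelope estimate. CPU token generation is usually limited by memory bandwidth: every generated token requires streaming the active model weights from RAM, so throughput is at most bandwidth divided by bytes read per token. This sketch is not from the post, and its inputs are assumptions. It assumes dual-channel DDR5-6000 (96 GB/s theoretical peak) and roughly 4.5 bits per weight, the size of llama.cpp's Q4_0 layout. It also assumes that "Qwen 3 30B" refers to Qwen3-30B-A3B, a mixture-of-experts model that activates about 3.3B of its roughly 30B parameters per token.

```python
# Back-of-envelope decode throughput for memory-bandwidth-bound CPU inference.
# All hardware and model figures are illustrative assumptions, not numbers
# reported in the Reddit post.

def weight_bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Bytes of weights streamed from RAM to generate one token."""
    return active_params * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float, active_params: float,
                          bits_per_weight: float) -> float:
    """Upper bound: memory bandwidth divided by bytes read per token."""
    return bandwidth_gb_s * 1e9 / weight_bytes_per_token(active_params, bits_per_weight)


# Assumed: dual-channel DDR5-6000 -> 2 channels * 8 bytes * 6000 MT/s = 96 GB/s peak.
BANDWIDTH_GB_S = 2 * 8 * 6000e6 / 1e9
# Assumed: ~4.5 bits per weight, as in llama.cpp's Q4_0 quantization.
Q4_BITS = 4.5

# Hypothetical dense 30B model: all 30B weights are read for every token.
dense_30b = max_tokens_per_second(BANDWIDTH_GB_S, 30e9, Q4_BITS)
# Assumed: "Qwen 3 30B" is Qwen3-30B-A3B (mixture of experts),
# which activates roughly 3.3B parameters per token.
moe_30b_a3b = max_tokens_per_second(BANDWIDTH_GB_S, 3.3e9, Q4_BITS)

print(f"dense 30B ceiling:   {dense_30b:.1f} tok/s")
print(f"MoE 30B-A3B ceiling: {moe_30b_a3b:.1f} tok/s")
```

Under these assumptions the script prints ceilings of about 5.7 tok/s for a dense 30B model and 51.7 tok/s for the mixture-of-experts variant. The dense figure lines up with the user's 3-5 tok/s expectation, while the observed 18.8 tok/s sits well under the MoE ceiling. That is plausible because real systems reach only part of peak bandwidth and also spend time on attention and KV-cache reads.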
Practical Testing Results
The user compared the two model sizes on a real coding task:
- An 8B model "confidently wrote completely wrong code"
- The 30B model "nailed it first try"
- They described the 30B model's performance as "basically GPT-4o level for $0"
Although it is a single user's anecdote, this suggests that for some coding tasks a Q4-quantized 30B model running on a modern desktop CPU can produce results the user found comparable to cloud-hosted models, without the GPU investment typically associated with local LLM inference.
📖 Read the full source: r/LocalLLaMA