Benchmark: Gemma4 12B vs Qwen3 8B quantized on 24GB Mac Mini

By OpenClawRadar, published April 21, 2026

Performance comparison of two local models for OpenClaw

A developer ran a head-to-head test of Gemma4 12B and Qwen3:8b-q4_K_M on a 24GB Mac Mini. The test used two prompts: "explain how a carburetor works" and "write a Python function to detect memory leaks." Claude helped write a command that grepped the model output for the throughput figures.
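The source doesn't show the command Claude wrote. As a hedged stand-in, here is a minimal Python sketch that pulls the two rates out of a stats block, assuming the runs used Ollama (the qwen3:8b-q4_K_M tag suggests it, but the source doesn't say) with `ollama run <model> --verbose`, which prints lines such as `prompt eval rate: ... tokens/s` and `eval rate: ... tokens/s`. The `parse_rates` helper and the sample text are illustrative, not the author's method.

```python
import re

# Assumes Ollama-style --verbose stats, e.g. "prompt eval rate:     89.80 tokens/s".
# Anchoring at the line start keeps "eval rate" from matching inside "prompt eval rate".
RATE_RE = re.compile(r"^[ \t]*(prompt eval|eval) rate:[ \t]+([\d.]+) tokens/s", re.MULTILINE)


def parse_rates(verbose_output: str) -> dict:
    """Return {'prompt_eval': ..., 'generation': ...} in tokens/s from a stats block."""
    rates = {}
    for kind, value in RATE_RE.findall(verbose_output):
        key = "prompt_eval" if kind == "prompt eval" else "generation"
        rates[key] = float(value)
    return rates


# Illustrative stats block: the rates are the Qwen3 carburetor figures from this
# article; the token counts are placeholders, not benchmark data.
sample = """\
prompt eval count:    100 token(s)
prompt eval rate:     89.80 tokens/s
eval count:           500 token(s)
eval rate:            19.60 tokens/s
"""

print(parse_rates(sample))  # {'prompt_eval': 89.8, 'generation': 19.6}
```

The "count" lines are skipped because only lines ending in a `tokens/s` rate match the pattern.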

Benchmark results

Carburetor explanation task:

  • Qwen3:8b-q4_K_M: Prompt eval: 89.8 t/s, Generation: 19.6 t/s
  • Gemma4: Prompt eval: 20.8 t/s, Generation: 27.6 t/s

Python coding task:

  • Qwen3:8b-q4_K_M: Prompt eval: 133.8 t/s, Generation: 18.7 t/s
  • Gemma4: Prompt eval: 26.1 t/s, Generation: 26.1 t/s
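The speed ratios behind the findings can be recomputed directly from the figures above. A minimal Python sketch; the `RESULTS` layout and the `ratios` helper are illustrative names, not from the source.

```python
# Throughput (tokens/s) as reported: (prompt eval, generation) per model and task.
RESULTS = {
    "carburetor": {"qwen3": (89.8, 19.6), "gemma4": (20.8, 27.6)},
    "python": {"qwen3": (133.8, 18.7), "gemma4": (26.1, 26.1)},
}


def ratios(results):
    """Per task: (Qwen3 prompt-eval speedup over Gemma4, Gemma4 generation speedup over Qwen3)."""
    out = {}
    for task, r in results.items():
        q_prompt, q_gen = r["qwen3"]
        g_prompt, g_gen = r["gemma4"]
        out[task] = (round(q_prompt / g_prompt, 2), round(g_gen / q_gen, 2))
    return out


for task, (prompt_x, gen_x) in ratios(RESULTS).items():
    print(f"{task}: Qwen3 prompt eval {prompt_x}x faster, Gemma4 generation {gen_x}x faster")
# carburetor: Qwen3 prompt eval 4.32x faster, Gemma4 generation 1.41x faster
# python: Qwen3 prompt eval 5.13x faster, Gemma4 generation 1.4x faster
```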

Key findings

Qwen3 processes prompts roughly 4-5x faster than Gemma4 (about 4.3x on the carburetor task and 5.1x on the coding task), which matters for OpenClaw because it typically sends large context prompts. Gemma4 generates output faster, by about 1.4x on both tasks. For many OpenClaw uses, Qwen3 therefore wins on speed. The developer notes that Gemma4 is a 12B model and might produce slightly better output, though output quality wasn't tested.
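Why prompt speed dominates with large contexts can be shown with a simple latency model: total time is roughly prompt tokens divided by the prompt-eval rate plus output tokens divided by the generation rate. A minimal Python sketch using the carburetor-task rates; the 8,000-token prompt and 500-token reply are hypothetical sizes, and the model assumes rates stay constant at any prompt length, which long-context runs may not bear out.

```python
# Measured rates (tokens/s) from the carburetor task: (prompt eval, generation).
RATES = {
    "qwen3": (89.8, 19.6),
    "gemma4": (20.8, 27.6),
}


def est_seconds(prompt_tokens, output_tokens, rates):
    """Rough wall-clock estimate: prompt processing time plus generation time.

    Simplifying assumption: the rates stay constant at any prompt length.
    """
    prompt_rate, gen_rate = rates
    return prompt_tokens / prompt_rate + output_tokens / gen_rate


def break_even_prompt_tokens(output_tokens, fast_prompt, fast_gen):
    """Prompt length above which model a (fast_prompt) finishes before model b (fast_gen).

    Solves P/a_p + G/a_g = P/b_p + G/b_g for P, where a is faster at prompt
    eval and b is faster at generation.
    """
    a_p, a_g = fast_prompt
    b_p, b_g = fast_gen
    return output_tokens * (1 / a_g - 1 / b_g) / (1 / b_p - 1 / a_p)


# Hypothetical OpenClaw-style request: 8,000 prompt tokens, 500 output tokens.
for name, rates in RATES.items():
    print(f"{name}: ~{est_seconds(8000, 500, rates):.1f} s")
# qwen3: ~114.6 s
# gemma4: ~402.7 s

be = break_even_prompt_tokens(500, RATES["qwen3"], RATES["gemma4"])
print(f"break-even: ~{be:.0f} prompt tokens")
# break-even: ~200 prompt tokens
```

Under these assumptions, Qwen3 finishes first once the prompt exceeds roughly 0.4 tokens per output token, a threshold that context-heavy agent prompts pass easily.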

The developer runs various tasks on local models, including cron jobs, heartbeat monitoring, and memory indexing, and often has OpenClaw call subagents that run local models. They're testing Gemma4 as the local model for all of these background tasks but don't expect to notice the performance differences, since these tasks run in the background.

📖 Read the full source: r/openclaw
