Qwen 3.6 27B Quantization Benchmark: Q4_K_M Beats Q8_0 on Practical Tradeoffs

A Reddit user benchmarked Qwen 3.6 27B in three GGUF variants (unquantized BF16 plus the Q4_K_M and Q8_0 quantizations) using llama-cpp-python via the Neo AI Engineer framework. The evaluation covered 664 samples across three tasks: HumanEval (code generation, 164 samples), HellaSwag (commonsense reasoning, 100 samples), and BFCL (function calling, 400 samples).
Benchmark Results
- BF16 (file size 53.8 GB, peak RAM 54 GB, throughput 15.5 tok/s): HumanEval 56.10% (92/164), HellaSwag 90.00% (90/100), BFCL 63.25% (253/400). Average accuracy: 69.78%.
- Q4_K_M (file size 16.8 GB, peak RAM 28 GB, throughput 22.5 tok/s): HumanEval 50.61% (83/164), HellaSwag 86.00% (86/100), BFCL 63.00% (252/400). Average accuracy: 66.54%.
- Q8_0 (file size 28.6 GB, peak RAM 42 GB, throughput 18.0 tok/s): HumanEval 52.44% (86/164), HellaSwag 83.00% (83/100), BFCL 63.00% (252/400). Average accuracy: 66.15%.
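The reported averages are unweighted means of the three per-task accuracies, so BFCL's 400 samples carry the same weight as HellaSwag's 100. A quick Python check, recomputing them from the pass counts above alongside the pooled (sample-weighted) accuracy over all 664 samples for contrast; the PASSED table and helper names are illustrative, not from the original pipeline.

```python
# Pass counts reported above, per variant: (HumanEval, HellaSwag, BFCL)
TOTALS = (164, 100, 400)
PASSED = {
    "BF16": (92, 90, 253),
    "Q4_K_M": (83, 86, 252),
    "Q8_0": (86, 83, 252),
}


def per_task_accuracy(passed):
    """Accuracy in percent for each of the three tasks."""
    return [100 * p / n for p, n in zip(passed, TOTALS)]


def macro_average(passed):
    """Unweighted mean of the per-task accuracies (the 'Average accuracy' above)."""
    accs = per_task_accuracy(passed)
    return sum(accs) / len(accs)


def pooled_accuracy(passed):
    """Sample-weighted accuracy over all 664 samples, for comparison."""
    return 100 * sum(passed) / sum(TOTALS)


for name, passed in PASSED.items():
    print(f"{name:7s} macro={macro_average(passed):.2f}%  pooled={pooled_accuracy(passed):.2f}%")
```

Pooled over all 664 samples, Q4_K_M and Q8_0 each pass 421, so the gap between their reported averages comes from the macro-averaging: Q4_K_M's three extra HellaSwag passes count for more than Q8_0's three extra HumanEval passes, because HellaSwag has fewer samples.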
Key Takeaways
Q4_K_M is the standout practical variant. It nearly matches BF16 on BFCL (63.00% vs 63.25%, a single sample), drops only ~5.5 points on HumanEval, and trails BF16 by 4 points on HellaSwag. In return, it runs 1.45x faster than BF16, needs 48% less peak RAM, and its file is 68.8% smaller, with nearly identical function-calling performance. Q8_0 was underwhelming: it beat Q4_K_M on HumanEval by only ~1.8 points (three more problems solved) while using 42 GB of RAM instead of 28 GB, running slower (18.0 vs 22.5 tok/s), and scoring 3 points lower on HellaSwag (83% vs 86%).
For local/CPU deployment, Q4_K_M is recommended unless the workload is heavily code-generation focused. For maximum quality, BF16 still wins.
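The resource ratios quoted above follow directly from the file sizes, peak RAM, and throughput listed under Benchmark Results. A minimal Python sketch that recomputes them with BF16 as the baseline; RESOURCES and compare are illustrative names, not part of the original pipeline.

```python
# Reported figures per variant: (file size GB, peak RAM GB, throughput tok/s)
RESOURCES = {
    "BF16": (53.8, 54.0, 15.5),
    "Q4_K_M": (16.8, 28.0, 22.5),
    "Q8_0": (28.6, 42.0, 18.0),
}


def compare(variant, baseline="BF16"):
    """Speedup and percentage savings of `variant` relative to `baseline`.

    A negative saving means the variant uses more than the baseline.
    """
    size, ram, tps = RESOURCES[variant]
    b_size, b_ram, b_tps = RESOURCES[baseline]
    return {
        "speedup": tps / b_tps,
        "ram_saving_pct": 100 * (1 - ram / b_ram),
        "size_reduction_pct": 100 * (1 - size / b_size),
    }


print(compare("Q4_K_M"))
```

Calling compare("Q8_0", baseline="Q4_K_M") the same way shows Q8_0's cost relative to Q4_K_M: 0.8x the throughput and 50% more peak RAM (reported as a negative saving).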
Evaluation Setup
Each GGUF variant was run through llama-cpp-python with n_ctx set to 32768, and the evaluation was checkpointed. The Neo AI Engineer framework built the GGUF evaluation pipeline, handled the checkpointed runs, and consolidated the results. The complete case study, including code snippets, is linked in the comments of the original Reddit post.
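The Neo AI Engineer pipeline itself is not reproduced here. As a rough, hypothetical illustration of what checkpointed evaluation with llama-cpp-python can look like, the sketch below appends each scored sample to a JSONL file and skips already-scored IDs when restarted. Only n_ctx=32768 comes from the post; the sample fields ("id", "prompt"), greedy decoding, the 512-token cap, the score callback, and all function names are assumptions.

```python
import json
import os
from typing import Any, Callable, Dict, List

# Hypothetical sample shape: {"id": ..., "prompt": ..., plus task-specific fields}
Sample = Dict[str, Any]


def make_llama_generate(model_path: str, n_ctx: int = 32768) -> Callable[[str], str]:
    """Wrap a llama-cpp-python model as a prompt -> completion-text function.

    The import is deferred so the checkpointing code below runs without
    llama-cpp-python installed.
    """
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)

    def generate(prompt: str) -> str:
        # Greedy decoding and a 512-token cap are assumptions, not the post's settings.
        out = llm.create_completion(prompt, max_tokens=512, temperature=0.0)
        return out["choices"][0]["text"]

    return generate


def load_checkpoint(path: str) -> Dict[str, bool]:
    """Read {sample_id: passed} from a JSONL checkpoint, if one exists."""
    done: Dict[str, bool] = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    done[record["id"]] = record["passed"]
    return done


def run_checkpointed_eval(
    samples: List[Sample],
    generate: Callable[[str], str],
    score: Callable[[Sample, str], bool],
    checkpoint_path: str,
) -> float:
    """Score each sample once, resuming from checkpoint_path after an interruption.

    Returns accuracy in percent over `samples`.
    """
    done = load_checkpoint(checkpoint_path)
    with open(checkpoint_path, "a", encoding="utf-8") as f:
        for sample in samples:
            if sample["id"] in done:
                continue  # already scored in an earlier run
            passed = bool(score(sample, generate(sample["prompt"])))
            f.write(json.dumps({"id": sample["id"], "passed": passed}) + "\n")
            f.flush()  # persist each completed result before moving on
            done[sample["id"]] = passed
    if not samples:
        return 0.0
    return 100 * sum(done[s["id"]] for s in samples) / len(samples)
```

In use, one would build generate = make_llama_generate(path_to_gguf) for each variant and give each variant/task pair its own checkpoint file, so an interrupted run over the 400 BFCL samples resumes where it stopped instead of starting over.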
📖 Read the full source: r/LocalLLaMA