Qwen 3.6 27B F16 Passes Pacman Test, 8-Bit Quants Fail

A developer on r/LocalLLaMA shared a practical coding benchmark: one-shot a single-page Pacman clone from a good prompt, with three attempts, keeping the best. Qwen 3.6 27B F16 produced two nearly perfect games, making it the first local model to pass. Dropping to 8-bit quantization, however, made good results unreproducible even after five attempts, which supports the claim that 8-bit quantization is not lossless for complex generative tasks.
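The "several attempts, keep the best" protocol can be sketched as a small best-of-N loop. This is only an illustration, not the author's tooling: `generate` and `score` are hypothetical callables standing in for a model call and for whatever judgment picks the best game (in the post, that judgment appears to be the author playing the result).

```python
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str], float],
    prompt: str,
    attempts: int = 3,
) -> Tuple[str, float]:
    """Run `generate` on the same prompt `attempts` times and keep the top-scoring output.

    `generate` and `score` are hypothetical stand-ins: one wraps a model call,
    the other rates a finished attempt (higher is better).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    results = [generate(prompt) for _ in range(attempts)]
    best = max(results, key=score)
    return best, score(best)
```

Keeping the number of attempts as a parameter makes it easy to compare the F16 run (three attempts) with the 8-bit run (five attempts) under the same selection rule.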

Key technical findings from the post:

Chat template is critical: The official Qwen chat template is tuned for vLLM and causes errors in llama.cpp and other runners. The author fixed the template bugs iteratively, and after tuning the template, the model felt like "a new level of intelligence."
MTP speculative decoding speeds vary by task: For deterministic tasks like coding, generation speed ranged from 8 to 18 tok/s (baseline without MTP: 6.6 tok/s). Creative tasks see less acceleration.
Harness choice affects speed more than code quality: Qwen CLI performed surprisingly well. Its output quality was comparable to Claude Code, but it ran much faster because Claude Code's extra prompts slow down local models. With a model as slow as Qwen 3.6 27B at ~6 tok/s, every extra prompt adds painful latency.
Don't interfere with context management: The model's native context caching and compaction work well. Plugins or tools that manipulate cache or context confuse the model and degrade performance.
Tool calls and subagents work flawlessly after proper chat template fixes. Context compaction, shell usage, and parallel subagents all function as expected.
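
To put the quoted speeds in perspective, the arithmetic below turns them into speedup ratios and wall-clock time. Only the 6.6, 8, and 18 tok/s figures come from the post. The 300-token overhead is a made-up number for illustration, and the calculation assumes a steady generation rate while ignoring prompt processing time.

```python
# Figures quoted in the post.
BASELINE_TOK_S = 6.6            # generation speed without MTP
MTP_RANGE_TOK_S = (8.0, 18.0)   # generation speed with MTP on coding tasks


def speedup(tok_s: float, baseline: float = BASELINE_TOK_S) -> float:
    """Throughput ratio relative to the non-MTP baseline."""
    return tok_s / baseline


def generation_seconds(tokens: int, tok_s: float) -> float:
    """Seconds to generate `tokens` output tokens at a steady rate.

    Simplification: prompt processing time is ignored.
    """
    return tokens / tok_s


if __name__ == "__main__":
    low, high = (speedup(t) for t in MTP_RANGE_TOK_S)
    print(f"MTP speedup on coding tasks: {low:.2f}x to {high:.2f}x")

    # Hypothetical overhead: a harness step that makes the model
    # generate 300 extra tokens per turn (not a figure from the post).
    extra_tokens = 300
    cost = generation_seconds(extra_tokens, BASELINE_TOK_S)
    print(f"{extra_tokens} extra tokens at baseline speed: {cost:.0f} s")
```

Under these assumptions, MTP gives roughly a 1.2x to 2.7x speedup on coding tasks, and a few hundred extra generated tokens per turn cost most of a minute at baseline speed. This is why a lighter harness matters more on slow local inference than on a fast hosted model.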

The author warns that your mileage depends heavily on runner configuration: use F16 weights, a corrected chat template, and avoid heavy harnesses unless you have fast inference. The full playable Pacman result is available at guigand.com/pacman.

📖 Read the full source: r/LocalLLaMA
