Qwen 3.6 27B F16 Passes Pacman Coding Test, But 8-Bit Quants Fail — Key Lessons on Templates and MTP Speculative Decoding

A developer on r/LocalLLaMA shared a practical coding benchmark: one-shot a single-page Pacman clone from a good prompt, three attempts, keep the best. Qwen 3.6 27B F16 produced two nearly perfect games, making it the first local model to pass the author's test. However, after dropping to 8-bit quantization, the author could not reproduce a good result even in five attempts, which supports the claim that 8-bit quantization is not lossless for complex generative tasks.
Key technical findings from the post:
- Chat template is critical: The official Qwen chat template is tuned for vLLM and causes errors in llama.cpp and other runners. The author fixed the template bugs iteratively, and once the template was tuned, the model felt like "a new level of intelligence."
- MTP speculative decoding speeds vary by task: For deterministic tasks like coding, generation speed ranged from 8 to 18 tok/s, against a baseline of 6.6 tok/s without MTP. Creative tasks see less acceleration.
- Harness choice affects speed more than code quality: Qwen CLI performed surprisingly well — comparable to Claude Code in output quality, but much faster because Claude Code's extra prompts slow down local models. With a slow model like Qwen 3.6 27B at ~6 tok/s, every extra prompt adds painful latency.
- Don't interfere with context management: The model's native context caching and compaction work well. Plugins or tools that manipulate cache or context confuse the model and degrade performance.
- Tool calls and subagents work flawlessly after proper chat template fixes. Context compaction, shell usage, and parallel subagents all function as expected.
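
To put the throughput figures above in perspective, here is a minimal Python sketch that turns them into speedup factors and wall-clock estimates. The 6.6 tok/s baseline and the 8 to 18 tok/s MTP range come from the post; the 500-token overhead is a hypothetical number chosen only for illustration, and the estimate ignores prompt-processing time, which the post does not quantify.

```python
# Figures reported in the post.
BASELINE_TOK_S = 6.6            # generation speed without MTP
MTP_RANGE_TOK_S = (8.0, 18.0)   # generation speed range with MTP on coding tasks

# Hypothetical value, NOT from the post: extra tokens a heavy harness
# might cause the model to generate per task.
EXTRA_TOKENS = 500


def speedup(mtp_tok_s: float, baseline_tok_s: float = BASELINE_TOK_S) -> float:
    """Return how many times faster MTP decoding is than the baseline."""
    return mtp_tok_s / baseline_tok_s


def generation_seconds(num_tokens: int, tok_s: float) -> float:
    """Return the time needed to generate num_tokens at a given speed."""
    return num_tokens / tok_s


if __name__ == "__main__":
    low, high = (speedup(s) for s in MTP_RANGE_TOK_S)
    print(f"MTP speedup on coding tasks: {low:.2f}x to {high:.2f}x")

    for tok_s in (BASELINE_TOK_S, *MTP_RANGE_TOK_S):
        secs = generation_seconds(EXTRA_TOKENS, tok_s)
        print(f"{EXTRA_TOKENS} extra tokens at {tok_s} tok/s: {secs:.0f} s")
```

With the reported numbers, the speedup works out to roughly 1.2x at the low end and 2.7x at the high end. Under the assumed 500-token overhead, the same extra output costs over a minute at baseline speed and under half a minute at 18 tok/s, which is consistent with the post's point that heavy harnesses hurt most on slow local inference.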
The author warns that your mileage depends heavily on runner configuration: use F16 weights, a corrected chat template, and avoid heavy harnesses unless you have fast inference. The full playable Pacman result is available at guigand.com/pacman.
📖 Read the full source: r/LocalLLaMA