Local vs Cloud Models: Qwen-3.6-27B, Gemma-4-31B, Claude Haiku, Codex-Spark on Hard Code Gen

A Reddit user compared a locally run Qwen-3.6-27B (GGUF q4_k_m) against API-hosted models: Qwen-3.6-27B via OpenRouter, Gemma-4-31B via OpenRouter, Claude Haiku 4.5, and GPT-Codex-Spark. The test involved implementing an autoresearch loop from a design document, a deliberately hard task meant to evaluate how cleanly each model fails rather than how often it succeeds.
Hardware Setup
- CPU: Ryzen 7 7800X3D
- RAM: 64 GB DDR5-6400
- GPU: RTX 5080 (16 GB VRAM)
- Local model: Qwen-3.6-27B q4_k_m (GGUF) — fits 16 GB VRAM via quantization
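As a rough sanity check on the VRAM fit, weight memory scales with parameter count times bits per weight. The sketch below is an estimate only: the figure of about 4.8 bits per weight for q4_k_m is an assumption (k-quants mix block types, so the effective rate varies by model), the function name is hypothetical, and the KV cache and activations are not counted.

```python
def estimate_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed effective rate for q4_k_m; not a figure from the original post.
ASSUMED_Q4_K_M_BPW = 4.8

weights_gib = estimate_weight_gib(27, ASSUMED_Q4_K_M_BPW)
print(f"~{weights_gib:.1f} GiB of weights before KV cache and activations")
```

Under that assumption the weights alone take most of a 16 GB card, so long contexts would likely need a smaller KV cache or partial offload to system RAM.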
Results
- Gemma-4-31B (API): Failed completely. Wrote skeleton with mocked modules, no tests, no config files (__init__.py, requirements.txt, pyproject.toml). Cost: $0.112, 803k context tokens consumed, 21k generated.
- Codex-Spark (API): Produced beautiful folder structure and code, but imports were hallucinated. No unit tests. Used 1% of $100/mo Spark limits.
- Claude Haiku 4.5 (API): Detailed implementation but failed on correctness. (Further details truncated in source.)
- Qwen-3.6-27B (local q4_k_m): Not explicitly scored, but the user notes that quantized inference degrades quality relative to the full-precision API version.
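Hallucinated imports like those reported for Codex-Spark can be caught mechanically before any code is run. The following is a minimal sketch, not the method used in the original post: it parses generated source with the standard-library ast module and reports top-level modules that cannot be found in the current environment. The function name and the sample input are hypothetical.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level module names imported by `source` that cannot be found."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are skipped; they depend on package layout.
            names.add(node.module.split(".")[0])
    return sorted(n for n in names if importlib.util.find_spec(n) is None)


# Hypothetical model output containing one invented package.
generated = "import json\nimport totally_made_up_pkg\nfrom os import path\n"
print(unresolved_imports(generated))
```

This only checks that a module exists, not that the specific names imported from it do, so it catches invented packages but not invented functions inside real ones.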
Context
The user argues that typical local-model evals use trivial tasks (e.g., Snake in HTML) where both local and frontier models succeed, making local models look better than they are. This test used a real work project with a design document; only Codex-Spark produced fully written (but broken) code. The point: local models are not yet ready for complex code generation without substantial fixes.
📖 Read the full source: r/LocalLLaMA