Developer Tests Qwen3.5 27B vs Larger Models for Local Coding Tasks

✍️ OpenClawRadar📅 Published: March 28, 2026🔗 Source
Developer Tests Qwen3.5 27B vs Larger Models for Local Coding Tasks
Ad

A developer tested several large language models for local coding tasks, comparing performance and hardware requirements. The testing focused on Qwen3.5 variants and Nemotron models, with comparisons to GPT-5.4 High.

Test Results and Findings

The developer tested these specific models:

  • unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL
  • unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL
  • unsloth/Qwen3.5-122B-A10B-GGUF
  • unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL
  • unsloth/Qwen3.5-27B-GGUF:UD-Q8_K_XL
  • unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-IQ4_XS
  • unsloth/gpt-oss-120b-GGUF:F16

Key findings from the testing:

  • Nemotron-3-Super-120B performed "very, very good," on par with GPT-5.4 High
  • Qwen3.5-27B performed well for development tasks
  • GPT-OSS-120B and Qwen3.5-122B performed worse than the other two models
  • Nemotron-3-Super-120B consistently responded in Spanish (the tester's native language) while others responded in English

Performance Metrics

The developer provided specific performance numbers:

  • Nemotron-3-Super-120B: 80 tokens per second (tg/s), ~2000 prompt processing (pp), 100k context on vast.ai with 4x RTX 3090
  • Qwen3.5-27B Q6: 803 pp, 25 tg/s, 256k context on vast.ai
Ad

Hardware Requirements

The developer noted hardware constraints:

  • Qwen3.5-122B would require a new motherboard and 1-2 more RTX 3090 cards, making it too expensive
  • Qwen3.5-27B runs on existing 2x RTX 3090 hardware without additional investment
  • If they had the hardware for Nemotron-3-Super-120B, they would use it instead

Implementation Details

The developer plans to use Qwen3.5-27B-GGUF:UD-Q6_K_XL for real development tasks locally and provided the llama.cpp command used for testing:

./llama.cpp/llama-server -hf unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 -ngl 999

The developer mentioned they'll continue using CODEX for complex tasks but can replace API subscriptions for daily tasks with the local setup.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also

9 Free Claude Code Skills for Medical Research Workflow
Tools

9 Free Claude Code Skills for Medical Research Workflow

A radiology researcher has open-sourced 9 Claude Code skills covering the medical research workflow from literature search to manuscript preparation. The skills include PubMed searching with anti-hallucination verification, statistical analysis code generation, and publication-ready figure creation.

OpenClawRadar
Keyoku Plugin Replaces OpenClaw's Static Heartbeat with Memory-Driven Autonomy
Tools

Keyoku Plugin Replaces OpenClaw's Static Heartbeat with Memory-Driven Autonomy

Keyoku is a free OpenClaw plugin that changes the agent's heartbeat from reading a static HEARTBEAT.md file to scanning the agent's actual memory store for stalled work, dropped commitments, conflicting information, and quiet relationships. It uses a local Go engine with SQLite + HNSW and offers three autonomy levels: observe, suggest, and act.

OpenClawRadar
StarSteady: AI-Powered Google Review Responses and SMS Requests for Local Businesses
Tools

StarSteady: AI-Powered Google Review Responses and SMS Requests for Local Businesses

StarSteady is a solo-built SaaS that generates AI-crafted responses to Google/Yelp reviews and sends SMS review requests to customers, starting at $39/month with a 5-response/5-SMS free trial.

OpenClawRadar
Agentlint: GitHub App that catches CLAUDE.md contradictions and broken pointers on every PR
Tools

Agentlint: GitHub App that catches CLAUDE.md contradictions and broken pointers on every PR

Agentlint is a GitHub App that audits your full agent-rules surface (CLAUDE.md, AGENTS.md, skills, hooks) on every PR, posting inline comments for contradictions, broken paths, and unsupported harness features. Free for public repos.

OpenClawRadar