Qwen3.5-35B-A3B-UD-Q6_K_XL Tested in Production Development Workflows

A developer on r/LocalLLaMA shared detailed results from testing the Qwen3.5-35B-A3B-UD-Q6_K_XL model in production development work. The testing combined benchmark runs with practical use on real client projects.
Performance Benchmarks
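The developer reported benchmark results of 1504 for pp2048 and 47.71 for tg256. In llama-bench-style notation these would be prompt processing over a 2048-token prompt and token generation over 256 tokens, with throughput measured in tokens per second (t/s). Token generation was solid when the model was split across two GPUs and rose to 80 t/s on a single GPU.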
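To put those figures in practical terms, here is a minimal sketch that turns them into a rough per-request time estimate. It assumes the two benchmark numbers are throughputs in tokens per second (based on llama-bench's pp/tg naming), and that the single-GPU run has the same prompt-processing speed, which the post does not state. The function name and the example request size are illustrative.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_tps: float, tg_tps: float) -> float:
    """Rough wall-clock estimate: prompt phase plus generation phase."""
    if pp_tps <= 0 or tg_tps <= 0:
        raise ValueError("throughput must be positive")
    return prompt_tokens / pp_tps + output_tokens / tg_tps


# Figures reported in the post, read as tokens per second (assumption).
REPORTED_PP_TPS = 1504.0
REPORTED_TG_TPS = 47.71
# Single-GPU generation speed from the post; its prompt speed is not
# reported, so the same pp figure is reused here as a placeholder.
SINGLE_GPU_TG_TPS = 80.0

if __name__ == "__main__":
    # Example request: 2048 prompt tokens in, 256 tokens out.
    for label, tg in (("two GPUs", REPORTED_TG_TPS),
                      ("single GPU", SINGLE_GPU_TG_TPS)):
        seconds = estimate_seconds(2048, 256, REPORTED_PP_TPS, tg)
        print(f"{label}: about {seconds:.1f} s")
```

For short prompts the generation phase dominates, so the jump from 47.71 to 80 t/s matters more to perceived latency than the prompt-processing figure.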
Production Testing Methodology
The developer tested the model on five projects, using Git worktrees to roll each one back to a known set of specifications and features. The specifications were generated by Claude; the developer has been on a Claude Max plan for the past year.
- Tested across JavaScript, Go, and Rust projects
- Used Git Worktrees for version control during testing
- Most "bugs" required only 5-minute tweaks or could be fixed with a second prompt
- Compared the experience to using Sonnet 4
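The worktree-based rollback described above could be scripted roughly as follows. This is a sketch, not the developer's actual setup: the helper names, paths, and the idea of one throwaway worktree per test run are assumptions. Only the `git worktree add` and `git worktree remove` subcommands themselves are standard Git.

```python
import subprocess


def worktree_add_cmd(path: str, ref: str) -> list[str]:
    """Build the command that checks out `ref` into a new worktree at `path`.

    --detach avoids creating or moving a branch, so the known-good
    commit or tag is left untouched.
    """
    return ["git", "worktree", "add", "--detach", path, ref]


def worktree_remove_cmd(path: str) -> list[str]:
    """Build the command that deletes the worktree at `path`.

    --force also discards uncommitted changes left behind by a test run.
    """
    return ["git", "worktree", "remove", "--force", path]


def run_in_repo(repo: str, cmd: list[str]) -> None:
    """Run a Git command inside `repo`, raising if Git reports an error."""
    subprocess.run(cmd, cwd=repo, check=True)


# Hypothetical usage (repository path, worktree path, and tag are placeholders):
#   run_in_repo("/path/to/project", worktree_add_cmd("../project-run1", "spec-v1"))
#   ... point the local model at ../project-run1 and let it implement the spec ...
#   run_in_repo("/path/to/project", worktree_remove_cmd("../project-run1"))
```

Keeping command construction separate from execution makes it easy to log or dry-run the exact Git invocations before touching a client repository.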
Practical Results and Business Implications
The developer reported that Qwen3.5 "nailed them out of the park" for the work they do, and noted particularly strong performance on Go and Rust projects. As a result, the developer is seriously considering a move from API-only models to a hybrid approach: state-of-the-art (SOTA) models via API for specification generation and reviews, and local models for development work.
The testing has also raised the question of hardware investment versus subscription costs. The developer has spent $2,000 on Claude Max since June 2025, and the total could reach $6,800 by 2027 if the subscription continues. That has led them to consider buying an RTX 6000 Pro as a business investment.
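The hardware-versus-subscription trade-off comes down to simple arithmetic, sketched below. The $2,000 and $6,800 figures are from the post. The hardware price, monthly subscription rate, and monthly running cost are not given in the source, so the values in the comments are placeholders to be replaced with real quotes.

```python
import math


def projected_remaining_spend(spent_so_far: float, projected_total: float) -> float:
    """Subscription money not yet spent if the plan simply continues."""
    return projected_total - spent_so_far


def months_to_break_even(hardware_cost: float,
                         monthly_subscription: float,
                         monthly_local_cost: float = 0.0) -> int:
    """Months until a one-off hardware purchase is paid back by savings.

    monthly_local_cost covers anything still paid each month after the
    switch (electricity, residual API usage for specs and reviews).
    """
    monthly_saving = monthly_subscription - monthly_local_cost
    if monthly_saving <= 0:
        raise ValueError("local setup never pays back at these rates")
    return math.ceil(hardware_cost / monthly_saving)


# From the post: $2,000 spent so far, $6,800 projected by 2027.
remaining = projected_remaining_spend(2_000, 6_800)

# Placeholder inputs (assumptions, not from the post):
#   months_to_break_even(hardware_cost=8_000, monthly_subscription=200)
#   months_to_break_even(8_000, 200, monthly_local_cost=40)
```

Under the hybrid approach the developer describes, some API spend would remain, which is why the sketch subtracts a monthly local cost before dividing.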
The developer had previously used Qwen Coder for tab completion, but found that Qwen3.5 takes local model capabilities to a new level for production use.
📖 Read the full source: r/LocalLLaMA