GGUF Model Merging Script and Workflow for Qwen3.5-35B Variants

✍️ OpenClawRadar📅 Published: April 1, 2026🔗 Source

A Reddit user has shared a Python script and workflow for merging GGUF model files with minimal loss, specifically targeting Qwen3.5-35B variants. The approach combines two existing models: HauhauCS's Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive and samuelcardillo's Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF.

Technical Details

The merged model is available as a Q4_0 quantized version at Hugging Face. According to the source, samuelcardillo's finetune outperforms Jackrong's version for Qwen 3.5 35B.

Merging Workflow

The Python script (available on Pastebin) was "vibecoded via Claude Opus 4.6" and supports:

Merging GGUF files on Google Colab Free Tier
Quantization via llama-quantize
Q4_K_M quantization for 35B models
Q8 quantization for 8B models

The author notes they can't create Q8_0 or F16 quantized versions due to disk space limitations on Google Colab Free tier, but suggests others can tweak the script via Claude Opus for those quantizations.

Optimal Settings

For best performance in LM Studio, use these parameters:

Temperature: 0.7
Top K Sampling: 20
Presence Penalty: 1.5
Top P Sampling: 0.8
Min P Sampling: 0
Seed: 3407 or 42

The system prompt (full version on Pastebin) should include this first line: "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." The author notes the model underperforms without this line.

📖 Read the full source: r/LocalLLaMA

👀 See Also

Tools

Krasis LLM Runtime Shows 8.9x Prefill and 4.7x Decode Speed Improvements Over Llama.cpp

Krasis LLM runtime now runs both prefill and decode entirely on GPU with different optimization strategies, achieving 8.9x faster prefill and 4.7x faster decode than llama.cpp on Qwen3.5-122B with a single 5090 GPU.

Mar 17, 2026, 06:45 PM UTC

OpenClawRadar

Tools

Qwen 3.6 27B Quantization Benchmark: Q4_K_M Beats Q8_0 on Practical Tradeoffs

Evaluated Qwen 3.6 27B across BF16, Q4_K_M, and Q8_0 GGUF quants on HumanEval, HellaSwag, and BFCL. Q4_K_M delivers near-BF16 scores with 48% less RAM, 1.45x speed, and 68.8% smaller file size.

Apr 28, 2026, 04:17 PM UTC

OpenClawRadar

Tools

Claude-Control: Mobile Remote Control for Claude Code Sessions

Claude-control is an open-source tool that lets you manage Claude Code sessions from your phone via HTTPS and WebSocket. It runs Claude Code in a real PTY inside tmux, detects permission prompts, and sends push notifications with Allow/Deny buttons.

Apr 16, 2026, 06:45 AM UTC

OpenClawRadar

Tools

Open-source Specialist Dispatch adapter delegates complex tasks to Claude Code

expert-dispatch is a ~500-line bash script that lets a cheap AI assistant delegate complex coding tasks to Claude Code CLI. It uses commands like dispatch-cc run to send tasks and maintains per-project directories with CLAUDE.md for persistent context.

Apr 20, 2026, 08:23 PM UTC

OpenClawRadar