Qwen 3.5 122B MoE at 35 t/s on a Single 3090 with ik_llama.cpp MTP

OpenClawRadar · Published: June 6, 2026

A developer running a fully local inference stack on a single desktop reports 35 tokens/s on Qwen 3.5 122B MoE using only one 3090. The key enabler is ik_llama.cpp, a fork of llama.cpp that fixes MTP (Multi-Token Prediction) for offloaded experts.

Hardware Config

AMD 9900X CPU
192GB DDR5-5200 RAM (called “the secret weapon”)
Two 3090s (Ti + standard), no NVLink

Card 1 runs the worker: Qwen3.5-122B-A10B, using the Unsloth IQ3_S MTP GGUF with 204K context. 75% of its expert layers are offloaded to CPU RAM via targeted -ot (tensor override) flags. Card 2 runs the reasoner: Qwen3.6-35B-A3B Q4_K_XL with MTP, at 135 t/s with 262K context.
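The post does not list the actual -ot values, so the following is only a minimal Python sketch of how such an override string could be generated. It assumes the usual GGUF naming for MoE expert tensors (blk.N.ffn_*_exps), a hypothetical layer count of 48, and an arbitrary choice to keep the earliest layers' experts on the GPU; the helper name expert_offload_override is made up for illustration.

```python
import re

def expert_offload_override(n_layers: int, cpu_fraction: float) -> str:
    """Build an -ot value sending the last cpu_fraction of expert layers to CPU."""
    n_cpu = round(n_layers * cpu_fraction)
    cpu_layers = range(n_layers - n_cpu, n_layers)
    alternation = "|".join(str(i) for i in cpu_layers)
    # Format is "<regex over tensor names>=<buffer type>".
    return rf"blk\.({alternation})\.ffn_.*_exps\.=CPU"

N_LAYERS = 48  # hypothetical; read the real count from the GGUF metadata
override = expert_offload_override(N_LAYERS, 0.75)
pattern = override.rsplit("=", 1)[0]

# The string to pass after -ot on the command line.
print("-ot", f'"{override}"')

# Sanity check against example tensor names before launching the server.
print(bool(re.search(pattern, "blk.5.ffn_up_exps.weight")))   # stays on GPU
print(bool(re.search(pattern, "blk.40.ffn_up_exps.weight")))  # goes to CPU
```

Checking the pattern against a few tensor names first is cheap insurance, since a regex that matches nothing would leave every expert on the GPU and likely exhaust VRAM.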

Additional CPU-only instances handle background processing: Dialectic (35B heretic Q8), Scribe-Logos (Gemma4 19B), and Moonshot (Gemma4 2B), totalling ~19GB RAM.

The ik_llama.cpp Finding

Stock llama.cpp’s MTP evaluates each speculated token’s experts sequentially, reading them from DDR5 one token at a time. On reasoning content this can actually regress performance, because the draft overhead outweighs the acceptance speedup. The ik fork implements fused MoE ops that batch the expert reads for speculated tokens, turning MTP from a +4% gain into a +20% gain by the developer's account. The developer reports 35 t/s decode on a 122B model from a single 3090 using this fork.
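The trade-off described here can be illustrated with a toy cost model. This is a sketch, not a description of either codebase: it uses the standard expected-token formula for speculative decoding with an independent per-token acceptance rate, and the acceptance rate and per-draft cost values are invented to show the shape of the effect, not measured.

```python
def expected_tokens(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification step (accepted drafts plus one)."""
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)

def speedup(accept_rate: float, draft_len: int, extra_cost_per_draft: float) -> float:
    """Throughput relative to plain decoding, where one plain step costs 1.0."""
    step_cost = 1.0 + draft_len * extra_cost_per_draft
    return expected_tokens(accept_rate, draft_len) / step_cost

ACCEPT = 0.6  # hypothetical acceptance rate on reasoning content
DRAFT = 1     # one speculated token per step

# Sequential expert reads: each speculated token costs about a full extra pass.
print("sequential:", speedup(ACCEPT, DRAFT, extra_cost_per_draft=1.0))
# Batched expert reads: the speculated token shares most of the memory traffic.
print("batched:   ", speedup(ACCEPT, DRAFT, extra_cost_per_draft=0.3))
```

In this model MTP only pays off when the extra cost per speculated token falls well below a full decode step, which is the effect the batched expert reads are said to achieve. Real gains depend on how many experts the speculated tokens share.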

If you’re offloading experts to RAM on any MoE model, try ik_llama.cpp before giving up on MTP.

Total Build Cost

~$1600 for RAM
~$1600 for two 3090s
~$400 for everything else
~$3600 total
Running cost: electricity only

📖 Read the full source: r/openclaw
