Qwen3.5-27B-FP8 performance benchmarks with OpenClaw agents

✍️ OpenClawRadar · 📅 Published: February 28, 2026

Performance benchmarks from community testing

The community tester used a single modified RTX 4090 GPU with 48GB of VRAM. The official Qwen3.5-35B-A3B-FP8 and Qwen3.5-27B-FP8 models were both tested with a 256K-token context length.

Framework recommendations

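The tester recommends SGLang, describing it as the only framework that fully supports prefix caching for Qwen3.5's hybrid attention architecture. Prefix caching matters here because it lets the server skip re-processing a long prompt it has already seen.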

Cold-start prefill at 100K context: about 10 seconds
Prefill with caching: drops to 200 ms
Result: very low first-token latency and extremely fast output
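
To put the prefill figures in perspective, here is a rough back-of-the-envelope sketch. It assumes "100K" means 100,000 tokens and that both timings refer to the same prompt; the function names are illustrative only and not part of SGLang or OpenClaw.

```python
# Figures reported by the tester; "100K" is assumed to mean 100,000 tokens.
CONTEXT_TOKENS = 100_000
COLD_PREFILL_S = 10.0    # cold-start prefill, about 10 seconds
CACHED_PREFILL_S = 0.2   # prefill with prefix caching, about 200 ms


def prefill_speedup(cold_s: float, cached_s: float) -> float:
    """How many times faster the cached prefill is than the cold one."""
    return cold_s / cached_s


def prefill_rate(tokens: int, seconds: float) -> float:
    """Prompt tokens processed per second during prefill."""
    return tokens / seconds


speedup = prefill_speedup(COLD_PREFILL_S, CACHED_PREFILL_S)
cold_rate = prefill_rate(CONTEXT_TOKENS, COLD_PREFILL_S)

print(f"Cache speedup: {speedup:.0f}x")
print(f"Cold prefill rate: {cold_rate:,.0f} tokens/s")
```

Under these assumptions the cache cuts time-to-first-token by roughly 50x, which is why it matters so much for agents that resend a long shared prefix on every turn.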

Model performance metrics

Qwen3.5-35B-A3B-FP8: Started at 120 tokens/second, decayed to 80 tokens/second
Qwen3.5-27B-FP8: Started at 20 tokens/second, decayed slightly to 18 tokens/second
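
The relative slowdown is easier to compare as a percentage. The sketch below only restates the reported start and end speeds; the helper name `decay_percent` is made up for illustration.

```python
# Reported (start, end) generation speeds in tokens/second.
REPORTED = {
    "Qwen3.5-35B-A3B-FP8": (120, 80),
    "Qwen3.5-27B-FP8": (20, 18),
}


def decay_percent(start_tps: float, end_tps: float) -> float:
    """Percentage drop from the starting speed to the final speed."""
    return (start_tps - end_tps) / start_tps * 100


for model, (start, end) in REPORTED.items():
    print(f"{model}: {decay_percent(start, end):.1f}% slower by the end")
```

By this measure the 35B-A3B model loses about a third of its starting speed while the 27B model loses about a tenth, although the 35B-A3B model remains several times faster in absolute terms.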

OpenClaw agent scaling

OpenClaw can run a team of six agents simultaneously, and combined speed scales up to 120 tokens/second. The tester noted surprise at this scaling behavior.

The drawback the tester mentioned is that single-thread performance is slow with this configuration.
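
A quick sketch of what this trade-off means per agent. It assumes the 120 tokens/second figure is the combined throughput of all six agents and that it is shared evenly, neither of which the source states explicitly; the 1,000-token reply length is a hypothetical example.

```python
AGENTS = 6
AGGREGATE_TPS = 120  # reported speed with six agents running (assumed combined)


def per_agent_tps(aggregate_tps: float, agents: int) -> float:
    """Even share of the combined throughput for each agent."""
    return aggregate_tps / agents


def seconds_for_reply(reply_tokens: int, tps: float) -> float:
    """Time one agent needs to generate a reply of the given length."""
    return reply_tokens / tps


per_agent = per_agent_tps(AGGREGATE_TPS, AGENTS)
print(f"Per-agent speed: {per_agent:.0f} tokens/s")
print(f"1,000-token reply: {seconds_for_reply(1000, per_agent):.0f} s")
```

Under these assumptions each agent runs at about the single-request speed reported for Qwen3.5-27B-FP8, so the team as a whole is fast even though any individual reply still takes a while.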

MTP optimization notes

Enabling MTP (Multi-Token Prediction) for the 27B-FP8 model can significantly boost single-request generation speeds:

On a single NVIDIA H100: Maintains 100 tokens/second with 20K context window
Prefill speed for 64K tokens: Under 1 second

Important caveat: MTP conflicts with prefix caching and is highly VRAM-intensive. Users on an RTX 4090 should start with a lower num-steps setting.

📖 Read the full source: r/openclaw
