Qwen3.5-27B-FP8 performance benchmarks with OpenClaw agents

Performance benchmarks from community testing
Community testing used a single modified RTX 4090 GPU with 48GB of VRAM. The official Qwen3.5-35B-A3B-FP8 and Qwen3.5-27B-FP8 models were tested at a 256K context length.
Framework recommendations
The tester recommends SGLang, describing it as the only framework that fully supports prefix caching, which is essential for Qwen3.5's hybrid attention architecture. Reported prefill times at 100K context:
- Cold start: prefill takes about 10 seconds
- With caching: prefill drops to about 200ms
- Result: very low first-token latency on cached prompts, followed by fast output
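To put the prefill figures in perspective, the short sketch below converts them into a cold-start prefill rate and a cache speedup factor. It assumes "100K" means 100,000 tokens and that the 200ms figure applies to the same 100K prompt; the function name is illustrative and not taken from the source.

```python
def prefill_stats(context_tokens, cold_seconds, cached_seconds):
    """Return (cold prefill rate in tokens/s, speedup factor from caching)."""
    cold_rate = context_tokens / cold_seconds
    speedup = cold_seconds / cached_seconds
    return cold_rate, speedup


# Figures reported by the tester; 100K is assumed to mean 100,000 tokens.
cold_rate, speedup = prefill_stats(
    context_tokens=100_000, cold_seconds=10.0, cached_seconds=0.2
)

print(f"cold prefill rate: ~{cold_rate:,.0f} tokens/s")
print(f"speedup with prefix cache: ~{speedup:.0f}x")
```

Under these assumptions the cold prefill rate works out to roughly 10,000 tokens/second, and a cache hit cuts the wait before the first token by a factor of about 50.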
Model performance metrics
- Qwen3.5-35B-A3B-FP8: Started at 120 tokens/second, decayed to 80 tokens/second
- Qwen3.5-27B-FP8: Started at 20 tokens/second, slightly decayed to 18 tokens/second
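The two decay figures are easier to compare as relative drops. The sketch below does that arithmetic on the reported numbers only; the helper name is illustrative, and the source does not say over what span the slowdown was measured.

```python
def relative_drop(start_tps, end_tps):
    """Fraction of generation speed lost between the start and end readings."""
    return (start_tps - end_tps) / start_tps


# (start, end) tokens/second as reported by the tester.
reported = {
    "Qwen3.5-35B-A3B-FP8": (120, 80),
    "Qwen3.5-27B-FP8": (20, 18),
}

drops = {name: relative_drop(start, end) for name, (start, end) in reported.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:.0%} slower than at the start")
```

On these numbers the 35B-A3B model loses about a third of its initial speed, while the 27B model loses about 10% but starts from a much lower baseline.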
OpenClaw agent scaling
OpenClaw can run a team of six agents simultaneously, and throughput scales up to 120 tokens/second. The tester was surprised by how well this scaled.
The drawback noted is that single-thread (single-request) performance is slow in this configuration.
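A rough way to reconcile the high team throughput with slow individual requests is to divide the total across agents. The sketch below assumes the 120 tokens/second figure is the combined rate of all six agents and that they share it evenly; the source states neither, and it does not say which model the agent team used. The 1,000-token reply length is a hypothetical value chosen for illustration.

```python
def per_agent_rate(aggregate_tps, num_agents):
    """Average tokens/second each agent sees, assuming an even split."""
    return aggregate_tps / num_agents


def seconds_for_reply(reply_tokens, tokens_per_second):
    """Time for one agent to generate a reply of the given length."""
    return reply_tokens / tokens_per_second


# Assumption: 120 tokens/s is the combined rate of all six agents.
rate = per_agent_rate(aggregate_tps=120, num_agents=6)

# Hypothetical 1,000-token reply, used only to illustrate the wait time.
wait = seconds_for_reply(reply_tokens=1_000, tokens_per_second=rate)

print(f"per-agent rate: ~{rate:.0f} tokens/s")
print(f"time for a 1,000-token reply: ~{wait:.0f} s")
```

Under these assumptions each agent averages about 20 tokens/second, so the team as a whole is fast even though any single conversation still feels slow.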
MTP optimization notes
Enabling MTP (Multi-Token Prediction) for the 27B-FP8 model can significantly boost single-request generation speeds:
- On a single NVIDIA H100: Maintains 100 tokens/second with 20K context window
- Prefill speed for 64K tokens: Under 1 second
Important caveat: MTP conflicts with prefix caching and is highly VRAM-intensive. Users with an RTX 4090 should start with a lower num-steps setting.
📖 Read the full source: r/openclaw