Qwen3.5-27B-FP8 performance benchmarks with OpenClaw agents

Performance benchmarks from community testing
Community testing was conducted on a single modified RTX 4090 GPU with 48GB of VRAM. The official Qwen3.5-35B-A3B-FP8 and Qwen3.5-27B-FP8 models were tested with a 256K context length.
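As a rough sanity check on why both FP8 models fit on a 48GB card, the sketch below estimates weight memory from the parameter counts in the model names. It assumes about 1 byte per parameter for FP8 and ignores quantization scales, KV cache, activations and runtime overhead, so real usage will be higher. The function names are hypothetical.

```python
# Back-of-envelope weight memory for FP8 checkpoints.
# Assumptions: ~1 byte per parameter for FP8 weights; nominal parameter
# counts taken from the model names; quantization scales, KV cache,
# activations and framework overhead are ignored.

VRAM_GB = 48  # modified RTX 4090 used in the community test

MODELS = {
    "Qwen3.5-27B-FP8": 27e9,      # dense model
    "Qwen3.5-35B-A3B-FP8": 35e9,  # MoE: ~35B total params, ~3B active per token
}


def fp8_weight_gb(num_params: float, bytes_per_param: float = 1.0) -> float:
    """Approximate weight footprint in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def headroom_gb(num_params: float, vram_gb: float = VRAM_GB) -> float:
    """VRAM left for KV cache, activations and overhead after loading weights."""
    return vram_gb - fp8_weight_gb(num_params)


if __name__ == "__main__":
    for name, params in MODELS.items():
        print(f"{name}: ~{fp8_weight_gb(params):.0f} GB weights, "
              f"~{headroom_gb(params):.0f} GB left")
```

The remaining headroom is what has to hold the KV cache for long contexts, which is why context length and features like MTP compete for the same memory.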
Framework recommendations
The tester recommends SGLang, describing it as the only framework that fully supports prefix caching, which they consider essential for Qwen3.5's hybrid attention architecture.
- For 100K context: Cold-start prefill takes about 10 seconds
- With caching: Prefill drops to 200ms
- Result: Very low first-token latency and extremely fast output
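A minimal sketch of why caching helps so much: with prefix caching, only the tokens after the longest already-cached prefix need to be prefilled. The toy class below is hypothetical and is not SGLang's API; it stores whole token lists and scans them linearly, whereas SGLang's RadixAttention organizes cached KV entries in a radix tree. The last lines simply restate the reported figures as a prefill rate and a speedup.

```python
# Toy model of prefix caching: only tokens after the longest cached prefix
# need prefill. Simplified illustration, not SGLang's implementation.

from typing import List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyPrefixCache:
    """Hypothetical cache holding previously prefilled token sequences."""

    def __init__(self) -> None:
        self.entries: List[List[int]] = []

    def tokens_to_prefill(self, prompt: Sequence[int]) -> int:
        """Tokens still needing prefill after reusing the best cached prefix."""
        best = max((shared_prefix_len(e, prompt) for e in self.entries), default=0)
        return len(prompt) - best

    def insert(self, prompt: Sequence[int]) -> None:
        self.entries.append(list(prompt))


# Arithmetic from the reported numbers for a 100K-token context.
COLD_PREFILL_S = 10.0
CACHED_PREFILL_S = 0.2
cold_rate = 100_000 / COLD_PREFILL_S         # prefill tokens/s on a cold start
speedup = COLD_PREFILL_S / CACHED_PREFILL_S  # how much faster the cached case is

if __name__ == "__main__":
    cache = ToyPrefixCache()
    history = list(range(100_000))            # stand-in for a 100K-token agent context
    print(cache.tokens_to_prefill(history))   # cold start: every token is prefilled
    cache.insert(history)
    follow_up = history + [1, 2, 3]           # same context plus a short new turn
    print(cache.tokens_to_prefill(follow_up)) # only the 3 new tokens
    print(cold_rate, speedup)
```

This is why agent workloads, which resend a long and mostly unchanged context on every turn, benefit so strongly from prefix caching.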
Model performance metrics
- Qwen3.5-35B-A3B-FP8: Started at 120 tokens/second, decayed to 80 tokens/second
- Qwen3.5-27B-FP8: Started at 20 tokens/second, slightly decayed to 18 tokens/second
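The source gives only start and end speeds, so the most that can be derived from them is the relative drop over the run. The helper below is a trivial illustration of that arithmetic; `decay_pct` is a hypothetical name.

```python
# Relative drop in decode speed between the start and end of a run,
# using only the start/end figures reported above.


def decay_pct(start_tps: float, end_tps: float) -> float:
    """Relative throughput loss, in percent."""
    return (start_tps - end_tps) / start_tps * 100


reported = {
    "Qwen3.5-35B-A3B-FP8": (120, 80),
    "Qwen3.5-27B-FP8": (20, 18),
}

if __name__ == "__main__":
    for name, (start, end) in reported.items():
        print(f"{name}: {decay_pct(start, end):.1f}% slower at the end")
```

By this measure the MoE model is much faster but loses about a third of its speed, while the dense 27B model loses about a tenth.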
OpenClaw agent scaling
OpenClaw can run agent teams of six agents simultaneously, and total throughput scales up to about 120 tokens/second. The tester said they were surprised by this scaling behavior.
The drawback they mentioned is that single-request (single-thread) performance is slow in this configuration.
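Assuming the 120 tokens/second figure is combined throughput across all six agents (the source does not state this explicitly), it matches six streams each running at roughly 20 tokens/second. The sketch computes per-request and aggregate throughput from timing records; `Result` and the sample numbers are hypothetical, not measured.

```python
# Aggregate throughput of concurrent agents: total generated tokens divided
# by the wall-clock span from the first request start to the last finish.
# The sample records are hypothetical.

from dataclasses import dataclass
from typing import Iterable


@dataclass
class Result:
    tokens: int     # tokens generated by one agent's request
    start_s: float  # request start time in seconds
    end_s: float    # request end time in seconds


def per_request_tps(r: Result) -> float:
    """Speed seen by a single agent."""
    return r.tokens / (r.end_s - r.start_s)


def aggregate_tps(results: Iterable[Result]) -> float:
    """Combined speed of all agents over the shared wall-clock window."""
    rs = list(results)
    span = max(r.end_s for r in rs) - min(r.start_s for r in rs)
    return sum(r.tokens for r in rs) / span


if __name__ == "__main__":
    agents = [Result(tokens=2000, start_s=0.0, end_s=100.0) for _ in range(6)]
    print(per_request_tps(agents[0]))
    print(aggregate_tps(agents))
```

Measuring both numbers separately makes the trade-off visible: batching concurrent agents raises total throughput without making any individual request faster.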
MTP optimization notes
Enabling MTP (Multi-Token Prediction) for the 27B-FP8 model can significantly boost single-request generation speeds:
- On a single NVIDIA H100: Maintains 100 tokens/second with a 20K context window
- Prefill speed for 64K tokens: Under 1 second
Important caveat: MTP conflicts with prefix caching and is highly VRAM-intensive. RTX 4090 users should start with a lower num-steps setting.
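MTP is typically served through speculative decoding, where draft tokens are proposed and then verified in a single forward pass of the main model. Under the simplifying assumption that each draft token is accepted independently with probability alpha, the expected number of tokens produced per verification step with k draft tokens is (1 - alpha^(k+1)) / (1 - alpha). The sketch treats k as a stand-in for the num-steps setting (an assumed mapping) and uses a hypothetical acceptance rate. It shows diminishing returns as k grows, which is one reason a lower setting can be a sensible starting point when VRAM is tight.

```python
# Expected tokens per target-model forward pass under speculative decoding,
# assuming i.i.d. acceptance of each draft token with probability alpha.
# k (draft tokens per step) is a hypothetical stand-in for "num-steps".


def expected_tokens_per_step(alpha: float, k: int) -> float:
    if not 0.0 <= alpha <= 1.0 or k < 0:
        raise ValueError("alpha must be in [0, 1] and k must be >= 0")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus one bonus token
    return (1 - alpha ** (k + 1)) / (1 - alpha)


if __name__ == "__main__":
    alpha = 0.7  # hypothetical acceptance rate
    for k in (1, 2, 3, 5):
        print(k, round(expected_tokens_per_step(alpha, k), 2))
```

The formula ignores the cost of running the draft steps themselves, so real speedups will be lower than these expected token counts suggest.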
📖 Read the full source: r/openclaw