RTX 5000 Pro 48GB Delivers 4400 tok/s Prompt Processing with a Full-Precision KV Cache for Qwen3.6-27B

One developer chose the RTX 5000 Pro 48GB ($4300 including taxes) over a Mac Studio, and the numbers support the choice: up to 4400 tokens/second in prompt processing (PP) and 50–80 tok/s in text generation (TG) with Qwen3.6-27B-FP8 and a full-precision BF16 KV cache.
Hardware and Cost Breakdown
- GPU cost: $4300 (incl. taxes)
- Total build: $5600 with 64GB RAM
- Context limit: 200K tokens at full precision (BF16 KV cache)
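To see why the cache precision matters for the context limit, here is a minimal sketch of the usual KV cache size formula for standard attention (one key and one value vector per layer per token). The layer count, KV head count, and head dimension below are hypothetical placeholders, not the actual Qwen3.6-27B configuration, so the printed sizes are illustrative only; the point is that BF16 (2 bytes per element) needs twice the memory of an FP8 cache (1 byte per element) at the same context length.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem):
    """KV cache size for standard attention.

    The leading factor of 2 accounts for storing both K and V
    for every layer and every cached token.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


# HYPOTHETICAL architecture values, for illustration only.
# These are NOT the real Qwen3.6-27B settings.
LAYERS, KV_HEADS, HEAD_DIM = 48, 4, 128

CONTEXT = 200_000  # context limit quoted in the article

bf16 = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM, bytes_per_elem=2)
fp8 = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM, bytes_per_elem=1)

print(f"BF16 KV cache: {bf16 / 2**30:.1f} GiB")
print(f"FP8 KV cache:  {fp8 / 2**30:.1f} GiB")
```

Substituting the values from the model's published configuration would give a real estimate; models that mix in non-standard attention layers would need a different formula.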
Performance Benchmarks
- Prompt processing: 4400 tok/s
- Text generation: 50–60 tok/s for very large prompts, up to 80 tok/s for smaller ones
- Model: Qwen3.6-27B-FP8 with full-precision cache
- Power draw: Roughly half of a dual RTX 5090 setup
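As a rough illustration of what these rates mean in practice, the sketch below estimates the wall-clock time for a single request from the reported figures. It assumes the peak 4400 tok/s prompt processing rate holds across the whole prompt, which is optimistic since the source gives it as an upper bound, and the 1,000-token output length is a made-up example value.

```python
PP_TOK_S = 4400        # prompt processing rate reported in the article (peak)
TG_TOK_S_LARGE = 50    # low end of the generation rate for very large prompts


def request_seconds(prompt_tokens, output_tokens, pp_rate, tg_rate):
    """Estimate request time as prefill time plus generation time."""
    return prompt_tokens / pp_rate + output_tokens / tg_rate


# Time to ingest a full 200K-token prompt at the peak rate.
prefill = 200_000 / PP_TOK_S

# Same prompt plus a hypothetical 1,000-token response.
total = request_seconds(200_000, 1_000, PP_TOK_S, TG_TOK_S_LARGE)

print(f"Prefill: {prefill:.1f} s, total: {total:.1f} s")
```

Under these assumptions a full-context prompt takes well under a minute to process, and generation at the lower 50 tok/s rate adds about 20 seconds per 1,000 output tokens.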
Key Observations
The user built the PC with no prior experience, relying on Claude Code and burning 50% of the weekly Claude Code Max limit on vLLM and Linux setup. A Reddit post detailing exact vLLM settings for Qwen3.6-27B-FP8 with a BF16 cache was the primary reference. The author notes that two RTX 5090s would outperform this setup, but at significantly higher cost, noise, and power consumption.
📖 Read the full source: r/LocalLLaMA