Qwen 3.6 27B with MTP on V100 32GB: 54 t/s via llama.cpp Branch

A user on r/LocalLLaMA reports impressive results running Qwen 3.6 27B with Multi-Token Prediction (MTP) on a V100 32GB SXM module using a PCIe adapter. The setup uses am17an's MTP branch of llama.cpp and the corresponding MTP GGUF quant. Key specs: Q8_0 KV cache with 200k cache limit, running as a VS Code Copilot backend via llama-server.
Performance Numbers
- Without MTP: 29-30 tokens/second
- With MTP: 54-55 tokens/second (at 150W power limit)
- After 50k tokens context: drops to 40-45 t/s
Branch: am17an's MTP fork. Build and run were straightforward — 'pulled and built in one shot' with llama-server running without issues. The setup handles tool calls and sub-agents well, and delivered 'very insightful code reviews and refactors' despite the VRAM limitation (32GB).
This is particularly relevant for developers running LLMs on older datacenter hardware like V100s. MTP effectively doubles throughput for this model, demonstrating practical gains for coding assistant workloads.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Rival-Review: A Cross-Model Review Loop for AI Agent Plans
Rival-review is an MIT-licensed tool that uses a second AI model to audit plans from a primary AI coding agent before execution, catching issues like flawed rollback plans, security holes, and stale-state decisions.

Docent: An AI Assistant for Paper Analysis Built with Claude Code
A developer created Docent, an AI assistant that reads uploaded papers, presents them, answers questions, and assesses understanding using Claude Code. The project is available on GitHub under MIT License with a demo on Vercel.

apple-music-play OpenClaw skill published on ClawHub for Apple Music search and playback
The apple-music-play skill published on ClawHub enables searching Apple Music's online catalog and playing tracks directly in the macOS Music app, without requiring songs to be in your local library.

Fehu: CLI Double-Entry Bookkeeping with Claude AI MCP Integration
Fehu is a lightweight CLI personal accounting tool that connects to Claude AI via MCP, allowing natural language transaction recording with a SQLite-backed double-entry system. It features hierarchical accounts, auto-tagging with hashtags, a powerful calc engine, and multi-currency support.