NexQuant: Rust-native 3-bit KV-cache engine for edge deployment

OpenClawRadar | Published: April 2, 2026

NexQuant is a Rust-native engine for running long-context models on consumer hardware that would otherwise run out of memory. It is positioned as a production-hardened successor to Tom Turney's TurboQuant+ research.

Key technical details

3-5x Memory Reduction: 14B models reportedly fit in 4GB of VRAM or unified memory
MSE-Only Stability: Replaces noisy QJL paths with a stable MSE-only trajectory (27/27 logic tests passed)
Integrated Sparse-V: Sparsity is integrated into the real-time decode loop rather than just being a benchmark feature
Zero-Alloc Prefill: Written in 100% safe Rust for speed, without the segfault issues of C++ prototypes
Hardware Support: Native runtime dispatch for Metal, CUDA, and Vulkan, with CPU-AVX2/NEON backend support for older laptops and Raspberry Pi
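
To put the memory-reduction figure in context, here is a back-of-the-envelope sketch of KV-cache size at 16 bits versus 3 bits per value. The function name `kv_cache_bytes` and the model shape used below (40 layers, 8 KV heads, head dimension 128, 32,768-token context) are illustrative assumptions, not NexQuant's API or any specific model's configuration. The figure covers only the KV cache, not model weights.

```rust
/// Approximate bytes needed for the K and V caches of one sequence.
/// Hypothetical helper for illustration; not part of NexQuant.
fn kv_cache_bytes(
    layers: u64,
    kv_heads: u64,
    head_dim: u64,
    context_len: u64,
    bits_per_value: f64,
) -> f64 {
    // Two tensors (K and V) per layer, one head_dim vector per KV head per token.
    let values = 2 * layers * kv_heads * head_dim * context_len;
    // Convert the total bit count to bytes.
    values as f64 * bits_per_value / 8.0
}
```

With the assumed shape, `kv_cache_bytes(40, 8, 128, 32_768, 16.0)` gives 5 GiB and `kv_cache_bytes(40, 8, 128, 32_768, 3.0)` gives about 0.94 GiB, a ratio of 16/3, or roughly 5.3x, before any per-block scale metadata. At an assumed 3.5 effective bits per value the ratio is about 4.6x.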

Implementation specifics

The project uses Walsh-Hadamard Transforms and Rust GGUF parsing. It builds on Tom Turney's PolarQuant/TurboQuant+ breakthroughs that proved 3-bit KV-caches were mathematically possible. The development involved Claude (Anthropic) as a high-speed pair programmer.
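
The source does not describe how NexQuant applies Walsh-Hadamard Transforms. As a minimal sketch of the transform itself, the following in-place fast Walsh-Hadamard transform uses orthonormal scaling, so applying it twice returns the input. The function name `fwht` is hypothetical. Quantization schemes in this family typically rotate a vector this way before low-bit quantization to spread outlier energy across coordinates, but whether NexQuant does so in this form is an assumption.

```rust
/// In-place fast Walsh-Hadamard transform with orthonormal (1/sqrt(n)) scaling.
/// Hypothetical helper for illustration; not taken from NexQuant.
/// Panics if the slice length is not a power of two.
fn fwht(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");

    // Butterfly passes: combine pairs that are h apart, doubling h each pass.
    let mut h = 1;
    while h < n {
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }

    // Orthonormal scaling makes the transform its own inverse.
    let scale = 1.0 / (n as f32).sqrt();
    for v in data.iter_mut() {
        *v *= scale;
    }
}
```

For example, the single-spike vector `[1.0, 0.0, 0.0, 0.0]` becomes `[0.5, 0.5, 0.5, 0.5]`: the energy of one outlier is spread evenly over all four coordinates, which is the property that makes a narrow quantization range easier to use.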

The goal is to ensure that as models scale, the ability to run them remains local and decentralized. The team is specifically seeking feedback on Vulkan SPIR-V kernels.

📖 Read the full source: r/LocalLLaMA
