NexQuant: Rust-native 3-bit KV-cache engine for edge deployment

NexQuant is a Rust-native engine for running long-context models on consumer hardware that would otherwise run out of memory. It is positioned as a production-hardened successor to Tom Turney's TurboQuant+ research.
Key technical details
- 3-5x Memory Reduction: According to the project, 14B models now fit in 4GB of VRAM or unified memory
- MSE-Only Stability: Replaces the noisy QJL paths with a stable MSE-only trajectory (27/27 logic tests passed)
- Integrated Sparse-V: Sparsity is integrated into the real-time decode loop rather than existing only as a benchmark feature
- Zero-Alloc Prefill: Written in 100% safe Rust, aiming for speed without the segfaults seen in the C++ prototypes
- Hardware Support: Native runtime dispatch for Metal, CUDA, and Vulkan, with CPU-AVX2/NEON backend support for older laptops and Raspberry Pi
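The 3-5x memory figure in the list above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is not NexQuant code: the function names, the model shape, and the choice of one 16-bit scale per block of 32 values are all assumptions made for illustration.

```rust
/// Bytes needed for the KV cache of a transformer.
/// Every parameter here is an illustrative assumption, not NexQuant's configuration.
pub fn kv_cache_bytes(
    layers: u64,
    kv_heads: u64,
    head_dim: u64,
    context: u64,
    bits_per_value: f64,
) -> f64 {
    // The factor of 2 accounts for storing both keys and values.
    let values = 2 * layers * kv_heads * head_dim * context;
    values as f64 * bits_per_value / 8.0
}

/// Effective bits per value for block quantization, assuming one
/// 16-bit scale is stored alongside each block of `block_size` codes.
pub fn effective_bits(code_bits: u32, block_size: u32) -> f64 {
    code_bits as f64 + 16.0 / block_size as f64
}
```

For a hypothetical 14B-class shape (40 layers, 8 KV heads, head dimension 128) at a 32,768-token context, `kv_cache_bytes` gives 5 GiB at 16 bits per value and about 1.09 GiB at `effective_bits(3, 32)` = 3.5 bits, a ratio of roughly 4.6x. That falls inside the quoted range, though it covers the KV cache only, not the model weights.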
Implementation specifics
The project uses Walsh-Hadamard transforms and a GGUF parser written in Rust. It builds on Tom Turney's PolarQuant/TurboQuant+ work, which showed that 3-bit KV-caches were mathematically possible. Claude (Anthropic) was used as a high-speed pair programmer during development.
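To illustrate why a Walsh-Hadamard transform pairs well with low-bit quantization, here is a minimal sketch. It is not NexQuant's implementation and not the PolarQuant/TurboQuant+ algorithm: it substitutes a plain symmetric 8-level uniform quantizer, stores one code per byte instead of bit-packing, and all function names are hypothetical.

```rust
/// In-place fast Walsh-Hadamard transform with orthonormal scaling,
/// so applying it twice returns the original vector.
/// Panics if the length is not a power of two.
pub fn fwht(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Butterfly step: combine each pair of half-blocks of width h.
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    for v in data.iter_mut() {
        *v *= scale;
    }
}

/// Symmetric 3-bit uniform quantizer (8 levels). Returns one code per
/// byte (not bit-packed) and the step size needed to dequantize.
pub fn quantize_3bit(data: &[f32]) -> (Vec<u8>, f32) {
    let max_abs = data.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if max_abs == 0.0 {
        // All-zero input: a zero step makes every code decode to zero.
        return (vec![0; data.len()], 0.0);
    }
    let step = max_abs / 3.5;
    let codes = data
        .iter()
        .map(|v| (v / step + 3.5).round().clamp(0.0, 7.0) as u8)
        .collect();
    (codes, step)
}

/// Maps codes 0..=7 back to the levels (code - 3.5) * step.
pub fn dequantize_3bit(codes: &[u8], step: f32) -> Vec<f32> {
    codes.iter().map(|&c| (c as f32 - 3.5) * step).collect()
}

/// Mean squared error between two equal-length slices.
pub fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>() / a.len() as f32
}
```

The transform spreads the energy of an outlier across every coordinate, so a single scale fits the whole vector better. For a vector with one spike and zeros elsewhere, quantizing directly leaves an error on each zero entry, while quantizing after `fwht` and transforming back reconstructs the spike almost exactly. Real KV activations are less extreme, so the gain in practice depends on the data.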
The goal is to ensure that as models scale, the ability to run them remains local and decentralized. The team is specifically seeking feedback on Vulkan SPIR-V kernels.
📖 Read the full source: r/LocalLLaMA