Microsoft BitNet: 1-bit LLM inference framework for CPU and GPU

What BitNet is
BitNet (bitnet.cpp) is Microsoft's official inference framework for 1-bit LLMs such as BitNet b1.58. It provides optimized kernels for fast, lossless inference on CPU and GPU, with NPU support planned. The framework is built on llama.cpp, and its kernels use the lookup-table methods introduced in T-MAC.
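To make the "1-bit" (more precisely, 1.58-bit) idea concrete, here is a minimal pure-Python sketch of ternary weight quantization. It assumes the absmean scheme described in the BitNet b1.58 paper (scale by the mean absolute weight, round, clip to -1, 0, +1); the function names are hypothetical and this is not the framework's kernel code. It also illustrates why ternary weights let a matrix-vector product run on additions and subtractions only.

```python
import math

def absmean_ternary_quantize(weights, eps=1e-8):
    """Map a 2D list of float weights to {-1, 0, +1} plus one per-tensor scale.

    Assumption: absmean scaling (divide by the mean absolute weight), then
    round to the nearest integer and clip to [-1, 1].
    """
    magnitudes = [abs(v) for row in weights for v in row]
    scale = sum(magnitudes) / len(magnitudes) + eps
    quantized = [[max(-1, min(1, round(v / scale))) for v in row] for row in weights]
    return quantized, scale

def ternary_matvec(quantized, scale, x):
    """Compute (quantized @ x) * scale without multiplying by any weight."""
    out = []
    for row in quantized:
        acc = 0.0
        for q, xi in zip(row, x):
            if q == 1:
                acc += xi      # +1 weight: add the activation
            elif q == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: skip entirely
        out.append(acc * scale)
    return out

# Three possible values per weight carry log2(3) bits of information.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

weights = [[0.9, -0.05, -1.2], [0.1, 0.6, -0.4]]
quantized, scale = absmean_ternary_quantize(weights)
outputs = ternary_matvec(quantized, scale, [1.0, 2.0, 3.0])
```

Real kernels pack the ternary values into a few bits each and process many weights per instruction; the sketch only shows the arithmetic that such packing preserves.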
Performance benchmarks
On ARM CPUs, the reported speedups range from 1.37x to 5.07x, with energy consumption reduced by 55.4% to 70.0%. On x86 CPUs, speedups range from 2.37x to 6.17x, with energy consumption reduced by 71.9% to 82.2%. The latest optimization adds parallel kernel implementations with configurable tiling and embedding quantization support, achieving an additional 1.15x to 2.1x speedup over the original implementation.
BitNet can run a 100B BitNet b1.58 model on a single CPU at speeds comparable to human reading (5-7 tokens per second).
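A rough back-of-the-envelope calculation shows why a 100B-parameter ternary model can fit on a single machine. The sketch below counts weight storage only (no activations, KV cache, or runtime overhead), and the 2-bits-per-weight packing is an illustrative assumption rather than a statement about how the BitNet kernels lay out memory.

```python
import math

def weight_memory_gb(num_params, bits_per_weight):
    """Weight storage in gigabytes (10^9 bytes) at a given bit width."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS_100B = 100e9

fp16_gb = weight_memory_gb(PARAMS_100B, 16)               # 16-bit baseline
packed_2bit_gb = weight_memory_gb(PARAMS_100B, 2)         # assumed 2-bit packing
ideal_gb = weight_memory_gb(PARAMS_100B, math.log2(3))    # information-theoretic floor

print(f"FP16:         {fp16_gb:.1f} GB")
print(f"2-bit packed: {packed_2bit_gb:.1f} GB")
print(f"log2(3) bits: {ideal_gb:.1f} GB")
```

Under these assumptions the weights shrink from about 200 GB at 16 bits to about 25 GB at 2 bits per weight, which is within reach of a single workstation's RAM.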
Supported models
- BitNet-b1.58-2B-4T (2.4B parameters) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
- bitnet_b1_58-large (0.7B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
- bitnet_b1_58-3B (3.3B) - x86: ❌ I2_S, ❌ TL1, ✅ TL2 | ARM: ❌ I2_S, ✅ TL1, ❌ TL2
- Llama3-8B-1.58-100B-tokens (8.0B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
- Falcon3 Family (1B-10B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
- Falcon-E Family (1B-3B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
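The support matrix above can be encoded as a small lookup structure, which is handy when scripting over several models. The dictionary below transcribes the list exactly; the names `SUPPORTED_KERNELS` and `supported_kernels` are hypothetical helpers, not part of BitNet.

```python
# Kernel support per model and CPU architecture, transcribed from the list above.
_COMMON = {"x86": {"I2_S", "TL2"}, "ARM": {"I2_S", "TL1"}}

SUPPORTED_KERNELS = {
    "BitNet-b1.58-2B-4T": _COMMON,
    "bitnet_b1_58-large": _COMMON,
    # The 3B model is the exception: no I2_S on either architecture.
    "bitnet_b1_58-3B": {"x86": {"TL2"}, "ARM": {"TL1"}},
    "Llama3-8B-1.58-100B-tokens": _COMMON,
    "Falcon3 Family": _COMMON,
    "Falcon-E Family": _COMMON,
}

def supported_kernels(model, arch):
    """Return the sorted kernel names listed for a model on 'x86' or 'ARM'."""
    return sorted(SUPPORTED_KERNELS[model][arch])

def is_supported(model, arch, kernel):
    """True if the list above marks this kernel as supported."""
    return kernel in SUPPORTED_KERNELS[model][arch]
```

For example, `supported_kernels("bitnet_b1_58-3B", "x86")` returns `["TL2"]`, reflecting that this model has no I2_S support in the list.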
Installation requirements
Python ≥ 3.9, CMake ≥ 3.22, and Clang ≥ 18 are required.
- Windows: Visual Studio 2022 with these components enabled: Desktop development with C++, C++-CMake Tools for Windows, Git for Windows, C++-Clang Compiler for Windows, and MS-Build Support for LLVM-Toolset (clang).
- Debian/Ubuntu: use the automatic LLVM installation script: bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
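Before building, it can help to confirm that the toolchain meets the version minimums listed above. The following is a minimal sketch, not an official BitNet script: it assumes `cmake` and `clang` are on the PATH and that each prints its version in response to `--version`. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the requirements above.
MINIMUMS = {"python": (3, 9), "cmake": (3, 22), "clang": (18, 0)}

def parse_version(text):
    """Extract the first 'major.minor' pair from a version string, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None

def tool_version(executable):
    """Run '<executable> --version' and parse the result; None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_version(result.stdout)

def check_requirements():
    """Return {name: bool} saying whether each tool meets its minimum version."""
    found = {
        "python": tuple(sys.version_info[:2]),
        "cmake": tool_version("cmake"),
        "clang": tool_version("clang"),
    }
    return {
        name: found[name] is not None and found[name] >= minimum
        for name, minimum in MINIMUMS.items()
    }
```

Calling `check_requirements()` returns a dictionary such as `{"python": True, "cmake": True, "clang": False}`, where a `False` entry means the tool is either missing or older than required. Vendor-specific builds may report version numbers on a different scheme, so treat the result as a hint rather than a guarantee.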
Build from source
Clone the repository: git clone --recursive https://github.com/microsoft/BitNet.git
Change directory: cd BitNet
Install dependencies: creating a new conda environment first is recommended.
Windows users must use a Developer Command Prompt/PowerShell for VS2022 for build commands.
Recent updates
- 01/15/2026: BitNet CPU Inference Optimization
- 05/20/2025: BitNet Official GPU inference kernel
- 04/14/2025: BitNet Official 2B Parameter Model on Hugging Face
- 02/18/2025: Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
- 11/08/2024: BitNet a4.8: 4-bit Activations for 1-bit LLMs
- 10/21/2024: 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
- 10/17/2024: bitnet.cpp 1.0 released
📖 Read the full source: HN AI Agents