BitNet: 1-bit LLM Inference Framework Boosts CPU Speed up to 6x

What BitNet is

BitNet (bitnet.cpp) is Microsoft's official inference framework for 1-bit LLMs such as BitNet b1.58. It provides optimized kernels for fast, lossless inference on CPU and GPU, with NPU support planned. The framework is built on llama.cpp, and its kernels use the lookup-table methods pioneered in T-MAC.
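
To make the "1.58-bit" idea concrete: a weight that can only be -1, 0, or 1 carries log2(3) ≈ 1.58 bits of information. The sketch below assumes the absmean quantization scheme described for BitNet b1.58 (scale by the mean absolute weight, round, clip to [-1, 1]). It is an illustration only, not the framework's kernel code, and the function name absmean_ternary is hypothetical.

```python
import math


def absmean_ternary(weights, eps=1e-8):
    """Quantize floats to {-1, 0, 1} using a mean-absolute-value scale.

    Illustrative only: assumes the absmean scheme, not BitNet's actual kernels.
    """
    scale = sum(abs(w) for w in weights) / len(weights)  # mean |w|
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


weights = [0.9, -0.05, -1.4, 0.3, 0.02, -0.6]
q, scale = absmean_ternary(weights)
print(q)             # ternary weights
print(scale)         # one shared scale for the whole group
print(math.log2(3))  # information per ternary weight, about 1.58 bits
```

Small weights collapse to 0 and large ones saturate at -1 or 1, so matrix multiplication reduces to additions and subtractions plus one multiplication by the scale.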

Performance benchmarks

On ARM CPUs, BitNet achieves speedups of 1.37x to 5.07x and reduces energy consumption by 55.4% to 70.0%. On x86 CPUs, speedups range from 2.37x to 6.17x, with energy reductions of 71.9% to 82.2%. The latest optimization adds parallel kernel implementations with configurable tiling and support for embedding quantization, giving an additional 1.15x to 2.1x speedup over the original implementation.
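
As a reading aid for these figures, the short sketch below converts a speedup factor into the fraction of baseline time that remains, and an energy reduction percentage into the fraction of baseline energy that remains. The helper names are hypothetical. The additional 1.15x to 2.1x gain is deliberately not multiplied in, since the source does not say whether the headline ranges already include it.

```python
def time_fraction(speedup):
    """Fraction of the baseline time that remains after a given speedup."""
    return 1.0 / speedup


def energy_fraction(reduction_pct):
    """Fraction of the baseline energy that remains after a given reduction."""
    return 1.0 - reduction_pct / 100.0


# Figures reported in the text above.
speedups = {"ARM": (1.37, 5.07), "x86": (2.37, 6.17)}
energy_reductions = {"ARM": (55.4, 70.0), "x86": (71.9, 82.2)}

for arch in ("ARM", "x86"):
    low, high = speedups[arch]
    print(f"{arch}: time falls to {time_fraction(high):.1%}-{time_fraction(low):.1%} of baseline")
    low, high = energy_reductions[arch]
    print(f"{arch}: energy falls to {energy_fraction(high):.1%}-{energy_fraction(low):.1%} of baseline")
```

For example, the best reported x86 case (6.17x) means a workload takes roughly one sixth of the baseline time.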

BitNet can run a 100B BitNet b1.58 model on a single CPU at speeds comparable to human reading (5-7 tokens per second).
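
A rough estimate of weight storage helps explain why a 100B-parameter model can fit on a single machine at all. The sketch below is back-of-the-envelope arithmetic under stated assumptions: it counts weights only (no activations, KV cache, or scale factors), and the 2-bits-per-weight packed layout is an assumption made for illustration, not a figure from the source.

```python
import math


def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 100e9  # the 100B model size mentioned above

formats = [
    ("16-bit floating point", 16),
    ("ternary packed at 2 bits (assumed layout)", 2),
    ("ternary at the log2(3) lower bound", math.log2(3)),
]
for label, bits in formats:
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the ternary weights need about one eighth of the memory that 16-bit weights would.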

Supported models

BitNet-b1.58-2B-4T (2.4B parameters) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
bitnet_b1_58-large (0.7B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
bitnet_b1_58-3B (3.3B) - x86: ❌ I2_S, ❌ TL1, ✅ TL2 | ARM: ❌ I2_S, ✅ TL1, ❌ TL2
Llama3-8B-1.58-100B-tokens (8.0B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
Falcon3 Family (1B-10B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
Falcon-E Family (1B-3B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
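
The support matrix above can be captured as a small lookup table, which is handy when scripting model setup. This is a minimal sketch: the data is copied from the list above, while the SUPPORT dictionary and the supported_kernels helper are hypothetical names, not part of BitNet.

```python
# Kernel support per model and CPU architecture, transcribed from the list above.
COMMON = {"x86": {"I2_S", "TL2"}, "ARM": {"I2_S", "TL1"}}

SUPPORT = {
    "BitNet-b1.58-2B-4T": COMMON,
    "bitnet_b1_58-large": COMMON,
    "bitnet_b1_58-3B": {"x86": {"TL2"}, "ARM": {"TL1"}},
    "Llama3-8B-1.58-100B-tokens": COMMON,
    "Falcon3 Family": COMMON,
    "Falcon-E Family": COMMON,
}


def supported_kernels(model, arch):
    """Return the kernel types listed as supported for a model on 'x86' or 'ARM'."""
    return sorted(SUPPORT[model][arch])


print(supported_kernels("BitNet-b1.58-2B-4T", "ARM"))  # ['I2_S', 'TL1']
print(supported_kernels("bitnet_b1_58-3B", "x86"))     # ['TL2']
```

Note the pattern in the data: TL1 appears only on ARM and TL2 only on x86, and bitnet_b1_58-3B is the one entry without I2_S support.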

Installation requirements

Python ≥ 3.9, CMake ≥ 3.22, Clang ≥ 18.

For Windows: Visual Studio 2022 with these components:
Desktop development with C++
C++-CMake Tools for Windows
Git for Windows
C++-Clang Compiler for Windows
MS-Build Support for LLVM-Toolset (clang)

For Debian/Ubuntu: use the automatic installation script: bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
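
Before building, it can save time to confirm the toolchain meets these minimums. The following is a minimal sketch using only the Python standard library. It assumes that cmake and clang are on PATH and that their --version output contains the word "version" followed by a major.minor number; the helper names are hypothetical and not part of BitNet.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions from the requirements above.
MINIMUMS = {"cmake": (3, 22), "clang": (18, 0)}


def parse_version(text):
    """Extract (major, minor) from text such as 'cmake version 3.28.3'."""
    match = re.search(r"version\s+(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def tool_version(name):
    """Return the (major, minor) version of a tool on PATH, or None if unavailable."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)


def check_prerequisites():
    """Map each requirement to True if it is met, False otherwise."""
    report = {"python": sys.version_info[:2] >= (3, 9)}
    for tool, minimum in MINIMUMS.items():
        found = tool_version(tool)
        report[tool] = found is not None and found >= minimum
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing or too old'}")
```

On Windows the Clang binary may only be visible inside a Visual Studio developer shell, so a "missing" result there does not necessarily mean the component is not installed.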

Build from source

Clone the repository: git clone --recursive https://github.com/microsoft/BitNet.git

Change directory: cd BitNet
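
The --recursive flag matters because the repository pulls in third-party code as git submodules. If the clone was made without it, the build will be missing sources. A minimal sketch for detecting that case is shown below; it relies on the documented behavior that git submodule status prefixes uninitialized submodules with "-". The helper names and the sample paths are hypothetical.

```python
import subprocess


def uninitialized_submodules(status_output):
    """Return submodule paths that `git submodule status` marks with a leading '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            parts = line[1:].split()  # "<sha> <path>"
            if len(parts) >= 2:
                paths.append(parts[1])
    return paths


def check_clone(repo_dir="BitNet"):
    """Run git inside repo_dir and list submodules that were never fetched."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return uninitialized_submodules(result.stdout)


# Hypothetical sample output: one missing submodule, one present.
sample = "-0123abc 3rdparty/example\n 4567def other/module (v1.0)\n"
print(uninitialized_submodules(sample))  # ['3rdparty/example']
```

If check_clone() returns a non-empty list, running git submodule update --init --recursive inside the repository fetches the missing submodules.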

Install dependencies (creating a new conda environment first is recommended).

Windows users must use a Developer Command Prompt/PowerShell for VS2022 for build commands.
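
A build script can guard against being launched from the wrong shell on Windows. The sketch below assumes that a Visual Studio developer shell sets the VSCMD_VER or VSINSTALLDIR environment variable; this is a heuristic chosen for illustration, not a check that BitNet itself performs, and the function names are hypothetical.

```python
import os
import platform


def in_vs_developer_shell(env=None):
    """Heuristic: developer shells are assumed to set VSCMD_VER or VSINSTALLDIR."""
    env = os.environ if env is None else env
    return "VSCMD_VER" in env or "VSINSTALLDIR" in env


def ready_to_build(system=None, env=None):
    """Return True unless running on Windows outside a developer shell."""
    system = platform.system() if system is None else system
    if system != "Windows":
        return True
    return in_vs_developer_shell(env)


if not ready_to_build():
    print("Open a Developer Command Prompt/PowerShell for VS2022 before building.")
```

The optional system and env parameters exist only so the check can be exercised without changing the real environment.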

Recent updates

01/15/2026: BitNet CPU Inference Optimization
05/20/2025: BitNet Official GPU inference kernel
04/14/2025: BitNet Official 2B Parameter Model on Hugging Face
02/18/2025: Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
11/08/2024: BitNet a4.8: 4-bit Activations for 1-bit LLMs
10/21/2024: 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
10/17/2024: bitnet.cpp 1.0 released

📖 Read the full source: HN AI Agents