Codebook Lossless LLM Compression: 10-25% RAM Reduction with Bitwise Packing

✍️ OpenClawRadar | 📅 Published: March 15, 2026 | 🔗 Source

A developer has published proof-of-concept code for lossless LLM compression that reduces memory usage by 10-25% by replacing weights with indices into a codebook and packing those indices at the bit level. The technique trades inference speed for a smaller memory footprint, which could make it possible to run larger models on hardware with limited VRAM.

How It Works

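The developer started by asking how many unique values actually exist in LLM layers. Analysis showed that although fp16 stores every weight in 16 bits, most models use only about 12-13 bits' worth of unique values, that is, roughly 4,096 to 8,192 distinct values out of the 65,536 bit patterns fp16 can represent. Storing each weight as a shorter index into a table (codebook) of those values, and packing the indices into blocks, shrinks the model without losing precision.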
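The idea can be illustrated with a minimal, dependency-free Python sketch. It is not the repository's code: the function names (to_bits16, compress, decompress) and the single global codebook are assumptions made for illustration, and the actual project may organize its blocks quite differently.

```python
import struct

def to_bits16(values):
    """Return the raw 16-bit pattern of each value when stored as fp16."""
    return [struct.unpack("<H", struct.pack("<e", v))[0] for v in values]

def compress(patterns):
    """Replace each 16-bit pattern with a bit-packed index into a codebook."""
    codebook = sorted(set(patterns))                  # every distinct value, once
    index_of = {p: i for i, p in enumerate(codebook)}
    bits = max(1, (len(codebook) - 1).bit_length())   # index width in bits
    acc = 0
    for p in patterns:                                # concatenate indices, first weight first
        acc = (acc << bits) | index_of[p]
    total_bits = bits * len(patterns)
    pad = -total_bits % 8                             # zero bits to reach a whole byte
    packed = (acc << pad).to_bytes((total_bits + pad) // 8, "big")
    return codebook, packed, bits, len(patterns)

def decompress(codebook, packed, bits, count):
    """Recover the exact original 16-bit patterns (hypothetical helper)."""
    pad = -(bits * count) % 8
    acc = int.from_bytes(packed, "big") >> pad        # drop the padding bits
    mask = (1 << bits) - 1
    return [codebook[(acc >> (bits * (count - 1 - i))) & mask]
            for i in range(count)]

# Toy "layer": 8 weights but only 4 distinct values.
weights = [0.5, -1.25, 0.5, 3.0, 0.0, -1.25, 0.5, 3.0]
patterns = to_bits16(weights)
codebook, packed, bits, count = compress(patterns)
assert decompress(codebook, packed, bits, count) == patterns  # bit-exact round trip
print(f"{bits} bits per index, {len(packed)} packed bytes vs {2 * count} raw bytes")
```

The round trip compares raw bit patterns, not floats, so the check is genuinely lossless. The codebook itself also has to be stored, and unpacking indices on every read is extra work at inference time, which is consistent with the slowdown the developer reports.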

Performance Characteristics

  • RAM reduction: 10-25%+ across tested models
  • Speed impact: Inference speed approximately halved in example tests
  • Test hardware: NVIDIA P2200 (5GB) and CPU, with updates being developed for AMD MI50 (32GB)
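As a back-of-envelope check on the RAM figure, the saving from index packing can be estimated from the index width alone: a 12-bit index in place of 16 bits saves at most 25%, and a 13-bit index at most 18.75%, before counting the codebook. The sketch below assumes a hypothetical tensor of one million weights and a single codebook; the function name ram_saving is illustrative and does not come from the repository.

```python
def ram_saving(index_bits, n_weights, n_unique, raw_bits=16):
    """Fraction of memory saved versus storing every weight at raw_bits."""
    raw = n_weights * raw_bits
    packed = n_weights * index_bits + n_unique * raw_bits  # indices + codebook
    return 1 - packed / raw

# Hypothetical tensor: 1,000,000 weights, codebook completely full.
for bits in (12, 13):
    print(bits, round(ram_saving(bits, 1_000_000, 2 ** bits), 3))
```

These are idealized numbers. Real savings depend on how blocks and codebooks are laid out, which is one reason the reported range spans 10-25%.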

Implementation Details

The developer worked on this project for several weeks using AI coding assistants including Claude, Qwen, and Gemini. The repository includes both lossless and lossy/balanced versions, though the lossy version hasn't been extensively tested yet.

The developer suggests this compression approach might also serve as a way to measure a model's "compactness": how efficiently it uses its parameter space.

Code Availability

The proof-of-concept code is available on GitHub: https://github.com/bigattichouse/Codebook-Quantization

📖 Read the full source: r/LocalLLaMA
