Codebook Lossless LLM Compression: 10-25% RAM Reduction with Bitwise Packing

A developer has published proof-of-concept code for lossless LLM compression that reduces memory usage by 10-25% through bitwise generic packing of indexed weights. The technique trades some inference speed for smaller model size, making it possible to run larger models on hardware with limited VRAM.
How It Works
The developer started by asking how many unique values actually exist in LLM layers. Analysis revealed that while fp16 uses 16 bits, most models only utilize about 12-13 bits of unique values. By packing these values into blocks, the technique achieves compression without losing precision.
Performance Characteristics
- RAM reduction: 10-25%+ across tested models
- Speed impact: Inference speed approximately halved in example tests
- Test hardware: NVIDIA P2200 (5GB) and CPU, with updates being developed for AMD MI50 (32GB)
Implementation Details
The developer worked on this project for several weeks using AI coding assistants including Claude, Qwen, and Gemini. The repository includes both lossless and lossy/balanced versions, though the lossy version hasn't been extensively tested yet.
The developer suggests this compression approach might serve as a way to measure a model's "compactness" - how efficiently it uses its parameter space.
Code Availability
The proof-of-concept code is available on GitHub: https://github.com/bigattichouse/Codebook-Quantization
📖 Read the full source: r/LocalLLaMA
👀 See Also

Agent Skill Harbor: GitHub-native skill management for AI agent teams
Agent Skill Harbor is an open-source platform for teams to share, track, and govern AI agent skills using GitHub-native workflows. It collects skills from GitHub repos, tracks provenance, supports safety checks, and publishes a static catalog site with GitHub Actions and Pages.

LORE.md: An Open Standard for Extracting Structured Knowledge from AI Conversations
LORE.md is an open standard for extracting durable knowledge from AI conversations into a structured format. It captures decisions with rationale, insights, patterns, open questions, and next steps, with everything linking across sessions.

ClawWatcher Reaches 200 Users, Reports $28K+ in Collective OpenClaw API Savings
ClawWatcher, a tool that tracks OpenClaw API costs in real-time, has reached 200 users. According to its creator, users have collectively saved over $28,000 in API costs, with an average cost reduction of 45%.

FixAI Dev: A Consumer Rights Game Using Claude Haiku with Strict JSON Contracts
A developer built a browser game where Claude Haiku acts as a corporate AI denying consumer requests; players argue using real consumer protection laws across 37 cases in EU, US, UK, and Australia. The architecture uses Haiku for language only, with server-side game logic and strict JSON contracts between components.