Codebook Lossless LLM Compression: 10-25% RAM Reduction with Bitwise Packing

✍️ OpenClawRadar · 📅 Published: March 15, 2026

A developer has published proof-of-concept code for lossless LLM compression that reduces memory usage by 10-25% by replacing each weight with an index into a codebook of values and bit-packing those indices. The technique trades inference speed for a smaller memory footprint, making it possible to run larger models on hardware with limited VRAM.

How It Works

The developer started by asking how many unique values actually appear in an LLM's layers. Analysis showed that although fp16 stores each weight in 16 bits, the distinct values found in most models can be indexed with only about 12-13 bits. Storing each weight as an index of that narrower width, packed into blocks, achieves compression without losing precision.
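
The idea can be illustrated with a minimal NumPy sketch. This is not the repository's code: the function names (build_codebook, pack_indices, unpack_indices, decompress), the MSB-first bit layout, and the randomly generated test layer are all assumptions made for illustration.

```python
import numpy as np

def build_codebook(weights):
    """Return (codebook, indices, bits) for an fp16 array."""
    # Work on raw 16-bit patterns so -0.0 and NaN payloads survive exactly.
    raw = np.asarray(weights, dtype=np.float16).view(np.uint16).ravel()
    codebook, indices = np.unique(raw, return_inverse=True)
    # Smallest index width that can address every codebook entry.
    bits = max(1, (len(codebook) - 1).bit_length())
    return codebook, indices.astype(np.uint32), bits

def pack_indices(indices, bits):
    """Write each index as `bits` binary digits (MSB first), 8 digits per byte."""
    shifts = np.arange(bits - 1, -1, -1, dtype=np.uint32)
    bit_matrix = ((indices[:, None] >> shifts) & 1).astype(np.uint8)
    return np.packbits(bit_matrix.ravel())

def unpack_indices(packed, bits, count):
    """Invert pack_indices, discarding the padding bits in the last byte."""
    flat = np.unpackbits(packed)[: count * bits].reshape(count, bits)
    place_values = np.uint32(1) << np.arange(bits - 1, -1, -1, dtype=np.uint32)
    return (flat.astype(np.uint32) * place_values).sum(axis=1).astype(np.intp)

def decompress(codebook, packed, bits, shape):
    """Look every index up in the codebook and restore the fp16 layer."""
    count = int(np.prod(shape))
    idx = unpack_indices(packed, bits, count)
    return codebook[idx].view(np.float16).reshape(shape)

# Hypothetical stand-in for a real layer: small random fp16 weights.
rng = np.random.default_rng(0)
layer = rng.normal(0, 0.02, size=(256, 256)).astype(np.float16)

codebook, indices, bits = build_codebook(layer)
packed = pack_indices(indices, bits)
restored = decompress(codebook, packed, bits, layer.shape)

# Lossless: every 16-bit pattern comes back unchanged.
assert np.array_equal(layer.view(np.uint16), restored.view(np.uint16))
print(f"{len(codebook)} unique values, {bits} bits per index, "
      f"{packed.nbytes + codebook.nbytes} bytes vs {layer.nbytes} original")
```

The extra lookup and bit manipulation on every read is where a speed cost would come from in a scheme like this; a real implementation would unpack on the GPU rather than in NumPy.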

Performance Characteristics

RAM reduction: 10-25%+ across tested models
Speed impact: Inference speed approximately halved in example tests
Test hardware: NVIDIA P2200 (5GB) and CPU, with updates being developed for AMD MI50 (32GB)
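
As a rough check on the RAM figure, the arithmetic behind index packing can be sketched as follows. This assumes a single codebook stored at full fp16 width alongside the packed indices; the function name packed_savings and the 7-billion-weight count are hypothetical, and the repository's actual storage layout may differ.

```python
def packed_savings(n_weights, n_unique, source_bits=16):
    """Fraction of memory saved by storing indices instead of raw values."""
    # Index width needed to address every unique value.
    index_bits = max(1, (n_unique - 1).bit_length())
    original = n_weights * source_bits
    # Packed indices plus the codebook itself, which is kept at full width.
    compressed = n_weights * index_bits + n_unique * source_bits
    return 1 - compressed / original

# Hypothetical model size; unique-value counts chosen to need 12, 13, and 14 bits.
for n_unique in (4096, 8192, 16384):
    saving = packed_savings(7_000_000_000, n_unique)
    print(f"{n_unique} unique values: {saving:.2%} saved")
```

Ignoring the small codebook overhead, 12, 13, and 14 index bits correspond to savings of 25%, 18.75%, and 12.5% relative to 16 bits, which is consistent with the reported range.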

Implementation Details

The developer worked on this project for several weeks using AI coding assistants including Claude, Qwen, and Gemini. The repository includes both lossless and lossy/balanced versions, though the lossy version hasn't been extensively tested yet.

The developer suggests this compression approach might also serve as a way to measure a model's "compactness": how efficiently it uses its parameter space.
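
One way such a measurement might look is a per-layer report of how many distinct fp16 values appear and how many index bits they need. The source does not define a metric, so the function layer_compactness, the bits_used_ratio field, and the toy layers below are purely illustrative assumptions.

```python
import numpy as np

def layer_compactness(layers):
    """Map each layer name to its unique-value count and index width."""
    report = {}
    for name, weights in layers.items():
        # Compare raw bit patterns so every distinct fp16 encoding is counted.
        raw = np.asarray(weights, dtype=np.float16).view(np.uint16)
        n_unique = int(np.unique(raw).size)
        index_bits = max(1, (n_unique - 1).bit_length())
        report[name] = {
            "unique": n_unique,
            "index_bits": index_bits,
            # Share of the 16 available bits that the layer's values need.
            "bits_used_ratio": index_bits / 16,
        }
    return report

# Hypothetical toy layers, not taken from any real model.
toy_layers = {
    "attn.q_proj": np.array([0.5, 0.5, -0.5, 1.0], dtype=np.float16),
    "mlp.up_proj": np.linspace(-1, 1, 1000).astype(np.float16),
}
for name, stats in layer_compactness(toy_layers).items():
    print(name, stats)
```

Under this assumed definition, a ratio closer to 1 means the layer draws on more of the values fp16 can represent, and a lower ratio means more room for lossless packing.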

Code Availability

The proof-of-concept code is available on GitHub: https://github.com/bigattichouse/Codebook-Quantization

📖 Read the full source: r/LocalLLaMA
