Project Headroom: Netflix Engineer's Open Source Tool Slashes AI Token Costs by 90%

OpenClawRadar | Published: June 2, 2026

Netflix senior engineer Tejas Chopra has open-sourced Project Headroom, a local proxy that compresses context-window input before it reaches the LLM. By early estimates, up to 90% of tokens are redundant, and since January 2026 the tool has reportedly saved users an aggregate $700,000 across 200 billion tokens.

How It Works

Headroom runs as a proxy on port 8787 on the developer's machine. You wrap your LLM CLI with the headroom wrap command, e.g.:

headroom wrap codex

It parses all input — conversation history, logs, tool outputs, files, RAG chunks — and applies lossless, reversible compression. It's best at cutting:

Server logs: 90% jettisoned
MCP tool outputs: 70% redundant JSON
Database outputs: repetitive schemas
File trees: repeated metadata
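
To make "lossless, reversible compression" of repetitive JSON concrete, here is a minimal sketch of one generic technique: factoring the repeated keys out of a list of same-shaped records and restoring them exactly afterwards. This is an illustration only, not Headroom's actual algorithm; the function names pack_records and unpack_records and the sample tool output are hypothetical.

```python
import json


def pack_records(records):
    """Store the shared key list once and keep only the values per record."""
    if not records:
        return {"keys": [], "rows": []}
    keys = list(records[0].keys())
    # Pack only when every record has the same keys in the same order,
    # so that unpacking reproduces the input exactly.
    if any(list(r.keys()) != keys for r in records):
        raise ValueError("records do not share one schema")
    return {"keys": keys, "rows": [[r[k] for k in keys] for r in records]}


def unpack_records(packed):
    """Rebuild the original list of dicts from the packed form."""
    keys = packed["keys"]
    return [dict(zip(keys, row)) for row in packed["rows"]]


# Hypothetical tool output: 50 records that repeat the same four keys.
tool_output = [
    {"id": i, "status": "ok", "region": "us-east-1", "latency_ms": 10 + i}
    for i in range(50)
]

original_text = json.dumps(tool_output)
packed_text = json.dumps(pack_records(tool_output))
restored = unpack_records(json.loads(packed_text))

print(len(original_text), len(packed_text), restored == tool_output)
```

The packed form is shorter because each key name appears once instead of once per record, and the round trip returns the original data unchanged, which is the property "reversible" refers to.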

Built in Python and Node, Headroom is currently at v0.22, with 2,000 GitHub stars and 120 forks.

Why It Matters

Chopra was inspired by a $287 Claude Sonnet bill from routine debugging and refactoring. He found the culprit wasn't his instructions — it was boilerplate, JSON schemas, and machine metadata. "This isn’t prose. This isn’t creative writing. This is compressible data masquerading as text," he wrote.

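By default, Claude's prefix cache TTL is only five minutes; once the cache expires after inactivity, the entire context has to be processed again at full price. You can set a longer TTL, but then you pay double for cache writes in exchange for saving 90% on cache reads. Headroom sidesteps that tradeoff by shrinking the context before it is sent.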
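
The caching tradeoff can be worked through with a little arithmetic. The sketch below uses only the two ratios quoted above (writes at 2x the normal input price, reads at 0.1x) and assumes every request reuses the same prefix and arrives before the cache expires. Costs are expressed relative to sending the prefix once without caching; the function names are hypothetical.

```python
def cached_cost(requests, write_multiplier=2.0, read_multiplier=0.1):
    """Relative cost of sending one prefix `requests` times with caching.

    The first request writes the cache; every later request reads it.
    """
    if requests <= 0:
        return 0.0
    return write_multiplier + read_multiplier * (requests - 1)


def uncached_cost(requests):
    """Relative cost without caching: full price on every request."""
    return float(max(requests, 0))


def break_even_requests(write_multiplier=2.0, read_multiplier=0.1):
    """Smallest number of requests at which caching is strictly cheaper."""
    n = 1
    while cached_cost(n, write_multiplier, read_multiplier) >= uncached_cost(n):
        n += 1
    return n


for n in (1, 2, 3, 10):
    print(n, cached_cost(n), uncached_cost(n))

print("caching pays off from request", break_even_requests())
```

Under these assumptions the longer TTL only pays for itself from the third request on the same prefix; a single cache miss costs twice as much as not caching at all.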

Alternatives

Other tools exist: RTK (Rust Token Killer) trims verbose command output, and LeanCTX is a variant. Commercial options such as Token Company (Y Combinator-funded) offer compression-as-a-service. Headroom's distinguishing features are reversible compression and staying inside the developer's workflow.

📖 Read the full source: HN AI Agents
