Project Headroom: Netflix Engineer's Open Source Tool Slashes AI Token Costs by 90%

✍️ OpenClawRadar | 📅 Published: June 2, 2026

Netflix senior engineer Tejas Chopra has open-sourced Project Headroom, a local proxy that compresses context-window input before it reaches the LLM. Early estimates claim that up to 90% of those input tokens are redundant, and since January 2026 the tool has saved users an aggregate $700,000 across 200 billion tokens.

How It Works

Headroom runs as a proxy on port 8787 on the developer's machine. You wrap your LLM CLI with the headroom wrap command, e.g.:

headroom wrap codex

It parses all input — conversation history, logs, tool outputs, files, RAG chunks — and applies lossless, reversible compression. It's best at cutting:

  • Server logs: 90% jettisoned
  • MCP tool outputs: 70% redundant JSON
  • Database outputs: repetitive schemas
  • File trees: repeated metadata
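
To give a feel for why repetitive JSON shrinks so well without losing anything, here is a minimal Python sketch. It is an illustration only and not Headroom's actual algorithm: the pack and unpack helpers and the sample records are hypothetical, and character count stands in as a rough proxy for token count. The idea is to store the repeated keys once and keep only the values per row, which can be expanded back to the exact original.

```python
import json


def pack(records):
    """Factor the shared keys out of a list of same-shaped dicts (hypothetical helper)."""
    if not records:
        return {"keys": [], "rows": []}
    keys = list(records[0].keys())
    for rec in records:
        # Refuse anything that would not round-trip exactly.
        if list(rec.keys()) != keys:
            raise ValueError("records must share the same keys in the same order")
    return {"keys": keys, "rows": [[rec[k] for k in keys] for rec in records]}


def unpack(packed):
    """Rebuild the original list of dicts from the packed form."""
    return [dict(zip(packed["keys"], row)) for row in packed["rows"]]


# Made-up tool output: 50 rows that repeat the same four keys.
records = [
    {"id": i, "status": "ok", "region": "us-east-1", "latency_ms": 12 + i}
    for i in range(50)
]

original = json.dumps(records)
compact = json.dumps(pack(records))

# The packed form expands back to exactly the original data.
assert unpack(json.loads(compact)) == records
print(f"original: {len(original)} chars, packed: {len(compact)} chars")
```

The packed form is smaller because each key name appears once instead of once per row. A real compressor would also have to handle nested and irregular records, which this sketch simply rejects.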

Built in Python and Node, Headroom is currently at v0.22, with 2,000 GitHub stars and 120 forks.

Why It Matters

Chopra was inspired by a $287 Claude Sonnet bill from routine debugging and refactoring. He found the culprit wasn't his instructions — it was boilerplate, JSON schemas, and machine metadata. "This isn’t prose. This isn’t creative writing. This is compressible data masquerading as text," he wrote.

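By default, Claude's prefix cache TTL is only five minutes; after that much inactivity the cache expires and the entire context is processed again at full price. You can set a longer TTL, but you pay double for cache writes in exchange for saving 90% on cache reads. Headroom sidesteps that tradeoff by shrinking the input before it is sent.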
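
The caching tradeoff is easy to work through with the two figures quoted above: writes at double price and reads at 90% off. The Python sketch below is a back-of-the-envelope model, not a billing calculator. The relative_cost function is hypothetical, costs are expressed in units of one full-price pass over the context, and it assumes every re-send lands inside the cache TTL.

```python
def relative_cost(sends, write_mult=2.0, read_mult=0.1):
    """Return (uncached, cached) cost of sending the same context `sends` times.

    Costs are in units of one full-price pass over the context.
    Assumption: the first send is a cache write and every later send is a cache hit.
    """
    if sends < 1:
        raise ValueError("sends must be at least 1")
    uncached = float(sends)
    cached = write_mult + read_mult * (sends - 1)
    return uncached, cached


for sends in (1, 2, 3, 10):
    uncached, cached = relative_cost(sends)
    print(f"{sends:>2} sends: uncached {uncached:.1f}, cached {cached:.1f}")
```

Under these assumptions the extended cache costs more for the first two sends and only comes out ahead from the third send onward, which is why a session with long pauses can end up paying the write premium without collecting the read discount.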

Alternatives

Other tools exist: RTK (Rust Token Killer) trims verbose command output, and LeanCTX is a variant. Commercial options such as Token Company (Y Combinator-funded) offer compression as a service. Headroom's distinguishing features are that its compression is reversible and that it stays inside the developer's workflow.

📖 Read the full source: HN AI Agents
