Project Headroom: Netflix Engineer's Open Source Tool Slashes AI Token Costs by 90%

Netflix senior engineer Tejas Chopra open-sourced Project Headroom, a local proxy that compresses context-window input before it reaches the LLM. Early estimates claim up to 90% of input tokens are redundant, and since January 2026 the tool has saved users an aggregate $700,000 across 200 billion tokens.
How It Works
Headroom runs as a proxy on port 8787 on the developer's machine. You wrap your LLM CLI with the headroom wrap command, e.g.:
headroom wrap codex
It parses all input (conversation history, logs, tool outputs, files, RAG chunks) and applies lossless, reversible compression. It's best at cutting:
- Server logs: 90% jettisoned
- MCP tool outputs: 70% redundant JSON
- Database outputs: repetitive schemas
- File trees: repeated metadata
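To make the "repetitive schemas" and "redundant JSON" points concrete, here is a minimal sketch of one lossless, reversible transformation: storing a shared key list once instead of repeating it in every record. This is an illustration only, not Headroom's actual algorithm; the function names pack_records and unpack_records and the sample data are hypothetical, and character count stands in as a rough proxy for token count.

```python
import json


def pack_records(records):
    """Store the shared key list once, then each record as a plain row of values."""
    if not records:
        return {"keys": [], "rows": []}
    keys = list(records[0].keys())
    for rec in records:
        # Only records with an identical key order can be packed without loss.
        if list(rec.keys()) != keys:
            raise ValueError("records do not share one schema")
    return {"keys": keys, "rows": [[rec[k] for k in keys] for rec in records]}


def unpack_records(packed):
    """Rebuild the original list of dicts from the packed form."""
    keys = packed["keys"]
    return [dict(zip(keys, row)) for row in packed["rows"]]


# Hypothetical tool output: 50 rows that all repeat the same four keys.
records = [
    {"id": i, "status": "ok", "region": "us-east-1", "latency_ms": 10 + i}
    for i in range(50)
]

packed = pack_records(records)
original_text = json.dumps(records)
packed_text = json.dumps(packed)

# Reversibility check: unpacking must give back exactly the original records.
assert unpack_records(packed) == records
print(len(original_text), len(packed_text))
```

The packed form is shorter because each key appears once rather than once per row, and the round-trip assertion is what makes the transformation reversible in the sense used above.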
Built in Python and Node, Headroom is currently at v0.22, with 2,000 GitHub stars and 120 forks.
Why It Matters
Chopra was inspired by a $287 Claude Sonnet bill from routine debugging and refactoring. He found the culprit wasn't his instructions — it was boilerplate, JSON schemas, and machine metadata. "This isn’t prose. This isn’t creative writing. This is compressible data masquerading as text," he wrote.
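By default, Claude's prefix cache TTL is only five minutes; once a session has been idle longer than that, the cached prefix expires and the entire context must be processed again at full price. You can set a longer TTL, but cache writes then cost double in exchange for a 90% saving on cache reads. Headroom sidesteps that trade-off by shrinking the context itself.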
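The caching trade-off can be sketched with simple arithmetic. The example below uses only the two multipliers quoted above (writes at 2x, reads at 90% off) and assumes the cache stays warm for every turn and that the context does not grow between turns. The base price of $3 per million input tokens and the 100,000-token context are hypothetical placeholders, not quoted rates.

```python
def uncached_cost(context_tokens, turns, base_price_per_mtok):
    """Cost of resending the full context at the base input price on every turn."""
    per_token = base_price_per_mtok / 1_000_000
    return context_tokens * per_token * turns


def cached_cost(context_tokens, turns, base_price_per_mtok,
                write_multiplier=2.0, read_multiplier=0.1):
    """Cost with caching: one cache write, then (turns - 1) cache reads."""
    per_token = base_price_per_mtok / 1_000_000
    write = context_tokens * per_token * write_multiplier
    reads = context_tokens * per_token * read_multiplier * (turns - 1)
    return write + reads


# Hypothetical inputs: a 100,000-token context at $3 per million input tokens.
CONTEXT_TOKENS = 100_000
BASE_PRICE = 3.0

for turns in (1, 2, 3, 10):
    plain = uncached_cost(CONTEXT_TOKENS, turns, BASE_PRICE)
    cached = cached_cost(CONTEXT_TOKENS, turns, BASE_PRICE)
    print(f"turns={turns:2d}  uncached=${plain:.2f}  cached=${cached:.2f}")
```

Under these assumptions the doubled write only pays for itself from the third turn onward. Both costs are linear in context size, so compressing the context lowers either figure by the same proportion.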
Alternatives
Other tools exist: RTK (Rust Token Killer) trims verbose command output, and LeanCTX is a variant. Commercial options like Token Company (Y Combinator-funded) offer compression-as-a-service. But Headroom's distinguishing features are that its compression is reversible and that it stays inside the developer's workflow.
📖 Read the full source: HN AI Agents