Prefex: A Local Proxy for Claude Code That Automates Prompt Caching and Session Memory

Prefex is a local proxy tool designed to reduce API costs when using Claude Code. It addresses two specific cost inefficiencies: Anthropic's beta prompt caching feature requires manual header injection, and Claude Code sends full conversation history with every request.
How It Works
Prefex runs entirely on your local machine as a proxy between Claude Code and Anthropic's API. It automatically injects the header needed to activate Anthropic's prompt caching feature, which reduces the cost of repeated input tokens by 90%. Without this header, every request, including the parts that repeat each time such as your CLAUDE.md and project context, is billed at full price.
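The header-injection step can be pictured with a minimal sketch. This is not Prefex's actual code: the function name `inject_beta_header` is hypothetical, and the source does not name the header, so the `anthropic-beta` header and the `prompt-caching-2024-07-31` flag value are assumptions based on Anthropic's beta header convention.

```python
# Hypothetical sketch of header injection in a local proxy; not Prefex's real code.
# Assumption: prompt caching is enabled through the "anthropic-beta" request header.
PROMPT_CACHING_FLAG = "prompt-caching-2024-07-31"  # assumed flag value


def inject_beta_header(headers: dict[str, str], beta_flag: str) -> dict[str, str]:
    """Return a copy of `headers` with `beta_flag` present in the anthropic-beta header."""
    out = dict(headers)
    # HTTP header names are case-insensitive, so look for any existing spelling.
    existing_key = next((k for k in out if k.lower() == "anthropic-beta"), None)
    if existing_key is None:
        out["anthropic-beta"] = beta_flag
        return out
    # Keep flags the client already sent and append ours only if it is missing.
    flags = [f.strip() for f in out[existing_key].split(",") if f.strip()]
    if beta_flag not in flags:
        flags.append(beta_flag)
    out[existing_key] = ",".join(flags)
    return out


if __name__ == "__main__":
    outgoing = inject_beta_header({"x-api-key": "sk-..."}, PROMPT_CACHING_FLAG)
    print(outgoing["anthropic-beta"])
```

A proxy built this way would call the function on each outgoing request before forwarding it, leaving the API key and all other headers untouched.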
The tool also implements session memory, preventing Claude Code from resending the entire conversation history with each turn. Additionally, it includes a model router that can route simpler queries to cheaper models, though this feature wasn't active during the initial testing period.
Performance and Installation
In a 4-day test with normal usage:
- 1,338 requests processed
- $49.60 actual cost with Prefex
- $348 estimated cost without Prefex
- 86% savings achieved (with caching only, no model routing)
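The savings figure follows directly from the two cost numbers in the list above. As a quick check, using only the reported values (the function names are illustrative):

```python
# Recompute the reported savings from the figures listed in the article.
ACTUAL_COST = 49.60      # USD, with Prefex
ESTIMATED_COST = 348.00  # USD, estimated without Prefex
REQUESTS = 1338


def savings_fraction(actual: float, estimated: float) -> float:
    """Fraction of the estimated cost that was avoided."""
    return 1 - actual / estimated


def cost_per_request(actual: float, requests: int) -> float:
    """Average cost of a single request in USD."""
    return actual / requests


if __name__ == "__main__":
    print(f"savings: {savings_fraction(ACTUAL_COST, ESTIMATED_COST):.1%}")
    print(f"average cost per request: ${cost_per_request(ACTUAL_COST, REQUESTS):.4f}")
```

The savings come out to about 85.7%, which rounds to the reported 86%.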
The developer provides a benchmark that runs 5 questions on karpathy/nanoGPT with cold and warm starts, costing approximately $0.03. Cost calculations use Anthropic's actual billing fields.
Installation requires one curl command and one added line in settings.json. The package includes an uninstall script. The tool operates locally with no external servers and no telemetry, and API keys go directly to Anthropic.
📖 Read the full source: r/ClaudeAI