agentcache: Python Library for Multi-Agent LLM Prefix Caching

agentcache is a Python library designed to optimize multi-agent LLM systems by implementing prefix caching as a core feature. The library addresses the common problem where frameworks like CrewAI, AutoGen, and open-multi-agent create fresh sessions for each worker, resulting in zero cache hits and duplicated prompt costs.
How It Works
The library operates on a fork-based approach instead of creating separate sessions:
- Start one session with a shared system prompt
- Make the first call - provider computes and caches the prefix
- When you need N workers, fork instead of creating N new sessions
- Parent session: [system, msg1, msg2, ...]
- Forked session: [system, msg1, msg2, ..., WORKER_TASK]
- Exact same prefix = cache hit
Key Features
- Cache-safe forks: Maintains identical prefixes across worker sessions
- Cache-break detection: Diffs snapshots and reports exactly what changed when cache hits drop
- Cache-safe compaction: For long-running sessions, scans old tool outputs before each call and replaces large results with deterministic placeholders to maintain smaller context while preserving cacheable prefixes
- Parameter freezing: Freezes cache-relevant parameters before forking (system prompt, model, tools, messages, reasoning config)
- Task DAG scheduling: Enables parallel workers from one cached session
Performance Results
In a head-to-head test with GPT-4o-mini (coordinator + 3 workers, same task):
- Text injection / separate sessions: 0% cache hits, 85.7 seconds
- Prefix forks: 75.8% cache hits, 37.4 seconds
- Per worker cache hit rates typically range from 80-99%
Installation and Usage
Install via pip:
pip install "git+https://github.com/masteragentcoder/agentcache.git@main"
The library is available on GitHub at github.com/masteragentcoder/agentcache.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Developer shares hybrid AI coding workflow: Claude for planning, local models for execution
A developer built a pipeline using Claude 3.5 Sonnet for task planning and local Qwen2.5-Coder models via Ollama for code generation, achieving 85% token reduction compared to using Claude alone.

Trepan: Local VS Code Security Auditor for AI-Generated Code
Trepan is an open-source VS Code extension that acts as a security gatekeeper for AI-generated code suggestions. It uses Ollama to run local security audits against project-specific rules in a .trepan/system_rules.md file.

Stanford Researchers Release OpenJarvis: A Local-First Framework for On-Device AI Agents
Stanford researchers have released OpenJarvis, a local-first framework for building on-device personal AI agents with tools, memory, and learning capabilities. The project includes GitHub repository and website links for developers to explore.

Headless OpenClaw Setup with Discord via Docker Scripts
A GitHub repository provides scripts to run OpenClaw with Discord in a headless Docker container, avoiding the TUI/WebUI. It includes a management script with commands like claw init, start, and stop, plus preconfigured support for OpenAI Responses API, Chromium, and various tools.