agentcache: Python Library for Multi-Agent LLM Prefix Caching

agentcache is a Python library that treats provider-side prefix caching as a core feature of multi-agent LLM systems. It targets a common problem: frameworks such as CrewAI, AutoGen, and open-multi-agent create a fresh session for each worker, which results in zero cache hits and means the same prompt tokens are paid for again on every worker call.

How It Works

Instead of creating a separate session for each worker, the library forks workers from a single shared session:

Start one session with a shared system prompt
Make the first call; the provider computes and caches the prefix
When you need N workers, fork instead of creating N new sessions
Parent session: [system, msg1, msg2, ...]
Forked session: [system, msg1, msg2, ..., WORKER_TASK]
Exact same prefix = cache hit
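
The fork step above can be illustrated with a minimal sketch. The Session class, its fork method, and the shared_prefix_len helper are hypothetical names invented for this illustration and are not agentcache's actual API; the sketch only shows why a fork keeps the parent's prefix intact while a fresh session does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical stand-in for a chat session (not agentcache's real API)."""
    system: str
    messages: tuple = ()

    def append(self, message: str) -> "Session":
        # Return a new session; the existing history is never mutated.
        return Session(self.system, self.messages + (message,))

    def fork(self, worker_task: str) -> "Session":
        # A fork reuses the parent's history unchanged and appends only the task.
        return self.append(worker_task)

    def prompt(self) -> list:
        return [self.system, *self.messages]


def shared_prefix_len(a: list, b: list) -> int:
    """Count how many leading entries two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


parent = Session("You are a research coordinator.").append("msg1").append("msg2")

# Forked workers: same system prompt and history, plus one task each.
forks = [parent.fork(f"WORKER_TASK {i}") for i in range(3)]

# Fresh sessions: each worker gets its own system prompt and no shared history.
fresh = [Session(f"You are worker {i}.").append(f"WORKER_TASK {i}") for i in range(3)]

fork_shared = [shared_prefix_len(parent.prompt(), f.prompt()) for f in forks]
fresh_shared = [shared_prefix_len(parent.prompt(), s.prompt()) for s in fresh]

print(fork_shared)   # [3, 3, 3]
print(fresh_shared)  # [0, 0, 0]
```

Every fork shares all three parent entries (system, msg1, msg2), which is the part a provider can serve from its prefix cache. The fresh sessions diverge at the first entry, so nothing is reusable.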

Key Features

Cache-safe forks: Maintains identical prefixes across worker sessions
Cache-break detection: Diffs snapshots and reports exactly what changed when cache hits drop
Cache-safe compaction: For long-running sessions, scans old tool outputs before each call and replaces large results with deterministic placeholders, keeping the context small while preserving the cacheable prefix
Parameter freezing: Freezes cache-relevant parameters before forking (system prompt, model, tools, messages, reasoning config)
Task DAG scheduling: Enables parallel workers from one cached session
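
Three of the features above (parameter freezing, cache-break detection, and cache-safe compaction) can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: freeze_params, diff_snapshots, and compact_tool_output are hypothetical names, and the placeholder format and the 200-character threshold are invented for the example.

```python
import hashlib
import json
from types import MappingProxyType


def freeze_params(system, model, tools, messages, reasoning):
    """Return a read-only snapshot of the cache-relevant parameters (hypothetical helper)."""
    return MappingProxyType({
        "system": system,
        "model": model,
        "tools": tuple(tools),
        "messages": tuple(messages),
        # Serialize with sorted keys so equal configs always compare equal.
        "reasoning": json.dumps(reasoning, sort_keys=True),
    })


def diff_snapshots(before, after):
    """List the parameter names whose values differ between two snapshots."""
    return [key for key in before if before[key] != after[key]]


def compact_tool_output(text, max_chars=200):
    """Replace a large tool result with a placeholder derived only from its content."""
    if len(text) <= max_chars:
        return text
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    # No timestamps or random IDs: the same input always yields the same placeholder.
    return f"[tool output elided: {len(text)} chars, sha256={digest}]"


base = freeze_params(
    "You are a coordinator.", "gpt-4o-mini", ["search"], ["msg1"], {"effort": "low"}
)
changed = freeze_params(
    "You are a coordinator.", "gpt-4o-mini", ["search", "browse"], ["msg1"], {"effort": "low"}
)

changed_keys = diff_snapshots(base, changed)
print(changed_keys)  # ['tools']

big = "x" * 5000
placeholder = compact_tool_output(big)
print(placeholder == compact_tool_output(big))  # True
```

The placeholder must be deterministic because any byte that differs between two calls changes the prompt prefix and invalidates the cached entry from that point onward.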

Performance Results

In a head-to-head test with GPT-4o-mini (coordinator + 3 workers, same task):

Text injection / separate sessions: 0% cache hits, 85.7 seconds
Prefix forks: 75.8% cache hits, 37.4 seconds
Per-worker cache hit rates typically range from 80% to 99%
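
As a quick check on what the two timings imply, the speedup and the share of wall-clock time saved can be computed directly from the figures above. The variable names are chosen for this example only.

```python
baseline_s = 85.7  # text injection / separate sessions, from the test above
forked_s = 37.4    # prefix forks, from the test above

speedup = baseline_s / forked_s
time_saved = 1 - forked_s / baseline_s

print(f"speedup: {speedup:.2f}x")       # speedup: 2.29x
print(f"time saved: {time_saved:.1%}")  # time saved: 56.4%
```

These numbers describe a single run with one model and one task, so they are best read as an indication of the effect rather than a general benchmark.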

Installation and Usage

Install via pip:

pip install "git+https://github.com/masteragentcoder/agentcache.git@main"

The library is available on GitHub at github.com/masteragentcoder/agentcache.

📖 Read the full source: r/LocalLLaMA
