agentcache: Python Library for Multi-Agent LLM Prefix Caching

✍️ OpenClawRadar📅 Published: April 13, 2026🔗 Source
agentcache: Python Library for Multi-Agent LLM Prefix Caching
Ad

agentcache is a Python library designed to optimize multi-agent LLM systems by implementing prefix caching as a core feature. The library addresses the common problem where frameworks like CrewAI, AutoGen, and open-multi-agent create fresh sessions for each worker, resulting in zero cache hits and duplicated prompt costs.

How It Works

The library operates on a fork-based approach instead of creating separate sessions:

  • Start one session with a shared system prompt
  • Make the first call - provider computes and caches the prefix
  • When you need N workers, fork instead of creating N new sessions
  • Parent session: [system, msg1, msg2, ...]
  • Forked session: [system, msg1, msg2, ..., WORKER_TASK]
  • Exact same prefix = cache hit
Ad

Key Features

  • Cache-safe forks: Maintains identical prefixes across worker sessions
  • Cache-break detection: Diffs snapshots and reports exactly what changed when cache hits drop
  • Cache-safe compaction: For long-running sessions, scans old tool outputs before each call and replaces large results with deterministic placeholders to maintain smaller context while preserving cacheable prefixes
  • Parameter freezing: Freezes cache-relevant parameters before forking (system prompt, model, tools, messages, reasoning config)
  • Task DAG scheduling: Enables parallel workers from one cached session

Performance Results

In a head-to-head test with GPT-4o-mini (coordinator + 3 workers, same task):

  • Text injection / separate sessions: 0% cache hits, 85.7 seconds
  • Prefix forks: 75.8% cache hits, 37.4 seconds
  • Per worker cache hit rates typically range from 80-99%

Installation and Usage

Install via pip:

pip install "git+https://github.com/masteragentcoder/agentcache.git@main"

The library is available on GitHub at github.com/masteragentcoder/agentcache.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also

Roost: A Single-Go-Binary Sidebar for Claude Code with Clickable Prompt History, File Tree, and Notifications
Tools

Roost: A Single-Go-Binary Sidebar for Claude Code with Clickable Prompt History, File Tree, and Notifications

Roost is a single Go binary that adds a web-based sidebar to Claude Code: xterm.js terminal backed by tmux, file tree that follows your cd, clickable prompt history from ~/.claude/projects/*.jsonl, and push notifications via Claude Code's Stop hook. Run over SSH as single-user-per-instance; no build step on the frontend.

OpenClawRadar
Team Memory MCP: Open Source Shared Memory for Claude Code with Bayesian Confidence Scoring
Tools

Team Memory MCP: Open Source Shared Memory for Claude Code with Bayesian Confidence Scoring

Team Memory MCP is an open source tool that provides shared team memory for Claude Code with Bayesian confidence scoring. It uses a Beta-Bernoulli model to rank patterns, includes temporal decay with 90-day half-life, and can be added to Claude Code with a single command.

OpenClawRadar
SkillMesh: MCP-Friendly Router for Large Tool Catalogs Reduces Context Size by 70%
Tools

SkillMesh: MCP-Friendly Router for Large Tool Catalogs Reduces Context Size by 70%

SkillMesh is an MCP-friendly router that retrieves only relevant expert cards for AI agent queries, reducing context size by 70% and improving tool selection. It supports Claude via MCP server, Codex skill bundles, and OpenAI-style function schemas.

OpenClawRadar
🦀
Tools

Spine Swarm: Multi-Agent AI System on Visual Canvas for Non-Coding Projects

Spine Swarm is a multi-agent system that works on an infinite visual canvas to complete complex non-coding projects like competitive analysis, financial modeling, SEO audits, pitch decks, and interactive prototypes. The system uses blocks as abstractions on top of AI models that can be connected to pass context between different model types.

OpenClawRadar