Ctxpact: Context Compaction Proxy for Local LLMs

Ctxpact is a lightweight OpenAI-compatible proxy that sits between AI agents and local LLMs to intelligently compress oversized inputs before they hit models with limited context windows. It's designed for agentic workflows like OpenClaw and Hermes that send 100k+ token payloads to models with only 16k context windows, where truncation would lose critical information.
How It Works
The system uses a 3-stage compaction pipeline:
- DCP (Dynamic Context Pruning): Dedups tool calls, strips superseded file writes, truncates error stack traces. Zero LLM calls, purely structural.
- Summarize: Evicts old conversation turns, replaces with LLM-generated summaries. Keeps a sliding window of recent turns intact.
- Extract: When input is still too large (like a 110k novel), uses one of 16 extraction strategies to pull the most relevant content within token budget.
Extraction Strategies
The extraction stage implements 16 strategies ranging from:
- 0 LLM calls: Embedding similarity (ChromaDB), section headers, heuristic keyword grep, LLMLingua compression
- 1 LLM call: LLM generates search terms, IDF-weighted word-level matching assembles context
- 2 LLM calls (best accuracy): readagent — embed + BM25 + RRF fusion, dual LLM term expansion, position-aware excerpting
- N LLM calls: Multi-turn tool-calling loops, DSPy code generation, map-reduce chunking
Benchmark Results
Tested 12 strategies across 2 models (LFM2-8B-A1B and Qwen3.5-9B) on 331 GGUF models total:
- Frankenstein test: 110k tokens compressed to 12k tokens, 8 reading comprehension questions; 8/8 correct, deterministic across 3 consecutive runs, 0% variance
- LoCoMo-MC10: Multi-session conversation QA, 10-choice, random baseline is 10%; readagent + Qwen3.5-9B scores 15/20 (75%)
- Combined performance: readagent + Qwen3.5-9B achieves 87.5%, rlm + Qwen3.5-9B achieves 80.0%
Key Findings
- Model choice matters more than strategy choice: Switching from LFM2 to Qwen3.5 improved every single strategy by +25-50 percentage points. Median strategy went from 5/8 to 7/8 just by changing model.
- NR-MMLU predicts context engineering performance: LFM2's 47% NR-MMLU vs Qwen3.5's 65% maps directly to accuracy differences.
- 2 LLM extraction calls is the sweet spot: Going from 0 to 1 call gives meaningful boost; 1 to 2 calls reaches peak accuracy. Beyond 2 calls, accuracy drops.
- readagent and rlm are breakthrough strategies: Both achieve 8/8 on Frankenstein. Only strategies that solve Q4 (Ireland question). readagent leads cross-domain at 75% LoCoMo vs rlm's 60%.
Technical Details
- Architecture: Standalone proxy (considered LiteLLM plugin and sidecar process) because breakthrough strategies need mid-pipeline LLM calls
- Implementation: ~11k lines of Python, FastAPI server, 3 endpoints, OpenAI-compatible, no heavy frameworks
- Compatibility: Drops in front of any llama-server / Ollama / vLLM backend. No API keys, no cloud, everything runs on your hardware
For developers running local LLMs with agentic workflows that exceed context windows, Ctxpact provides a practical solution to maintain information integrity while staying within hardware constraints.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Orc: Open Source Multi-Project Orchestrator for AI Coding Agents
Orc is an OS-level orchestrator that coordinates AI coding agents across multiple projects using bash, tmux, and git worktrees. It addresses merge conflicts, duplicated work, and coordination overhead with a two-tier review system and zero token burn on orchestration.

LLM Skirmish: A Real-Time Strategy Game Benchmark for AI Coding Agents
LLM Skirmish is a benchmark where AI agents write code to play 1v1 real-time strategy games against each other. It uses a modified Screeps API and tests in-context learning across five tournament rounds.

Using Claude Code to revive abandoned personal projects: a practical walkthrough
Matthew Brunelle shares how he used Claude Code (with Opus 4.6) to resurrect a stalled YouTube Music–to–OpenSubsonic API shim project, complete with setup steps, prompts, and workflow tips.

Agentlint: GitHub App that catches CLAUDE.md contradictions and broken pointers on every PR
Agentlint is a GitHub App that audits your full agent-rules surface (CLAUDE.md, AGENTS.md, skills, hooks) on every PR, posting inline comments for contradictions, broken paths, and unsupported harness features. Free for public repos.