Agent frameworks waste 350,000+ tokens per session resending static files

Token waste benchmark results
Measurements on a local Qwen 3.5 122B setup revealed that agent frameworks waste more than 350,000 tokens per session by repeatedly resending static files. The source describes these numbers as "unreal."
Optimization approach
A compile-time approach cut query context from 1,373 tokens to 73 tokens, roughly a 95% reduction for that specific context.
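The arithmetic behind that figure can be checked with a few lines of Python. The only inputs are the two token counts reported above; the function name `reduction_pct` is an illustrative choice, not something from the source.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens saved when context shrinks from `before` to `after`."""
    return (before - after) / before * 100


before, after = 1373, 73  # token counts reported in the benchmark
saved = before - after
print(f"{saved} tokens saved per query ({reduction_pct(before, after):.1f}% reduction)")
# prints "1300 tokens saved per query (94.7% reduction)"
```

The exact value is about 94.7%, which the source rounds to 95%.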
The benchmark also found that naive JSON conversion makes the problem 30% worse than the baseline.
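One plausible reason JSON can cost more is the structural overhead of quotes, braces, and repeated key names. The sketch below compares a hypothetical tool definition written as a single plain-text line against the same information as indented JSON. The tool, its fields, and the plain-text format are all invented for illustration, and character counts are only a rough proxy for tokens, so this does not reproduce the 30% figure.

```python
import json

# Hypothetical tool definition, written two ways.
plain = "read_file(path: string): read a file from disk"
tool = {
    "name": "read_file",
    "description": "read a file from disk",
    "parameters": {"path": {"type": "string"}},
}
as_json = json.dumps(tool, indent=2)

# Characters are a crude stand-in for tokens; a real measurement
# would use the model's own tokenizer.
overhead = (len(as_json) - len(plain)) / len(plain) * 100
print(f"plain: {len(plain)} chars, JSON: {len(as_json)} chars, overhead: {overhead:.0f}%")
```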
Technical context
Agent frameworks typically include system prompts, tool definitions, and other configuration data that stay the same across interactions within a session. When this data is resent with every query, it consumes tokens without giving the model any new information. This is particularly costly with large models like Qwen 3.5 122B, where token processing directly affects both performance and cost.
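To see how quickly this adds up, here is a minimal sketch of the cost of resending static context on every turn versus sending it once. The numbers `STATIC_TOKENS` and `TURNS` are hypothetical and are not taken from the benchmark.

```python
def static_tokens_sent(static_tokens: int, turns: int, resend: bool) -> int:
    """Total tokens spent on static context over a session.

    resend=True  -> the static block is included in every turn
    resend=False -> the static block is sent once and reused
    """
    if turns <= 0:
        return 0
    return static_tokens * turns if resend else static_tokens


STATIC_TOKENS = 2_000  # hypothetical size of system prompt plus tool definitions
TURNS = 50             # hypothetical number of turns in one session

naive = static_tokens_sent(STATIC_TOKENS, TURNS, resend=True)
once = static_tokens_sent(STATIC_TOKENS, TURNS, resend=False)
print(f"resent every turn: {naive}, sent once: {once}, wasted: {naive - once}")
# prints "resent every turn: 100000, sent once: 2000, wasted: 98000"
```

The waste grows linearly with the number of turns, which is why long agent sessions are hit hardest.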
The compile-time approach likely involves pre-processing static elements so they're referenced rather than resent, similar to how modern web applications cache static assets. For developers working with AI coding agents, reducing this overhead can significantly improve response times and reduce operational costs.
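The source does not publish the implementation, so the following is only a sketch of the "reference rather than resend" idea under stated assumptions. `StaticContextRegistry`, the `[ref:...]` marker format, and the sample prompt are all hypothetical, and the sketch assumes a serving layer that can resolve such a marker back to the cached content.

```python
import hashlib


class StaticContextRegistry:
    """Tracks which static blocks were already sent this session (hypothetical helper)."""

    def __init__(self) -> None:
        self._sent: set[str] = set()

    def prepare(self, name: str, content: str) -> str:
        """Return the full content the first time, a short reference afterwards."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        if digest in self._sent:
            # Assumption: the serving side can map this marker to cached content.
            return f"[ref:{name}:{digest}]"
        self._sent.add(digest)
        return content


registry = StaticContextRegistry()
system_prompt = "You are a coding agent. Follow the project conventions. " * 20

first = registry.prepare("system_prompt", system_prompt)   # full text
second = registry.prepare("system_prompt", system_prompt)  # short reference
print(len(first), len(second))
```

Because the key is a content hash, an edited block produces a new digest and is sent in full again, much like a cache-busting filename for a static web asset.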
📖 Read the full source: r/LocalLLaMA