Agent frameworks waste 350,000+ tokens per session resending static files

✍️ OpenClawRadar · 📅 Published: April 13, 2026

Token waste benchmark results

Measurements on a local Qwen 3.5 122B setup revealed that agent frameworks waste more than 350,000 tokens per session by repeatedly resending static files. The source describes these numbers as "unreal."

Optimization approach

The source reports a compile-time approach that reduces query context from 1,373 tokens to 73 tokens, a reduction of roughly 95% (1,300 tokens) for this specific context.
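The arithmetic behind that figure can be checked with a few lines of Python. This is only a worked calculation on the two token counts quoted above; the function name `reduction_percent` is made up for illustration.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of tokens removed when context shrinks from `before` to `after`."""
    return (before - after) / before * 100


before_tokens = 1373  # query context before the compile-time step (from the source)
after_tokens = 73     # query context after the compile-time step (from the source)

saved = before_tokens - after_tokens
pct = reduction_percent(before_tokens, after_tokens)
print(f"{saved} tokens saved per query ({pct:.1f}% reduction)")
# prints "1300 tokens saved per query (94.7% reduction)"
```

The exact value is 94.7%, which the source rounds to 95%.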

The benchmark also found that naive JSON conversion makes the problem 30% worse: token waste rises above the baseline instead of falling.
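One plausible reason JSON can cost more is its syntax overhead: quotes, braces, and key names repeated for every entry. The sketch below compares the same two hypothetical tool descriptions rendered as indented JSON and as plain text. It counts characters, not tokens, and the tool names are invented, so it illustrates the direction of the effect only and does not reproduce the 30% figure.

```python
import json

# Hypothetical tool definitions, invented for this illustration.
tools = [
    {"name": "read_file", "description": "Read a file from disk"},
    {"name": "run_tests", "description": "Run the test suite"},
]

# Naive conversion: pretty-printed JSON repeats keys, quotes, and braces.
as_json = json.dumps(tools, indent=2)

# Compact plain-text rendering of the same information.
as_text = "\n".join(f"{t['name']}: {t['description']}" for t in tools)

# Character counts are a rough proxy; real token counts depend on the tokenizer.
print(len(as_json), len(as_text))
```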

Technical context

Agent frameworks typically include system prompts, tool definitions, and other configuration data that remain static across multiple interactions within a session. When this data is resent with every query, it consumes tokens without providing new information to the model. This is particularly costly with large models like Qwen 3.5 122B, where token processing directly affects both performance and cost.
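To see how resending adds up, a rough model is enough: every turn after the first repeats the full static block. The sketch below assumes a fixed static size and a fixed number of turns; both example values are hypothetical and are not taken from the benchmark.

```python
def resent_static_tokens(static_tokens: int, turns: int) -> int:
    """Tokens spent re-sending unchanged static context after the first turn."""
    if turns < 1:
        return 0
    # The first turn has to carry the static context; every later copy is repeat traffic.
    return static_tokens * (turns - 1)


# Hypothetical session: 5,000 static tokens resent over 60 turns.
wasted = resent_static_tokens(static_tokens=5000, turns=60)
print(f"{wasted} tokens resent without new information")
# prints "295000 tokens resent without new information"
```

The cost grows linearly with session length, which is why long agent sessions are hit hardest.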

The compile-time approach likely involves pre-processing static elements so they're referenced rather than resent, similar to how modern web applications cache static assets. For developers working with AI coding agents, reducing this overhead can significantly improve response times and reduce operational costs.
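The source does not describe the implementation, so the following is only a minimal sketch of the "reference instead of resend" idea under stated assumptions. `StaticContextRegistry`, `build_query`, and the `[def:...]` / `[ref:...]` markers are all hypothetical names. The sketch also assumes the receiving side keeps earlier definitions available (for example through a server-side prefix cache); a stateless endpoint would not be able to resolve the references.

```python
import hashlib


class StaticContextRegistry:
    """Stores static blocks once and hands out short reference IDs (hypothetical)."""

    def __init__(self) -> None:
        self._blocks: dict[str, str] = {}
        self._sent: set[str] = set()

    def register(self, text: str) -> str:
        # A content hash as ID means identical text always maps to the same reference.
        ref = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
        self._blocks[ref] = text
        return ref

    def render(self, ref: str) -> str:
        # Emit the full text the first time a block is used, a short marker afterwards.
        if ref in self._sent:
            return f"[ref:{ref}]"
        self._sent.add(ref)
        return f"[def:{ref}]\n{self._blocks[ref]}"


def build_query(registry: StaticContextRegistry, refs: list[str], user_message: str) -> str:
    """Assemble one query from static references plus the new user message."""
    parts = [registry.render(ref) for ref in refs]
    parts.append(user_message)
    return "\n".join(parts)


registry = StaticContextRegistry()
system_ref = registry.register("You are a coding agent. " * 50)

first = build_query(registry, [system_ref], "List the files.")
second = build_query(registry, [system_ref], "Open main.py.")

# The second query carries only the short reference, not the static text.
print(len(first), len(second))
```

Hashing the content rather than assigning sequential IDs keeps references stable across runs, which matters if the static blocks are prepared ahead of time.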

📖 Read the full source: r/LocalLLaMA
