OpenClaw v2026.3.13 adds per-agent cacheRetention config for OpenAI token cost savings

What changed in v2026.3.13
OpenClaw version 2026.3.13 added proper configuration validation for params.cacheRetention on per-agent entries. This allows you to set cache retention declaratively in your openclaw.json configuration file.
The problem with default cache behavior
OpenAI supports extended prompt cache retention (24 hours) via prompt_cache_retention: "24h" in their API, which keeps your prompt prefix cached for 24 hours instead of the default 5-10 minutes. Cached input tokens are billed at 50% off.
If you're running agents on heartbeat cycles longer than 10 minutes (which the source notes is "basically everyone"), your cache goes completely cold between every single turn. This means you're paying full price for the entire input context on every heartbeat.
The source describes a setup with 15 agents on GPT-5.2 with heartbeats every 60-90 minutes where every heartbeat was a guaranteed cold start. The system prompt, bootstrap context, HEARTBEAT.md, AGENTS.md, SOUL.md, tool definitions — all of it was re-sent at full price every cycle because the cache expired in the gap between heartbeats.
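The timing problem above can be sketched in a few lines of Python. This is a simplified model, not OpenAI's actual eviction logic: it assumes a cache entry is usable only if the next request arrives before the retention window ends. The constant and function names are hypothetical.

```python
# Simplified model: a cached prefix is reusable only if the next heartbeat
# arrives before the retention window closes. Real eviction is decided by
# the provider; this only illustrates the timing argument.
DEFAULT_RETENTION_MIN = 10         # upper end of the 5-10 minute default cited above
EXTENDED_RETENTION_MIN = 24 * 60   # prompt_cache_retention: "24h"


def cache_is_warm(heartbeat_gap_min: float, retention_min: float) -> bool:
    """Return True if the next heartbeat lands inside the retention window."""
    return heartbeat_gap_min < retention_min


for gap in (60, 90):
    print(
        f"gap={gap} min",
        f"default_warm={cache_is_warm(gap, DEFAULT_RETENTION_MIN)}",
        f"extended_warm={cache_is_warm(gap, EXTENDED_RETENTION_MIN)}",
    )
```

Under this model, both the 60-minute and the 90-minute gap miss the default window and fall well inside the 24-hour one.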
How to configure it
You can now set cache retention in your openclaw.json:
{
  "agents": {
    "list": [
      {
        "agentId": "my-agent",
        "params": {
          "cacheRetention": "long"
        }
      }
    ]
  }
}
The "long" value maps to OpenAI's prompt_cache_retention: "24h" through the pi-ai library.
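To check which agents in a config actually carry the setting, a short Python script can walk the structure shown above. This is a sketch that assumes the config has exactly the shape in the example (agents.list[].params.cacheRetention). The helper name and the mapping table are hypothetical, and only the "long" to "24h" mapping comes from the source.

```python
import json

# Same shape as the example config above.
CONFIG_TEXT = """
{
  "agents": {
    "list": [
      {"agentId": "my-agent", "params": {"cacheRetention": "long"}},
      {"agentId": "other-agent", "params": {}}
    ]
  }
}
"""

# Only the "long" -> "24h" mapping is stated in the source; other values
# are deliberately left out rather than guessed.
RETENTION_TO_OPENAI = {"long": "24h"}


def agents_with_retention(config: dict) -> dict:
    """Map agentId -> cacheRetention for agents that set the value."""
    found = {}
    for agent in config.get("agents", {}).get("list", []):
        retention = agent.get("params", {}).get("cacheRetention")
        if retention is not None:
            found[agent.get("agentId", "<unnamed>")] = retention
    return found


config = json.loads(CONFIG_TEXT)
found = agents_with_retention(config)
for agent_id, retention in found.items():
    print(agent_id, retention, RETENTION_TO_OPENAI.get(retention, "unknown"))
```

Agents missing from the result have no per-agent value set, so nothing would be passed through for them.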
Important caveat: runtime patch required
OpenClaw's resolveCacheRetention() function has a guard clause that blocks OpenAI providers by default. It only allows Anthropic and Bedrock through. So even with the config set, the value gets filtered out before it reaches the API.
You need the runtime patch from issue #27515 to make it work. The patch adds OpenAI to the allowed provider list in the guard clause. Without both the config AND the patch, nothing happens.
The source author notes they had the patch applied for weeks but never set the config value. The patched check, extraParams?.cacheRetention !== void 0, therefore evaluated to false because the value was undefined, and OpenAI stayed blocked. Without the configuration, the patch did nothing.
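The interaction between the guard clause, the patch, and the config can be modeled as follows. This is a Python stand-in for the behavior described here, not OpenClaw's actual code. The function signature, the provider strings, and the `patched` flag are all assumptions made for illustration.

```python
from typing import Optional

# Providers the unpatched guard lets through, per the description above.
ALLOWED_BY_DEFAULT = {"anthropic", "bedrock"}


def resolve_cache_retention(
    provider: str, extra_params: Optional[dict], patched: bool
) -> Optional[str]:
    """Hypothetical model of the guard: return the retention value, or None if blocked."""
    configured = (extra_params or {}).get("cacheRetention")
    if provider in ALLOWED_BY_DEFAULT:
        return configured
    # Modeled patch: OpenAI passes only when the value is explicitly set,
    # mirroring the `extraParams?.cacheRetention !== void 0` check.
    if patched and provider == "openai" and configured is not None:
        return configured
    return None


print(resolve_cache_retention("openai", {"cacheRetention": "long"}, patched=False))
print(resolve_cache_retention("openai", {}, patched=True))
print(resolve_cache_retention("openai", {"cacheRetention": "long"}, patched=True))
```

In this model, only the third call returns a value. The first two return None, which matches the "config AND patch" requirement.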
Cost savings potential
With 15 agents doing heartbeats, each sending ~128K-170K input tokens per turn:
- Without 24h cache: 100% of input tokens at full price, every turn. Cache dies in the ~60-90 min gap between heartbeats.
- With 24h cache: The stable prefix (system prompt, agent config, tool definitions — typically 80-90% of input) stays cached across heartbeats. Those tokens are billed at half price.
On a system running 15 agents across a full business day, that's hundreds of heartbeat cycles per day where the bulk of input tokens shift from full price to half price. The input cost reduction compounds fast.
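A rough calculation makes the saving concrete. The sketch below uses only the figures quoted above (80-90% of input in the stable prefix, cached tokens at 50% off) and assumes every heartbeat after the first hits the cache. The function name is hypothetical, and real bills depend on actual hit rates and model pricing.

```python
def input_cost_fraction(cached_fraction: float, cached_discount: float = 0.5) -> float:
    """Fraction of the full input price paid when part of the input is cached.

    cached_fraction: share of input tokens served from cache (0.0-1.0).
    cached_discount: discount on cached tokens (0.5 means half price).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return 1.0 - cached_fraction * cached_discount


for share in (0.80, 0.90):
    paid = input_cost_fraction(share)
    print(f"cached share {share:.0%}: pay {paid:.0%} of full input price, save {1 - paid:.0%}")
```

With 80-90% of the input cached at half price, the input bill for a warm turn is roughly 55-60% of the cold-start cost, a saving of about 40-45% per turn.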
📖 Read the full source: r/openclaw