5 Fixes for OpenClaw + Ollama Model Timeouts

Problem: OpenClaw Agents Silently Failing with Local Ollama Models

A developer debugging OpenClaw 2026.4.2 with Ollama 0.20.2 and the Gemma 4 26B-A4B Q8_0 model on an M4 Max Mac Studio found that agents would not respond after a /new command, despite the model working instantly via ollama run. No errors appeared in logs, and the agent showed no typing indicator.

Root Causes and Fixes

Root Cause #1: Slug Generator Blocking: OpenClaw's session-memory hook runs a slug generator that sends a request to Ollama with a hardcoded 15-second timeout. If the model cannot process OpenClaw's system prompt in time, OpenClaw abandons the request, but Ollama continues processing it, blocking subsequent agent requests.
Fix: openclaw hooks disable session-memory
Root Cause #2: Large System Prompt: OpenClaw injects approximately 38,500 characters of system prompt (identity, tools, bootstrap files) into every request. A local model can need 40-60 seconds for the prefill phase on a prompt of this size.
Fix: Add to config to skip bootstrap injection and limit characters:
```
{ "agents": { "defaults": { "skipBootstrap": true, "bootstrapTotalMaxChars": 500 } } }
```
This roughly halves the prompt, from ~38.5K to ~19K characters.
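
To see why prompt size matters, here is a rough back-of-envelope sketch of prefill time. The 4 characters-per-token ratio and the 200 tokens/second prefill speed are assumptions, not figures from the original post; the speed was picked only so that the full prompt lands inside the reported 40-60 second range. Treat the outputs as illustrative, not as measurements.

```python
def estimate_prefill_seconds(
    prompt_chars: int,
    chars_per_token: float = 4.0,           # assumption: rough average for English text
    prefill_tokens_per_sec: float = 200.0,  # assumption: chosen to match the reported 40-60 s
) -> float:
    """Estimate how long a model spends reading the prompt before its first token."""
    prompt_tokens = prompt_chars / chars_per_token
    return prompt_tokens / prefill_tokens_per_sec


full_prompt_s = estimate_prefill_seconds(38_500)     # prompt size reported in the post
trimmed_prompt_s = estimate_prefill_seconds(19_000)  # prompt size after the config change

print(f"full prompt:    ~{full_prompt_s:.0f} s")
print(f"trimmed prompt: ~{trimmed_prompt_s:.0f} s")
```

Under these assumptions prefill time scales linearly with prompt length, so halving the prompt roughly halves the wait. Real prefill speed depends on the model, quantization, and hardware.
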
Root Cause #3: Hidden Idle Timeout: OpenClaw has a DEFAULT_LLM_IDLE_TIMEOUT_MS of 60 seconds. If the model doesn't produce its first token within that window, OpenClaw kills the connection and silently switches to the configured fallback model (e.g., Claude Sonnet).
Fix: Set an undocumented config key:
```
{ "agents": { "defaults": { "llm": { "idleTimeoutSeconds": 300 } } } }
```
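
The idle-timeout behavior described above can be pictured with a small sketch. This is not OpenClaw's code: `first_token_or_fallback`, `slow_local_model`, and `cloud_fallback` are hypothetical names, and the delays are shortened so the example runs quickly. It only illustrates the pattern of waiting a fixed time for a first token and then quietly switching to a fallback.

```python
import queue
import threading
import time
from typing import Callable, Tuple


def first_token_or_fallback(
    generate: Callable[[], str],
    idle_timeout_s: float,
    fallback: Callable[[], str],
) -> Tuple[str, str]:
    """Return (source, token): the primary's first token if it arrives in time, else the fallback's."""
    result: "queue.Queue[str]" = queue.Queue(maxsize=1)

    def worker() -> None:
        # Keeps running even if the caller stops waiting, much like an
        # abandoned request that the model server continues to process.
        result.put(generate())

    threading.Thread(target=worker, daemon=True).start()
    try:
        return "primary", result.get(timeout=idle_timeout_s)
    except queue.Empty:
        return "fallback", fallback()


def slow_local_model() -> str:
    time.sleep(0.3)  # stands in for a long prefill phase
    return "hello from local"


def cloud_fallback() -> str:
    return "hello from fallback"


short_timeout = first_token_or_fallback(slow_local_model, 0.05, cloud_fallback)
long_timeout = first_token_or_fallback(slow_local_model, 1.0, cloud_fallback)
print(short_timeout)
print(long_timeout)
```

With the short timeout the caller never sees the local model's answer, and nothing signals an error. Raising the timeout above the model's first-token latency is what lets the local response through.
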
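Root Cause #4: Ollama Serial Processing: In this setup Ollama processed requests one at a time, so an abandoned slug generator request could occupy the only processing slot while the real agent request waited behind it.
Fix: Set the environment variable OLLAMA_NUM_PARALLEL=4 in the Ollama plist or service config.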
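
The queueing effect can be sketched with a toy scheduler. This is a simplification and not how Ollama is implemented: it assumes each request holds one slot for a fixed duration and ignores the fact that parallel requests share the GPU and slow each other down. The 50-second durations and the 16-second arrival time are made-up illustrative values.

```python
import heapq
from typing import List, Tuple


def start_times(requests: List[Tuple[float, float]], num_parallel: int) -> List[float]:
    """Given (arrival_s, duration_s) pairs sorted by arrival, return when each request starts."""
    free_at = [0.0] * num_parallel  # min-heap: the time at which each slot becomes free
    heapq.heapify(free_at)
    starts = []
    for arrival, duration in requests:
        slot_free = heapq.heappop(free_at)  # earliest available slot
        start = max(arrival, slot_free)     # wait if that slot is still busy
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


# First request: the abandoned slug generator call. Second: the real agent request.
requests = [(0.0, 50.0), (16.0, 50.0)]

serial = start_times(requests, num_parallel=1)
parallel = start_times(requests, num_parallel=4)
print("one slot:  ", serial)
print("four slots:", parallel)
```

With one slot, the agent request cannot start until the abandoned request finishes. With four slots it starts as soon as it arrives.
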
Root Cause #5: Thinking Mode Delay: Gemma 4 defaults to a thinking/reasoning phase that adds 20-30 seconds before the first token.
Fix: Disable in config:
```
{ "agents": { "defaults": { "thinkingDefault": "off" } } }
```

Full Working Configuration

The developer provided this complete config for a working setup:

```
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/gemma4:26b-a4b-it-q8_0",
        "fallbacks": ["anthropic/claude-sonnet-4-6"]
      },
      "thinkingDefault": "off",
      "timeoutSeconds": 600,
      "skipBootstrap": true,
      "bootstrapTotalMaxChars": 500,
      "llm": { "idleTimeoutSeconds": 300 }
    }
  }
}
```
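
A quick sanity check on how the timeout values in this config relate to each other might look like the sketch below. `check_timeouts` is a hypothetical helper, not part of OpenClaw. It assumes that `timeoutSeconds` caps the whole run and that `idleTimeoutSeconds` governs the wait for the first token; the post does not spell out these semantics.

```python
import json
from typing import List

CONFIG = json.loads("""
{ "agents": { "defaults": {
    "thinkingDefault": "off",
    "timeoutSeconds": 600,
    "skipBootstrap": true,
    "bootstrapTotalMaxChars": 500,
    "llm": { "idleTimeoutSeconds": 300 }
} } }
""")


def check_timeouts(config: dict, expected_first_token_s: float) -> List[str]:
    """Return a list of human-readable problems; an empty list means the values look consistent."""
    defaults = config["agents"]["defaults"]
    idle = defaults["llm"]["idleTimeoutSeconds"]
    total = defaults["timeoutSeconds"]
    problems = []
    if idle <= expected_first_token_s:
        problems.append(
            f"idleTimeoutSeconds ({idle}) does not exceed expected first-token latency "
            f"({expected_first_token_s})"
        )
    if idle > total:
        problems.append(f"idleTimeoutSeconds ({idle}) exceeds timeoutSeconds ({total})")
    return problems


# About 60 s to first token is the figure reported for the first message after /new.
problems = check_timeouts(CONFIG, expected_first_token_s=60)
print(problems or "timeouts look consistent")
```

Running the same check with an idle timeout of 60 seconds would flag the first condition, which matches the silent-fallback symptom described earlier.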

Additionally, pin the model in memory to prevent unloading between requests:

```
curl http://localhost:11434/api/generate -d '{"model":"gemma4:26b-a4b-it-q8_0","keep_alive":-1,"options":{"num_ctx":16384}}'
```
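
The same preload request can be sent from a script. The sketch below uses only the Python standard library and mirrors the curl payload; `build_pin_request` and `pin_model` are hypothetical helper names. Sending the request assumes an Ollama server is listening on localhost:11434, so the example only builds the request and leaves the network call to the caller.

```python
import json
import urllib.request


def build_pin_request(
    model: str,
    num_ctx: int = 16384,
    host: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build a POST to /api/generate with no prompt, which loads the model without generating text."""
    payload = {
        "model": model,
        "keep_alive": -1,  # -1 asks Ollama to keep the model loaded indefinitely
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pin_model(model: str) -> str:
    """Send the request and return the raw response body. Requires a running Ollama server."""
    with urllib.request.urlopen(build_pin_request(model), timeout=600) as response:
        return response.read().decode("utf-8")


request = build_pin_request("gemma4:26b-a4b-it-q8_0")
print(request.get_method(), request.full_url)
```

Calling `pin_model("gemma4:26b-a4b-it-q8_0")` once after Ollama starts would have the same effect as the curl command.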

Results and Trade-offs

After applying the fixes, the first message after /new still takes about 60 seconds because of system prompt prefill, which the developer describes as unavoidable for local models. Subsequent messages are fast because Ollama caches the KV state. The setup uses 31 GB of VRAM at 100% GPU with a 16K context window, and runs fully local with zero API cost.

The initial delay is the price of fully local operation. The developer considers it worthwhile for anyone who prioritizes privacy and avoiding API costs.

📖 Read the full source: r/LocalLLaMA