Guide: Deploying OpenClaw with llama.cpp on GEEKOM IT15 Mini PC

Deployment Architecture and Key Changes
This guide outlines a deployment where OpenClaw's gateway (port 18789) connects to a manually managed llama-server (port 8080) instead of the default Ollama server (port 11434). The goal is to run a local Qwen3-8B model using Intel Arc GPU acceleration via SYCL.
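As a rough way to see which parts of this layout are running, the sketch below probes the three ports named above. The host 127.0.0.1 and the helper names are assumptions for illustration; only the port numbers come from the guide.

```python
import socket

# Port numbers are from the guide; the host is an assumption (all services local).
HOST = "127.0.0.1"
PORTS = {
    "openclaw-gateway": 18789,
    "llama-server": 8080,
    "ollama (default, unused here)": 11434,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def port_report(host: str = HOST) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in PORTS.items()}


if __name__ == "__main__":
    for name, up in port_report().items():
        print(f"{name:32s} {'listening' if up else 'not reachable'}")
```

In the target setup, the gateway and llama-server ports would be expected to accept connections, while the Ollama port may stay closed.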
Debugging and Solutions
The process involved resolving several configuration conflicts:
- Issue 1: Unsupported mcpServers Config: OpenClaw does not support the mcpServers configuration key. The solution was to remove this section from openclaw.json and use batch files to manually start llama-server, integrating its startup logic into Python code.
- Issue 2: Session Cache Conflict: A cached Feishu channel session was overriding the new global configuration, causing Ollama API errors. This was fixed by deleting the session cache file: del "C:\Users\JiugeAItest\.openclaw\agents\main\sessions\sessions.json".
- Issue 3: Insufficient Context Length: The default llama-server context of 4096 tokens caused errors for longer conversations. This was resolved by starting the server with -c 32768 and setting contextWindow: 32768 in the OpenClaw configuration.
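The first two fixes can be sketched in Python as follows. This is a minimal illustration, not the author's script: it assumes mcpServers sits at the top level of openclaw.json, that the file is strict JSON, and that the sessions file lives at the path used in the del command. The function names are hypothetical.

```python
import json
from pathlib import Path


def strip_mcp_servers(config: dict) -> dict:
    """Return a copy of the config without a top-level "mcpServers" key."""
    # Assumption: the key is top-level; the guide does not say where it sits.
    return {k: v for k, v in config.items() if k != "mcpServers"}


def clean_openclaw_state(openclaw_dir: Path) -> None:
    """Drop mcpServers from openclaw.json and delete the cached sessions file."""
    config_path = openclaw_dir / "openclaw.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config_path.write_text(
        json.dumps(strip_mcp_servers(config), indent=2), encoding="utf-8"
    )
    # Relative path taken from the del command quoted in the guide.
    sessions = openclaw_dir / "agents" / "main" / "sessions" / "sessions.json"
    sessions.unlink(missing_ok=True)  # no error if the cache is already gone
```

A call such as `clean_openclaw_state(Path.home() / ".openclaw")` would apply both fixes. Backing up openclaw.json first is advisable, since the file is rewritten in place.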
Deployment Steps
The setup uses a specific directory structure on the GEEKOM IT15:
E:\Workspace_AI\Buildup_OpenClow
├── llama-b8245-bin-win-sycl-x64\          # llama.cpp SYCL version
│   ├── llama-server.exe
│   └── ... (DLLs)
├── models\Qwen3-8B-GGUF\
│   └── Qwen3-8B-Q4_K_M.gguf               # Model file
└── start_openclaw_with_llamacpp.bat       # Startup script
Note: The Qwen3-8B-Q4_K_M.gguf model is verified compatible with llama.cpp version b8245. Qwen3.5 models are incompatible with this version due to a rope.dimension_sections length mismatch.
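The contents of the startup script are not shown in this summary, so the following is only a sketch of how the llama-server command line might be assembled from the directory layout above. The paths, port 8080, and `-c 32768` come from the guide; binding to 127.0.0.1 and `-ngl 99` (offload all layers to the GPU) are assumptions, and the function name is hypothetical.

```python
from pathlib import PureWindowsPath

# Root directory as given in the guide.
ROOT = PureWindowsPath(r"E:\Workspace_AI\Buildup_OpenClow")


def llama_server_command(
    root: PureWindowsPath = ROOT,
    ctx_size: int = 32768,
    port: int = 8080,
    gpu_layers: int = 99,  # assumption: full GPU offload; not stated in the guide
) -> list[str]:
    """Build the llama-server argument list for the documented layout."""
    exe = root / "llama-b8245-bin-win-sycl-x64" / "llama-server.exe"
    model = root / "models" / "Qwen3-8B-GGUF" / "Qwen3-8B-Q4_K_M.gguf"
    return [
        str(exe),
        "-m", str(model),          # model file
        "-c", str(ctx_size),       # context length in tokens
        "--host", "127.0.0.1",     # assumption: local-only binding
        "--port", str(port),
        "-ngl", str(gpu_layers),   # number of layers offloaded to the GPU
    ]


if __name__ == "__main__":
    print(" ".join(llama_server_command()))
```

On the target machine, passing the returned list to `subprocess.Popen` would start the server. Building the command as a list avoids quoting problems with Windows paths.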
OpenClaw Configuration
The primary configuration change is in C:\Users\<Username>\.openclaw\openclaw.json. The model provider is switched from ollama to llama-cpp:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "llama-cpp/qwen3-8b"
      }
    }
  },
  "models": {
    "providers": {
      "ollama": { ... },
      "llama-cpp": {
        "api": "openai-completions",
        "apiKey": "llama-cpp-local",
        "baseUrl": "http://127.0.0.1:8080/v1",
        "models": [
          {
            "contextWindow": 32768,
            "id": "qwen3-8b",
            "name": "qwen3-8b",
            ...
          }
        ]
      }
    }
  }
}

The guide also includes sections on parameter reference, a pitfall avoidance guide, troubleshooting, and instructions for switching back to Ollama if needed.
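One way to sanity-check the provider configuration shown above is to resolve the primary model reference against the providers block. The sketch assumes that the primary value follows a "provider/model-id" convention, which is inferred from the string "llama-cpp/qwen3-8b" rather than confirmed by the source. The example dictionary contains only the fields visible in the guide, with elided parts left out.

```python
# Only the fields visible in the guide; elided entries are omitted.
EXAMPLE = {
    "agents": {"defaults": {"model": {"primary": "llama-cpp/qwen3-8b"}}},
    "models": {
        "providers": {
            "llama-cpp": {
                "api": "openai-completions",
                "apiKey": "llama-cpp-local",
                "baseUrl": "http://127.0.0.1:8080/v1",
                "models": [
                    {"contextWindow": 32768, "id": "qwen3-8b", "name": "qwen3-8b"}
                ],
            }
        }
    },
}


def resolve_primary_model(config: dict) -> dict:
    """Return the model entry (plus baseUrl) that the primary reference names."""
    primary = config["agents"]["defaults"]["model"]["primary"]
    # Assumption: the reference is "<provider>/<model id>".
    provider_name, _, model_id = primary.partition("/")
    provider = config["models"]["providers"][provider_name]
    for model in provider["models"]:
        if model["id"] == model_id:
            return {"baseUrl": provider["baseUrl"], **model}
    raise KeyError(f"model {model_id!r} not found under provider {provider_name!r}")


if __name__ == "__main__":
    entry = resolve_primary_model(EXAMPLE)
    print(entry["baseUrl"], entry["contextWindow"])
```

If the resolved contextWindow differs from the value passed to llama-server with -c, the mismatch described under Issue 3 is a likely cause of context-length errors.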
📖 Read the full source: r/openclaw