OpenClaw 2026.3.11 release adds local-first Ollama setup, multimodal memory, and Discord thread controls

Local-first Ollama becomes a first-class experience
The update adds first-class Ollama setup to the onboarding flow, with Local or Cloud + Local modes, browser-based cloud sign-in, and curated model suggestions. You can now bootstrap a local-only or hybrid Ollama agent from the wizard instead of hand-editing configs. The wizard suggests default models for tasks such as coding and planning, and it skips the local pull for any model that runs in the cloud.
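The pull-skipping behavior can be pictured with a minimal sketch. The `ModelChoice` type, its `runs_in_cloud` flag, and `models_to_pull` are hypothetical names used for illustration, not OpenClaw's or Ollama's API. The sketch assumes only that the wizard knows, per selected model, whether it is served from the cloud.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    """Hypothetical record for one model selected in the wizard."""
    name: str
    runs_in_cloud: bool  # assumption: the wizard knows this for each model


def models_to_pull(choices: list[ModelChoice]) -> list[str]:
    """Return only the models that need a local download."""
    return [c.name for c in choices if not c.runs_in_cloud]
```

With one local and one cloud model selected, only the local model ends up in the pull list, so a cloud-only setup downloads nothing.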
OpenCode Zen + Go now share one key, different roles
OpenClaw now treats Zen and Go as one OpenCode setup in the wizard and docs: it stores a single shared OpenCode key, keeps the two runtime providers separate, and no longer overrides the built-in opencode-go routing. In practice, one OpenCode key covers both Zen and Go, and you route tasks by purpose instead of splitting keys. For example, Zen can stay your "fast coder" model while Go handles heavier planning or long-context runs.
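The "one key, two providers" arrangement might look roughly like the sketch below. Everything here is illustrative: `Provider`, `build_opencode_providers`, `pick_provider`, the task labels, and the `opencode-zen` identifier are assumptions, and only `opencode-go` appears in the release notes. It does not reflect OpenClaw's real config schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical runtime provider entry."""
    provider_id: str
    api_key: str


def build_opencode_providers(shared_key: str) -> dict[str, Provider]:
    """Create two separate providers that reuse one stored key."""
    return {
        "zen": Provider("opencode-zen", shared_key),  # placeholder id
        "go": Provider("opencode-go", shared_key),
    }


def pick_provider(task: str, providers: dict[str, Provider]) -> Provider:
    """Route by purpose: quick coding goes to Zen, anything else to Go."""
    return providers["zen"] if task == "coding" else providers["go"]
```

The point of the split is that routing decisions depend on the task, while credentials are stored and rotated in one place.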
Images + audio become searchable "working memory"
The release adds opt-in multimodal image and audio indexing for memorySearch.extraPaths, using Gemini's gemini-embedding-2-preview model with strict fallback gating and scope-based reindexing. Memory search with gemini-embedding-2-preview also supports configurable output dimensions and reindexes automatically when the dimensions change. You can now index images and audio into OpenClaw's memory and let agents search them alongside text notes.
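Reindexing on a dimension change is needed because vectors of different lengths cannot be compared with each other. A minimal sketch of such a check follows. The `needs_reindex` function and the `index_meta` layout are hypothetical, not OpenClaw's actual implementation.

```python
def needs_reindex(index_meta: dict, configured_dimensions: int) -> bool:
    """Return True when the stored index must be rebuilt.

    index_meta is a hypothetical record saved next to the index; a missing
    "dimensions" entry (for example, an empty index) also forces a rebuild.
    """
    return index_meta.get("dimensions") != configured_dimensions
```

For example, an index built at 768 dimensions would be rebuilt after switching the configured output size to 1536, and left alone if the setting stays at 768.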
macOS UI improvements
The macOS chat UI now includes a chat model picker, persists explicit thinking-level selections across relaunches, and hardens provider-aware session model sync for the shared chat composer. You can pick your model directly in the chat UI instead of guessing which config is active, and your chosen thinking level (e.g., verbose or compact reasoning) persists across restarts.
Discord thread archiving controls
Auto-created Discord threads now support an autoArchiveDuration channel config, so thread archiving can be set to 1 hour, 1 day, 3 days, or 1 week instead of always using the 1-hour default. You can set different archive durations for different channels or bots.
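Discord's own API expresses a thread's auto-archive duration in minutes and accepts only four values, which correspond to the four options above. The sketch below maps the human-readable labels to those minute values. The `to_archive_minutes` helper is hypothetical, and how OpenClaw itself expects autoArchiveDuration to be written in channel config is not stated in the release notes.

```python
# Minute values Discord's API accepts for a thread's auto_archive_duration.
DISCORD_ARCHIVE_MINUTES = {
    "1 hour": 60,
    "1 day": 1440,
    "3 days": 4320,
    "1 week": 10080,
}


def to_archive_minutes(label: str) -> int:
    """Translate a human-readable duration into Discord's minute value."""
    try:
        return DISCORD_ARCHIVE_MINUTES[label]
    except KeyError:
        allowed = ", ".join(DISCORD_ARCHIVE_MINUTES)
        raise ValueError(
            f"unsupported duration {label!r}; use one of: {allowed}"
        ) from None
```

Rejecting anything outside the four supported values early gives a clearer error than letting Discord refuse the request later.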
📖 Read the full source: r/LocalLLaMA