Gemma 4 26B-A4B Local Setup: LM Studio 0.4.0 Headless CLI

What LM Studio 0.4.0 Adds for Local AI

LM Studio 0.4.0 fundamentally changes the architecture by extracting the core inference engine into llmster, a standalone server. This enables running LM Studio entirely from the command line using the new lms CLI, eliminating the need for the GUI. The update makes it usable on headless servers, in CI/CD pipelines, SSH sessions, or for terminal-focused developers.

Key Features in 0.4.0

llmster daemon: A background service that manages model loading and inference without the desktop app
lms CLI: Full command-line interface for downloading, loading, chatting, and serving models
Parallel request processing: Continuous batching instead of sequential queuing, allowing multiple requests to the same model to run concurrently
Stateful REST API: A new /v1/chat endpoint that maintains conversation history across requests
MCP integration: Local Model Context Protocol support with permission-key gating

Why Gemma 4 26B-A4B for Local Use

Google's Gemma 4 26B-A4B uses a mixture-of-experts architecture with 128 experts plus 1 shared expert, but only activates 8 experts (3.8B parameters) per token. This means it runs well on hardware that couldn't handle a dense 26B model. On a 14" MacBook Pro M4 Pro with 48GB unified memory, it fits comfortably and generates at 51 tokens/second.

The model scores 82.6% on MMLU Pro and 88.3% on AIME 2026, close to the dense 31B variant (85.2% and 89.2%) while running dramatically faster. It achieves an Elo score of ~1441, competing with models like Qwen 3.5 397B-A17B (~1450 Elo) that require 100-600B total parameters.

Key capabilities include 256K max context, vision support for analyzing screenshots and diagrams, native function/tool calling, and reasoning with configurable thinking modes.

Practical Setup

The article walks through installing the lms CLI and setting up Gemma 4 26B-A4B for local inference that can be used with Claude Code. The author notes significant slowdowns when used within Claude Code from their experience.

📖 Read the full source: HN AI Agents

Running Google Gemma 4 26B-A4B Locally with LM Studio 0.4.0 Headless CLI

What LM Studio 0.4.0 Adds for Local AI

Key Features in 0.4.0

Why Gemma 4 26B-A4B for Local Use

Practical Setup

👀 See Also

Two Months with GitHub's Spec-Kit and Claude Code: What Works, What Doesn't

Conduid: Trust Infrastructure Layer for MCP Servers Built with Claude

OpenPlawd: OpenClaw Skill for Automated Plaud Meeting Notes

OpenClaw Mock API for Building Tools/Integrations Against the Gateway