Gemma-4 26B-A4B with Opencode Runs Efficiently on M5 MacBook Air

A developer tested Gemma-4-26B-A4B with Opencode on a 32GB M5 MacBook Air and found it delivers practical performance for local AI coding tasks.
Performance Benchmarks
The specific configuration tested was gemma-4-26B-A4B-it-UD-IQ4_XS running on a 32GB M5 MacBook Air. In low power mode, it achieved:
- 300 tokens/second prompt processing
- 12 tokens/second generation
- 8W power consumption
- No heat or fan noise during operation
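To give a feel for what these throughput figures mean in practice, the sketch below estimates the wall-clock time of a single agentic turn. It is a back-of-the-envelope calculation, not a measurement: the function name `turn_seconds` and the 6,000-token prompt / 300-token reply are hypothetical, and it assumes constant throughput with no prompt caching, whereas real speeds usually drop as context grows.

```python
# Throughput figures reported for the M5 MacBook Air in low power mode.
PROMPT_TPS = 300.0  # prompt processing, tokens/second
GEN_TPS = 12.0      # generation, tokens/second


def turn_seconds(prompt_tokens, output_tokens,
                 prompt_tps=PROMPT_TPS, gen_tps=GEN_TPS):
    """Estimate seconds for one turn: read the prompt, then generate a reply.

    Assumes constant throughput and no prompt caching (a simplification).
    """
    return prompt_tokens / prompt_tps + output_tokens / gen_tps


# Hypothetical turn: 6,000 tokens of context, 300 tokens of output.
estimate = turn_seconds(6000, 300)
print(f"{estimate:.1f} s")  # 20 s of prompt processing + 25 s of generation = 45.0 s
```

Under these assumptions, generation speed rather than prompt processing dominates once replies run past a few hundred tokens.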
The M5 MacBook Air showed significant improvements over previous hardware:
- ~25% faster prompt processing than an M1 Max 64GB (even when the Max wasn't in power saving mode)
- ~6 hours of battery life when running Opencode, versus ~2 hours on the M1 Max, despite a smaller battery (53.8Wh vs 70Wh on the M1 Max)
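The battery figures can be turned into an implied average power draw with simple arithmetic. The sketch below assumes each runtime corresponds to draining the full rated capacity, which is an approximation; the helper name `average_draw_watts` is illustrative only.

```python
def average_draw_watts(battery_wh, runtime_hours):
    """Average power draw implied by draining a full battery over the runtime."""
    return battery_wh / runtime_hours


# Capacities and approximate runtimes as reported above.
m5_air_watts = average_draw_watts(53.8, 6)  # roughly 9 W
m1_max_watts = average_draw_watts(70.0, 2)  # 35 W

print(f"M5 Air: {m5_air_watts:.1f} W, M1 Max: {m1_max_watts:.1f} W")
print(f"Ratio: {m1_max_watts / m5_air_watts:.1f}x")
```

On these numbers the M5 Air's implied draw of about 9W is in line with the reported 8W, and the M1 Max draws roughly four times as much for the same workload.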
Practical Use Cases
The developer found this setup "actually usable" for agentic coding on a laptop. Previously, running LLMs on their M1 Max 64GB was limited to "tinkering and toy use cases" and couldn't handle longer-context tasks effectively. The older machine could create a simple Snake game in Python, but agentic coding or contributing to larger codebases was "a bit janky."
The M5's performance makes it practical for mobile use cases where internet connectivity might be unreliable, such as coffee shops or train commutes.
Comparison to Other Models
The developer compared Gemma-4-26B with Opencode to closed-source alternatives:
- In their testing, it doesn't replace Claude Code or Antigravity
- Gemma-4 requires "far more hand-holding than current closed-source frontier models"
- The responses are described as "kinda dry" compared to Claude Code or Gemini-3.1-Pro with Antigravity
- However, they'd prefer Gemma-4-26B over running out of Gemini-2.5-Pro allowance and being forced to use Gemini-2.5-Flash
The developer notes this represents significant progress, as "this sort of agentic coding was cutting-edge / not even really possible with frontier models back at the end of 2024."
📖 Read the full source: r/LocalLLaMA