Gemma-4 26B-A4B with Opencode Runs Efficiently on M5 MacBook Air

✍️ OpenClawRadar | 📅 Published: April 14, 2026

A developer tested Gemma-4-26B-A4B with Opencode on a 32GB M5 MacBook Air and found it delivers practical performance for local AI coding tasks.

Performance Benchmarks

The specific configuration tested was gemma-4-26B-A4B-it-UD-IQ4_XS running on a 32GB M5 MacBook Air. In low power mode, it achieved:

  • 300 tokens/second prompt processing
  • 12 tokens/second generation
  • 8W power consumption
  • No heat or fan noise during operation
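To put the two throughput figures in perspective, here is a rough back-of-the-envelope sketch of how long a single request might take. It uses only the reported 300 and 12 tokens/second rates; the prompt and reply sizes are hypothetical, and the estimate ignores prompt caching, tool-call overhead, and any slowdown at long context, so treat it as an approximation rather than a measurement.

```python
# Reported rates from the test (low power mode).
PROMPT_TOKENS_PER_SEC = 300.0  # prompt processing
GEN_TOKENS_PER_SEC = 12.0      # generation


def estimate_turn_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough wall-clock time for one request.

    Assumes both rates stay constant and that nothing is cached,
    which is a simplification.
    """
    prompt_time = prompt_tokens / PROMPT_TOKENS_PER_SEC
    generation_time = output_tokens / GEN_TOKENS_PER_SEC
    return prompt_time + generation_time


# Hypothetical agentic turn: 12,000 tokens of context, 300-token reply.
seconds = estimate_turn_seconds(12_000, 300)
print(f"Estimated turn time: {seconds:.0f} s")  # prints "Estimated turn time: 65 s"
```

Under these assumptions, reading the context takes about 40 seconds and writing the reply about 25, which suggests prompt processing speed matters as much as generation speed for agentic work with large contexts.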

The M5 MacBook Air showed significant improvements over previous hardware:

  • ~25% faster prompt processing than an M1 Max 64GB (even when the M1 Max wasn't in power saving mode)
  • ~6 hours of battery life when running Opencode, versus ~2 hours on the M1 Max
  • The longer runtime comes despite a smaller battery (53.8Wh vs 70Wh on the M1 Max)
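The battery figures above imply an average whole-system power draw, which can be estimated with simple division. This is a sketch under loud assumptions: that each battery was drained from full to empty, and that the approximate runtimes (~6 and ~2 hours) are accurate. Neither is confirmed by the source.

```python
def average_draw_watts(battery_wh: float, runtime_hours: float) -> float:
    """Average power draw implied by draining a full battery in the given time."""
    return battery_wh / runtime_hours


# Capacities and runtimes as reported; full-to-empty drain is an assumption.
m5_air = average_draw_watts(53.8, 6.0)
m1_max = average_draw_watts(70.0, 2.0)

print(f"M5 Air: ~{m5_air:.1f} W, M1 Max: ~{m1_max:.1f} W, "
      f"ratio: ~{m1_max / m5_air:.1f}x")
# prints "M5 Air: ~9.0 W, M1 Max: ~35.0 W, ratio: ~3.9x"
```

The roughly 9W estimate for the M5 Air is in the same range as the 8W figure reported during inference, though the two numbers may not measure the same thing (one is a whole-laptop average, the other may cover only part of the system).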

Practical Use Cases

The developer found this setup "actually usable" for agentic coding on a laptop. Previously, running LLMs on an M1 Max 64GB was limited to "tinkering and toy use cases" and couldn't handle longer-context tasks effectively. That older setup could create a simple Snake game in Python, but agentic coding or contributing to larger codebases was "a bit janky."

The M5's performance makes it practical for mobile use cases where internet connectivity might be unreliable, such as coffee shops or train commutes.

Comparison to Other Models

The developer compared Gemma-4-26B with Opencode to closed-source alternatives:

  • In their testing, it doesn't replace Claude Code or Antigravity
  • Gemma-4 requires "far more hand-holding than current closed-source frontier models"
  • The responses are described as "kinda dry" compared to Claude Code or Gemini-3.1-Pro with Antigravity
  • However, they'd prefer Gemma-4-26B over running out of Gemini-2.5-Pro allowance and being forced to use Gemini-2.5-Flash

The developer notes this represents significant progress, as "this sort of agentic coding was cutting-edge / not even really possible with frontier models back at the end of 2024."

📖 Read the full source: r/LocalLLaMA
