GLM 5 on Mac M3: Performance for Agentic Coding

Performance Benchmarks and Limitations

A developer tested GLM 5 using MLX 4-bit quantization on a Mac M3 with 512GB RAM for agentic coding tasks. The model is described as "quite usable" when context is kept below approximately 50,000 tokens, though it is significantly slower than API-based options such as Claude, particularly during prompt processing.

Performance degrades substantially once context exceeds 50k tokens. In one test that processed 65k tokens, the first half completed in 8 minutes (about 67 tokens/second), while the second half took a further 18 minutes (about 30 tokens/second), for an overall rate of about 41 tokens/second. Token generation is estimated at 12-20 tokens/second at larger context sizes.
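The rates quoted for the 65k-token test follow from simple arithmetic, sketched below. The only assumption added here is that "half" means an even split of the 65k tokens; the constant and function names are illustrative and not from the original post.

```python
# Figures from the 65k-token prompt-processing test described above.
TOTAL_TOKENS = 65_000
FIRST_HALF_MINUTES = 8
SECOND_HALF_MINUTES = 18


def tokens_per_second(tokens: float, minutes: float) -> float:
    """Average throughput for `tokens` processed over `minutes`."""
    return tokens / (minutes * 60)


# Assumption: the two halves contain the same number of tokens.
half = TOTAL_TOKENS / 2

first_rate = tokens_per_second(half, FIRST_HALF_MINUTES)
second_rate = tokens_per_second(half, SECOND_HALF_MINUTES)
overall_rate = tokens_per_second(
    TOTAL_TOKENS, FIRST_HALF_MINUTES + SECOND_HALF_MINUTES
)

print(f"first half:  {first_rate:.1f} tok/s")
print(f"second half: {second_rate:.1f} tok/s")
print(f"overall:     {overall_rate:.1f} tok/s")
```

Under that assumption, the second half runs at less than half the speed of the first, which is why the overall figure lands well below the initial rate.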

Workflow Observations

The user notes that Opencode, the agentic coding tool used in the test, handles multi-file code generation efficiently once a plan is created, outputting "thousands of tokens of code across multiple files in just a few minutes with reasoning in between." Reading a few hundred lines of code per file typically costs "a couple minutes" of prompt processing, adding up to about 10 minutes in total spread across planning sessions.

Compaction in Opencode "does take a while as it likes to basically just reprocess the whole context." With a 50k token context limit, compaction takes approximately 5 minutes.

Technical Setup and Future Expectations

The test was conducted using LM Studio, which may not provide the latest runtime optimizations. The user suggests that "MLX or even GGUF may get faster prompt processing as the runtimes are updated for GLM 5, but it will likely not get a TON faster than this."

The setup is not recommended for tasks that need 70k+ tokens in context, both because of context size limitations and because of the "unbearable slowness" that sets in once prompt processing passes certain context-size thresholds.
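As a rough illustration, the thresholds reported in this write-up can be turned into a small pre-flight check. This is a hypothetical helper, not part of Opencode or LM Studio; the band names and the function name are assumptions, and only the 50k and 70k figures come from the source.

```python
# Thresholds taken from the observations above; everything else is illustrative.
USABLE_LIMIT = 50_000          # "quite usable" below roughly this many tokens
NOT_RECOMMENDED_FROM = 70_000  # setup not recommended at 70k+ tokens


def classify_context(tokens: int) -> str:
    """Map a context size in tokens to the band described in the write-up."""
    if tokens < 0:
        raise ValueError("token count must be non-negative")
    if tokens < USABLE_LIMIT:
        return "usable"
    if tokens < NOT_RECOMMENDED_FROM:
        return "degraded"
    return "not recommended"


for size in (30_000, 65_000, 80_000):
    print(size, classify_context(size))
```

A check like this could run before a session is handed a large file set, so that compaction or a narrower task scope is chosen before prompt processing slows down.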

📖 Read the full source: r/LocalLLaMA
