OpenClaw Benchmark Shows Qwen3.5:27B Outperforms Other Local LLMs for Agent Tasks

Benchmark Setup and Results
A user tested 7 local models on 22 real agent tasks using OpenClaw on a Raspberry Pi 5 with an RTX 3090 running Ollama. The tasks included reading emails, scheduling meetings, creating tasks, detecting phishing, handling errors, and browser automation.
The clear winner was qwen3.5:27b-q4_K_M at 59.4%, about 36 percentage points ahead of the runner-up, qwen3.5:35b, at 23.2%. Every other model scored below 5%.
Key Findings
- The quantized 27B model outscored the larger 35B version by roughly 2.5x (59.4% vs. 23.2%)
- A 30B model scored dead last at 1.6%
- Medium thinking worked best; too much thinking actually hurt performance
- Zero models could complete browser automation tasks
- The main differentiator between winners and losers was whether the model could find and use command line tools
- Most models couldn't even find basic tools like the email function
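The headline ratios can be checked directly from the scores quoted above. The following is a minimal Python sketch that uses only the three figures reported in the post. The variable names are illustrative, and the 30B model is left unnamed because this summary does not name it.

```python
# Scores as reported in the post (percent).
top_score = 59.4        # qwen3.5:27b-q4_K_M
runner_up_score = 23.2  # qwen3.5:35b
last_score = 1.6        # the unnamed 30B model that placed last

# How many times higher the top model scored than the runner-up and the last-place model.
top_vs_runner_up = top_score / runner_up_score
top_vs_last = top_score / last_score

# Absolute gap between first and second place, in percentage points.
gap_points = top_score - runner_up_score

print(f"27B vs 35B: {top_vs_runner_up:.2f}x")
print(f"27B vs last-place 30B: {top_vs_last:.1f}x")
print(f"Gap to runner-up: {gap_points:.1f} percentage points")
```

The first ratio comes out at about 2.56, which is where the "2.5x" figure comes from.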
This benchmark provides concrete data on how different local LLMs perform as AI agents in practical scenarios. The significant performance gap between the top model and others suggests tool-finding capability is a critical bottleneck for local LLM agents.
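Anyone reproducing this kind of test may want to confirm that the command line tools are actually reachable before blaming the model for not finding them. The post does not describe how OpenClaw exposes its tools, so the sketch below is only an assumption-laden sanity check: it uses Python's standard `shutil.which` to look up tool names on `PATH`, and the tool names shown are hypothetical placeholders.

```python
import shutil


def find_tools(names):
    """Map each tool name to its resolved path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


# Hypothetical tool names, for illustration only.
# Substitute the CLIs that your own agent setup is expected to call.
expected_tools = ["example-mail-cli", "example-calendar-cli"]

for name, path in find_tools(expected_tools).items():
    status = path if path else "NOT FOUND on PATH"
    print(f"{name}: {status}")
```

If a tool is missing here, a failed task reflects the environment rather than the model's ability to find tools.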
📖 Read the full source: r/LocalLLaMA