PinchBench Results: First OpenClaw-Specific AI Coding Agent Benchmark

✍️ OpenClawRadar · 📅 Published: March 8, 2026

PinchBench is the first benchmark specifically designed for evaluating AI coding agents in the OpenClaw ecosystem, ranking models by success rate, cost, and speed.

Key Results

The benchmark tested 32 models. Top five by success rate (speed is reported in seconds, so lower is faster):

1. google/gemini-3-flash-preview: 95.1% success, $0.72 cost, 254.50s speed
2. minimax/minimax-m2.1: 93.6% success, $0.14 cost, 239.79s speed
3. moonshotai/kimi-k2.5: 93.4% success, $0.20 cost, 291.67s speed
4. anthropic/claude-sonnet-4.5: 92.7% success, $3.07 cost, 304.53s speed
5. google/gemini-3-pro-preview: 91.7% success, $1.48 cost, 239.55s speed

Notable Findings

Flash beat Pro within the Gemini 3 family at about half the cost: gemini-3-flash-preview (95.1%, $0.72) outperformed gemini-3-pro-preview (91.7%, $1.48)
Higher cost did not guarantee better results: anthropic/claude-sonnet-4.5 was the most expensive of the top five at $3.07, yet ranked fourth
Minimax 2.5 ranked 31st with a 35.5% success rate and 105.96s speed (cost not listed)
Several models exceeded 90% success while costing under $1, including gemini-3-flash-preview, minimax-m2.1, and kimi-k2.5
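
The "above 90% success, under $1" observation can be checked against the top-five figures quoted above. The following is a minimal sketch, not part of PinchBench itself: the function name and the threshold parameters are illustrative assumptions, and only the five rows listed in this article are included, so the full 32-model table may contain further matches.

```python
# Top-five figures as quoted in this article: (model, success %, cost in USD).
TOP_FIVE = [
    ("google/gemini-3-flash-preview", 95.1, 0.72),
    ("minimax/minimax-m2.1", 93.6, 0.14),
    ("moonshotai/kimi-k2.5", 93.4, 0.20),
    ("anthropic/claude-sonnet-4.5", 92.7, 3.07),
    ("google/gemini-3-pro-preview", 91.7, 1.48),
]


def high_success_low_cost(rows, min_success=90.0, max_cost=1.00):
    """Return names of models with success above min_success and cost below max_cost.

    The helper and its default thresholds are hypothetical, chosen to mirror
    the article's wording.
    """
    return [
        name
        for name, success, cost in rows
        if success > min_success and cost < max_cost
    ]


budget_picks = high_success_low_cost(TOP_FIVE)
print(budget_picks)
```

Among the five listed models, this selects gemini-3-flash-preview, minimax-m2.1, and kimi-k2.5; the two it excludes (claude-sonnet-4.5 and gemini-3-pro-preview) both clear 90% but cost more than $1.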

Performance Range

Success rates range from 95.1% (top) to 35.2% (bottom). Cost-effective options include:

openai/gpt-5-nano: 85.8% success for $0.03
google/gemini-2.5-flash-lite: 83.2% success for $0.05
mistralai/devstral-2512: 81.7% success for $0.10
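
One way to make "cost-effective" concrete is a Pareto frontier: keep a model only if no other model is at least as accurate and at least as cheap. The sketch below applies that idea to the eight models whose success rate and cost both appear in this article. The `Result` type and the function names are illustrative assumptions, speed is left out because it is not listed for every model, and the result could change once the remaining 24 models are included.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    success: float  # success rate in percent, as quoted in the article
    cost: float     # cost in USD, as quoted in the article


# Only the eight models for which the article gives both success and cost.
RESULTS = [
    Result("google/gemini-3-flash-preview", 95.1, 0.72),
    Result("minimax/minimax-m2.1", 93.6, 0.14),
    Result("moonshotai/kimi-k2.5", 93.4, 0.20),
    Result("anthropic/claude-sonnet-4.5", 92.7, 3.07),
    Result("google/gemini-3-pro-preview", 91.7, 1.48),
    Result("openai/gpt-5-nano", 85.8, 0.03),
    Result("google/gemini-2.5-flash-lite", 83.2, 0.05),
    Result("mistralai/devstral-2512", 81.7, 0.10),
]


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.success >= b.success and a.cost <= b.cost
    strictly_better = a.success > b.success or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep the results that no other result dominates (input order preserved)."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


frontier = [r.model for r in pareto_frontier(RESULTS)]
print(frontier)
```

On these eight rows the frontier is gemini-3-flash-preview (highest success), minimax-m2.1, and gpt-5-nano (lowest cost). Every other listed model is matched or beaten on both success and cost by one of those three.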

Models at the bottom of the ranking (positions 23-32) show success rates of around 40% or lower; their costs are not listed in the source data.

📖 Read the full source: r/openclaw
