PinchBench Results: First OpenClaw-Specific AI Coding Agent Benchmark

✍️ OpenClawRadar · 📅 Published: March 8, 2026 · 🔗 Source

PinchBench is the first benchmark specifically designed for evaluating AI coding agents in the OpenClaw ecosystem, ranking models by success rate, cost, and speed.

Key Results

The benchmark tested 32 models. Top performers by success rate:

  • 1. google/gemini-3-flash-preview: 95.1% success, $0.72 cost, 254.50 s
  • 2. minimax/minimax-m2.1: 93.6% success, $0.14 cost, 239.79 s
  • 3. moonshotai/kimi-k2.5: 93.4% success, $0.20 cost, 291.67 s
  • 4. anthropic/claude-sonnet-4.5: 92.7% success, $3.07 cost, 304.53 s
  • 5. google/gemini-3-pro-preview: 91.7% success, $1.48 cost, 239.55 s

Notable Findings

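  • Flash beat Pro at lower cost: Gemini-3-Flash-Preview (95.1%, $0.72) outperformed Gemini-3-Pro-Preview (91.7%, $1.48) at under half the cost
  • Higher cost does not guarantee higher success: anthropic/claude-sonnet-4.5, the most expensive of the top five at $3.07, scored 92.7%, below minimax/minimax-m2.1 at 93.6% for $0.14
  • Minimax 2.5 ranked 31st with a 35.5% success rate and a time of 105.96 s (cost not listed)
  • Several models exceed 90% success while costing under $1, including gemini-3-flash-preview, minimax-m2.1, and kimi-k2.5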
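The "more expensive is not better" observation can be checked mechanically with a Pareto-dominance filter over success rate and cost for the top five results. This is an illustrative analysis, not part of PinchBench itself: the `Result` type and the helper names are hypothetical, run time is ignored, and it assumes the listed costs are directly comparable across models.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    success_pct: float  # PinchBench success rate, in percent
    cost_usd: float     # listed cost, in US dollars (assumed comparable across models)


# Top five results as reported by PinchBench.
TOP_FIVE = [
    Result("google/gemini-3-flash-preview", 95.1, 0.72),
    Result("minimax/minimax-m2.1", 93.6, 0.14),
    Result("moonshotai/kimi-k2.5", 93.4, 0.20),
    Result("anthropic/claude-sonnet-4.5", 92.7, 3.07),
    Result("google/gemini-3-pro-preview", 91.7, 1.48),
]


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on both success and cost, and strictly better on one."""
    no_worse = a.success_pct >= b.success_pct and a.cost_usd <= b.cost_usd
    strictly_better = a.success_pct > b.success_pct or a.cost_usd < b.cost_usd
    return no_worse and strictly_better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only the results that no other result dominates (input order preserved)."""
    return [r for r in results if not any(dominates(o, r) for o in results if o is not r)]


if __name__ == "__main__":
    for r in pareto_frontier(TOP_FIVE):
        print(f"{r.model}: {r.success_pct}% at ${r.cost_usd:.2f}")
```

On these five entries only gemini-3-flash-preview and minimax-m2.1 survive; each of the other three is beaten on both success and cost by one of them. Adding time as a third axis would require extending `dominates` accordingly.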

Performance Range

Success rates range from 95.1% (top) to 35.2% (bottom). Cost-effective options include:

  • openai/gpt-5-nano: 85.8% success for $0.03
  • google/gemini-2.5-flash-lite: 83.2% success for $0.05
  • mistralai/devstral-2512: 81.7% success for $0.10
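One rough way to compare these budget options is success-rate points per dollar. The metric below is hypothetical and not reported by PinchBench; the sketch assumes each listed cost covers the same full benchmark run for every model, and includes the top-ranked model for contrast.

```python
# Hypothetical "success points per dollar" metric; not a PinchBench field.
# Assumes each listed cost covers the same full benchmark run for every model.
BUDGET_OPTIONS = {
    "openai/gpt-5-nano": (85.8, 0.03),
    "google/gemini-2.5-flash-lite": (83.2, 0.05),
    "mistralai/devstral-2512": (81.7, 0.10),
    "google/gemini-3-flash-preview": (95.1, 0.72),  # top-ranked model, for comparison
}


def success_per_dollar(success_pct: float, cost_usd: float) -> float:
    """Success-rate percentage points obtained per US dollar spent."""
    if cost_usd <= 0:
        raise ValueError("cost must be positive")
    return success_pct / cost_usd


if __name__ == "__main__":
    # Sort models from most to least success per dollar.
    ranked = sorted(
        BUDGET_OPTIONS.items(),
        key=lambda item: success_per_dollar(*item[1]),
        reverse=True,
    )
    for model, (success, cost) in ranked:
        print(f"{model}: {success_per_dollar(success, cost):,.0f} points per $")
```

Under these assumptions gpt-5-nano yields roughly 2,860 points per dollar versus about 132 for gemini-3-flash-preview, although the absolute success gap (85.8% vs 95.1%) may matter more for a given workload.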

Several models near the bottom of the ranking (positions 23-32) have success rates of about 40% or lower; their costs are not listed in the source data.

📖 Read the full source: r/openclaw
