SWE-rebench Leaderboard Update: February 2026 Results Show Tight Competition

✍️ OpenClawRadar · 📅 Published: March 23, 2026

SWE-rebench February 2026 Results

The SWE-rebench leaderboard has been updated with February 2026 runs on 57 fresh GitHub PR tasks. The setup follows standard SWE-bench methodology: each model reads a real PR issue, edits the code, runs the tests, and must make the full test suite pass. Tasks are drawn only from PRs created in the previous month.

Key Results

  • Claude Opus 4.6 remains at the top with a 65.3% resolved rate and a strong pass@5 of about 70%
  • The top tier is tight: gpt-5.2-medium (64.4%), GLM-5 (62.8%), and gpt-5.4-medium (62.8%) all sit within 2.5 points of the leader
  • Gemini 3.1 Pro Preview (62.3%) and DeepSeek-V3.2 (60.9%) round out a tightly packed top six
  • Open-weight/hybrid models keep improving: Qwen3.5-397B (59.9%), Step-3.5-Flash (59.6%), and Qwen3-Coder-Next (54.4%) are closing the gap, driven by improved long-context use and scaling
  • MiniMax M2.5 (54.6%) continues to stand out as a cost-efficient option with competitive performance
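
To make the resolved-rate and pass@5 figures in the list above concrete, here is a minimal sketch of how such metrics can be computed from per-task run outcomes. It assumes each task is attempted several times, that the resolved rate is the mean per-task success fraction, and that pass@k counts a task as solved if any of k runs succeeds. These definitions, the function names, and the sample data are illustrative assumptions; the leaderboard's exact computation may differ.

```python
from typing import Dict, List


def resolved_rate(runs: Dict[str, List[bool]]) -> float:
    """Mean fraction of successful runs per task, averaged over tasks (assumed definition)."""
    per_task = [sum(outcomes) / len(outcomes) for outcomes in runs.values()]
    return sum(per_task) / len(per_task)


def pass_at_k(runs: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks solved in at least one of the first k runs (assumed definition)."""
    solved = [any(outcomes[:k]) for outcomes in runs.values()]
    return sum(solved) / len(solved)


# Hypothetical outcomes for four tasks, five runs each (not leaderboard data).
example_runs = {
    "task-a": [True, True, True, True, True],
    "task-b": [True, False, True, False, False],
    "task-c": [False, False, False, False, False],
    "task-d": [False, False, True, False, False],
}

rate = resolved_rate(example_runs)
p5 = pass_at_k(example_runs, 5)
print(f"resolved rate: {rate:.1%}, pass@5: {p5:.1%}")
```

Under these definitions pass@5 can never fall below the resolved rate, which is consistent with the leader's roughly 70% pass@5 sitting above its 65.3% resolved rate.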

Overall, February shows a tightly contested frontier: the top six models sit within 4.4 points of each other.
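
For a sense of scale, the short sketch below tabulates each model's gap to the leader using only the scores quoted above, and shows how much a single task weighs on a 57-task set. The variable names are illustrative, and reading gaps in "task equivalents" is a rough aid rather than a statistical test.

```python
# Resolved rates (%) as quoted in the Key Results list.
scores = {
    "Claude Opus 4.6": 65.3,
    "gpt-5.2-medium": 64.4,
    "GLM-5": 62.8,
    "gpt-5.4-medium": 62.8,
    "Gemini 3.1 Pro Preview": 62.3,
    "DeepSeek-V3.2": 60.9,
    "Qwen3.5-397B": 59.9,
    "Step-3.5-Flash": 59.6,
    "MiniMax M2.5": 54.6,
    "Qwen3-Coder-Next": 54.4,
}

NUM_TASKS = 57
points_per_task = 100 / NUM_TASKS  # percentage points contributed by one task

leader_score = max(scores.values())
gaps = {name: round(leader_score - score, 1) for name, score in scores.items()}

print(f"One task is worth about {points_per_task:.2f} percentage points")
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:<24} {scores[name]:5.1f}%  {gap:4.1f} pts behind")
```

With one task worth about 1.75 percentage points, the 0.9-point gap between the top two models is smaller than a single task.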

📖 Read the full source: r/LocalLLaMA
