SWE-rebench Leaderboard Update: February 2026 Results Show Tight Competition

The SWE-rebench leaderboard has been updated with February 2026 runs on 57 fresh GitHub PR tasks. The setup follows standard SWE-bench methodology: each model reads a real PR issue, edits the code, runs tests, and must make the full test suite pass. Tasks are restricted to PRs created in the previous month.
Key Results
- Claude Opus 4.6 remains at the top with a 65.3% resolved rate and a strong pass@5 of about 70%
- The top tier is extremely tight: gpt-5.2-medium (64.4%), GLM-5 (62.8%), and gpt-5.4-medium (62.8%) all sit within 2.5 points of the leader
- Gemini 3.1 Pro Preview (62.3%) and DeepSeek-V3.2 (60.9%) complete a tightly packed top six that spans just 4.4 points
- Open-weight/hybrid models keep improving: Qwen3.5-397B (59.9%), Step-3.5-Flash (59.6%), and Qwen3-Coder-Next (54.4%) are closing the gap, driven by improved long-context use and scaling
- MiniMax M2.5 (54.6%) continues to stand out as a cost-efficient option with competitive performance
Overall, February shows a highly competitive frontier, with six models within 4.4 points of the lead.
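For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how a resolved rate and pass@5 could be computed from per-task outcomes. It assumes each task is attempted in five independent runs and that pass@5 counts a task as solved if any run succeeds; the post does not spell out these details, and the function names and sample data are purely illustrative.

```python
from typing import Sequence


def resolved_rate(runs: Sequence[Sequence[bool]]) -> float:
    """Fraction of all (task, run) attempts that resolved the task."""
    total = sum(len(task_runs) for task_runs in runs)
    solved = sum(sum(task_runs) for task_runs in runs)
    return solved / total


def pass_at_k(runs: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks resolved in at least one of the first k runs."""
    return sum(any(task_runs[:k]) for task_runs in runs) / len(runs)


# Hypothetical outcomes: 4 tasks x 5 runs (True = full test suite passed).
outcomes = [
    [True, True, True, True, True],
    [True, False, True, False, True],
    [False, False, False, True, False],
    [False, False, False, False, False],
]

print(f"resolved rate: {resolved_rate(outcomes):.1%}")  # 45.0%
print(f"pass@5:        {pass_at_k(outcomes, 5):.1%}")   # 75.0%
```

Under these assumptions pass@5 is always at least as high as the resolved rate, which is consistent with the leader's roughly 70% pass@5 sitting above its 65.3% resolved rate.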
📖 Read the full source: r/LocalLLaMA