SWE-rebench-V2 Released: Largest Open Dataset for Code Agent Training

SWE-rebench-V2 Release Details

Nebius's R&D team, led by Ibragim, has published SWE-rebench-V2, which they describe as "currently the biggest open dataset in the world for training coding agents." The dataset is multilingual and executable, and is designed for large-scale reinforcement learning (RL) training.

Key Technical Features

The team built an automated pipeline to extract RL environments at scale. This release includes:

The complete SWE-rebench-V2 dataset
A detailed technical report

The paper and dataset are available at: https://huggingface.co/papers/2602.23866

Community and Support

The team provides active support on Discord (https://discord.gg/wXYmWpMu) for both the dataset and the SWE-rebench Leaderboard. They credit the LocalLLaMA community with providing "the most valuable feedback" on the leaderboard, and they confirm that work on it is continuing, with plans to "make it even cooler."

For research collaborations or questions, Ibragim can be reached by DM on Reddit or on Twitter (X): https://x.com/ibragim_bad

📖 Read the full source: r/LocalLLaMA