Two Research Projects Challenge Imitation Learning for Web Agents

Two Approaches to Web Agent Training
Two research projects challenge the standard approach of training AI agents solely by imitating expert demonstrations. Both focus on web form-filling tasks, where models must navigate real websites, fill fields, click buttons, and submit forms.
Browser in the Loop: RL for Task Completion
The first project, "Browser in the Loop" (doi.org/10.13140/RG.2.2.24922.71360), uses an 8-billion-parameter model in a feedback loop with a real browser. Instead of only imitating expert demonstrations, the model generates action plans, executes them against live web forms, and learns from the outcome.
Reinforcement learning converts near-perfect attempts (where all fields are correct but submission fails) into actual successes. The gains come not from filling fields better but from learning to cross the finish line, something imitation alone never optimized for.
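To make the distinction concrete, here is a minimal sketch of an outcome-based reward that separates "fields are correct" from "the form was actually submitted". The `EpisodeResult` type, the `completion_reward` function, and the 50/50 weighting are illustrative assumptions, not the reward used in the project.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Hypothetical summary of one browser rollout (not the project's schema)."""
    fields_expected: dict  # field name -> value the task requires
    fields_filled: dict    # field name -> value the agent actually entered
    submitted: bool        # did the browser confirm a successful submission?


def field_accuracy(result: EpisodeResult) -> float:
    """Fraction of required fields filled with the expected value."""
    if not result.fields_expected:
        return 0.0
    correct = sum(
        1
        for name, value in result.fields_expected.items()
        if result.fields_filled.get(name) == value
    )
    return correct / len(result.fields_expected)


def completion_reward(result: EpisodeResult, field_weight: float = 0.5) -> float:
    """Assumed reward shape: partial credit for fields, the rest only on success."""
    accuracy = field_accuracy(result)
    done = 1.0 if (accuracy == 1.0 and result.submitted) else 0.0
    return field_weight * accuracy + (1.0 - field_weight) * done


form = {"name": "Ada", "email": "ada@example.com"}
near_miss = EpisodeResult(form, dict(form), submitted=False)
success = EpisodeResult(form, dict(form), submitted=True)

print(completion_reward(near_miss))  # 0.5: every field correct, never submitted
print(completion_reward(success))    # 1.0: fields correct and form submitted
```

Under a reward like this, the near-miss and the success have identical field accuracy, so only the submission term can push the policy toward finishing the task.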
Concentrate or Collapse: RL Challenges with Diffusion Models
The second project, "Concentrate or Collapse" (doi.org/10.13140/RG.2.2.11500.94088), explores what happens when models don't generate actions left to right at all. Diffusion language models refine entire action sequences in parallel, but applying the same RL that works for autoregressive models causes these diffusion models to collapse, with outputs degrading to incoherence.
Across 16 controlled comparisons, token-level RL yielded an improvement in only two. The fix required rethinking optimization at the sequence level, where one method (ESPO) finally broke through for pure diffusion architectures.
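As a generic illustration of the two granularities (this is not ESPO itself, whose details are not given here), the sketch below compares per-token importance ratios with a single length-normalized sequence-level ratio. It assumes per-token log-probability estimates are available for both the old and the new policy; the function names and the numbers are made up for the example.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """One importance ratio per token: exp(new - old)."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """One ratio per sequence: exp of the mean per-token log-ratio."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


# Illustrative estimates: three tokens unchanged, one token shifted by 2 nats.
old = [-1.0, -1.0, -1.0, -3.0]
new = [-1.0, -1.0, -1.0, -1.0]

print(token_level_ratios(new, old))    # last ratio is e^2, roughly 7.39
print(sequence_level_ratio(new, old))  # e^0.5, roughly 1.65
```

A single noisy per-token estimate dominates the token-level view, while the sequence-level ratio averages it across the whole action sequence.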
Key Implications
The research highlights that most web agent benchmarks still evaluate on text similarity to reference trajectories rather than actual task completion. These projects suggest that what looks correct on paper and what actually works in a browser are different problems, and optimizing for the wrong one leaves performance on the table.
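The gap between the two kinds of evaluation can be shown with a toy example. The action format (`type <selector> <value>`, `click <selector>`), the selectors, and the `task_completed` check below are hypothetical stand-ins for a real browser run; the text score uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical action trajectories in a made-up "verb selector value" format.
reference = ["type #name Ada", "type #email ada@example.com", "click #submit"]
predicted = ["type #name Ada", "type #email ada@example.com", "click #reset"]


def text_similarity(pred, ref):
    """Character-level similarity between two trajectories, in [0, 1]."""
    return SequenceMatcher(None, "\n".join(pred), "\n".join(ref)).ratio()


def task_completed(actions, required_fields):
    """Toy stand-in for a browser: succeed only if submit is clicked
    while every required field is filled."""
    filled = set()
    for action in actions:
        verb, target, *_ = action.split(" ", 2)
        if verb == "type":
            filled.add(target)
        elif verb == "click" and target == "#reset":
            filled.clear()  # a reset button wipes the form
        elif verb == "click" and target == "#submit":
            return required_fields <= filled
    return False  # never submitted


required = {"#name", "#email"}
print(text_similarity(predicted, reference))  # high: only the last click differs
print(task_completed(predicted, required))    # False
print(task_completed(reference, required))    # True
```

The predicted trajectory scores above 0.9 on text similarity yet fails the task, because the only differing action is the one that determines the outcome.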
All 12 trained models and their pipeline have been open-sourced: code at github.com/billy-enrizky/openbrowser-ai and models at huggingface.co/billyenrizky.
📖 Read the full source: r/LocalLLaMA