Two Research Projects Challenge Imitation Learning for Web Agents

OpenClawRadar · Published: April 13, 2026

Two Approaches to Web Agent Training

Two research projects challenge the standard approach of training AI agents solely through imitation of expert demonstrations, focusing specifically on web form filling tasks where models must navigate real websites, fill fields, click buttons, and submit forms.

Browser in the Loop: RL for Task Completion

The first project, "Browser in the Loop" (doi.org/10.13140/RG.2.2.24922.71360), uses an 8-billion-parameter model in a feedback loop with a real browser. Instead of only imitating expert demonstrations, the model generates action plans, executes them against live web forms, and learns from the outcome.

Reinforcement learning converts near-perfect attempts, where all fields are correct but submission fails, into actual successes. The gains come not from filling fields more accurately but from learning to complete the submission, an outcome that imitation alone never optimized for.
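To make the distinction concrete, here is a minimal Python sketch of why a near-perfect attempt can look flawless under a field-level metric yet earn nothing under an outcome-based reward. The `EpisodeResult` structure and both scoring functions are hypothetical illustrations; the article does not describe the project's actual reward design.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of running one action plan in a browser (hypothetical structure)."""
    fields_expected: int
    fields_correct: int
    submitted: bool  # True only if the form submission actually went through


def field_accuracy(result: EpisodeResult) -> float:
    """Fraction of form fields filled correctly; ignores submission."""
    if result.fields_expected == 0:
        return 0.0
    return result.fields_correct / result.fields_expected


def completion_reward(result: EpisodeResult) -> float:
    """Assumed binary reward: 1.0 only if every field is correct AND the form was submitted."""
    all_correct = result.fields_correct == result.fields_expected
    return 1.0 if (all_correct and result.submitted) else 0.0


# A near-miss: every field is right, but the submission never succeeded.
near_miss = EpisodeResult(fields_expected=5, fields_correct=5, submitted=False)
success = EpisodeResult(fields_expected=5, fields_correct=5, submitted=True)

print(field_accuracy(near_miss), completion_reward(near_miss))
print(field_accuracy(success), completion_reward(success))
```

Under these assumptions, the two episodes are indistinguishable by field accuracy, and only the completion reward gives a learning signal that separates them.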

Concentrate or Collapse: RL Challenges with Diffusion Models

The second project, "Concentrate or Collapse" (doi.org/10.13140/RG.2.2.11500.94088), explores what happens when models don't generate actions left to right at all. Diffusion language models refine entire action sequences in parallel, but applying the same RL that works for autoregressive models causes these diffusion models to collapse, with outputs degrading to incoherence.

Across 16 controlled comparisons, token-level RL improved results in only two. The fix required rethinking optimization at the sequence level, where one method (ESPO) finally broke through for pure diffusion architectures.
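The difference in granularity between token-level and sequence-level optimization can be sketched in Python as follows. This is a generic illustration of the two kinds of policy ratio, not ESPO and not the project's implementation. The function names, the length normalization, and the assumption that per-token log-probabilities are available are all hypothetical.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """One importance ratio per token: exp(new - old) for each position."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """A single ratio for the whole sequence, length-normalized (an assumed choice)."""
    diff = sum(new_logps) - sum(old_logps)
    return math.exp(diff / len(new_logps))


# Hypothetical log-probabilities for a two-token action sequence.
old = [-1.0, -2.0]
new = [-0.5, -2.5]  # probability mass shifted between tokens, same total

print(token_level_ratios(new, old))    # per-token ratios move in opposite directions
print(sequence_level_ratio(new, old))  # the sequence as a whole is unchanged
```

In this toy case, the per-token ratios swing apart even though the likelihood of the full sequence has not changed, which hints at why a per-token signal may be noisier for models that refine whole sequences at once.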

Key Implications

The research highlights that most web agent benchmarks still evaluate on text similarity to reference trajectories rather than actual task completion. These projects suggest that what looks correct on paper and what actually works in a browser are different problems, and optimizing for the wrong one leaves performance on the table.
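As a rough illustration of that gap, the following Python sketch scores a predicted trajectory against a reference using string similarity and, separately, checks whether the task would have completed. The action strings and the `task_completed` check are hypothetical stand-ins; a real evaluation would confirm submission in an actual browser.

```python
from difflib import SequenceMatcher

# Hypothetical action trajectories written as plain strings.
reference = ["fill name", "fill email", "click submit"]
predicted = ["fill name", "fill email", "click cancel"]


def text_similarity(pred, ref):
    """Character-level similarity between two trajectories, in [0, 1]."""
    return SequenceMatcher(None, " ".join(pred), " ".join(ref)).ratio()


def task_completed(actions):
    """Assumed success check: the trajectory must end by submitting the form."""
    return bool(actions) and actions[-1] == "click submit"


print(text_similarity(predicted, reference))  # high overlap with the reference
print(task_completed(predicted))              # yet the form is never submitted
```

Here the predicted trajectory overlaps heavily with the reference text while failing the only step that determines whether the task succeeded.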

All 12 trained models and their pipeline have been open-sourced: the code is at github.com/billy-enrizky/openbrowser-ai and the models are at huggingface.co/billyenrizky.

📖 Read the full source: r/LocalLLaMA
