Local Qwen Models Achieve Browser Automation with Stepwise Planning and Compact DOM

✍️ OpenClawRadar · 📅 Published: March 17, 2026

Stepwise Planning Overcomes Upfront Planning Failures

The developer found that asking models to invent a full multi-step plan before seeing the real page state works on familiar sites but breaks quickly when the page contains unexpected elements. Stepwise planning worked better: the model replans from the current DOM snapshot at each step.

Example Flow on Ace Hardware

The tested flow used Qwen 8B as the planner and Qwen 4B as the executor on Ace Hardware, a site the models had not previously been given a task on. It completed a full cart flow without using a vision model. The stepwise approach looked like this:

Step 1: see search box → TYPE "grass mower"
Step 2: see results → CLICK Add to Cart
Step 3: drawer appears → dismiss it
Step 4: cart visible → CLICK View Cart
Step 5: DONE
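
The five steps above can be sketched as a replanning loop. This is a minimal illustration rather than the developer's actual code: `run_stepwise`, `FakePage`, and `toy_planner` are hypothetical names, and the rule table stands in for the Qwen planner, which the source does not show.

```python
from typing import Callable, List, Tuple

# An action is (verb, argument). The verbs and this shape are assumptions.
Action = Tuple[str, str]


def run_stepwise(
    get_snapshot: Callable[[], str],
    plan_next: Callable[[str, str, List[Action]], Action],
    execute: Callable[[Action], None],
    goal: str,
    max_steps: int = 10,
) -> List[Action]:
    """Plan one action at a time, re-reading the page state before each one."""
    history: List[Action] = []
    for _ in range(max_steps):
        snapshot = get_snapshot()  # fresh page state, never a cached one
        action = plan_next(goal, snapshot, history)
        history.append(action)
        if action[0] == "DONE":
            break
        execute(action)
    return history


class FakePage:
    """Toy page that moves to the next state after every action."""

    states = ["search box", "results", "drawer", "cart visible", "cart page"]

    def __init__(self) -> None:
        self.index = 0

    def snapshot(self) -> str:
        return self.states[self.index]

    def apply(self, action: Action) -> None:
        self.index = min(self.index + 1, len(self.states) - 1)


def toy_planner(goal: str, snapshot: str, history: List[Action]) -> Action:
    """Stand-in for the planner model: maps what is visible to one next action."""
    rules = {
        "search box": ("TYPE", "grass mower"),
        "results": ("CLICK", "Add to Cart"),
        "drawer": ("DISMISS", "drawer"),
        "cart visible": ("CLICK", "View Cart"),
    }
    return rules.get(snapshot, ("DONE", ""))


page = FakePage()
trace = run_stepwise(page.snapshot, toy_planner, page.apply, "add a mower to the cart")
for step, (verb, arg) in enumerate(trace, start=1):
    print(f"Step {step}: {verb} {arg}".rstrip())
```

The point of the structure is that the planner never commits to step 3 before step 2 has run, so an unexpected drawer simply becomes the next snapshot it reacts to.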

Compact DOM Representation Enables Small Models

The model never sees raw HTML or screenshots, only a semantic table representation:

id|role|text|importance|bg|clickable|nearby_text
665|button|Proceed to checkout|675|orange|1|
761|button|Add to cart|720|yellow|1|$299.99
1488|link|ThinkPad E16|478|none|1|Laptop 16"

This lets the 4B executor pick an element ID from a short list. Vision approaches burn 2-3K tokens per screenshot, which easily adds up to 50-100K+ tokens for a full flow, while compact snapshots use about 15K tokens in total for the same task.
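
To make the format concrete, here is a small sketch that parses the pipe-delimited rows shown above and picks a clickable element by its text. The `Element` class and both function names are hypothetical, and treating a higher `importance` value as "more prominent" is an assumption; the source does not define that column.

```python
from dataclasses import dataclass
from typing import List, Optional

SNAPSHOT = '''id|role|text|importance|bg|clickable|nearby_text
665|button|Proceed to checkout|675|orange|1|
761|button|Add to cart|720|yellow|1|$299.99
1488|link|ThinkPad E16|478|none|1|Laptop 16"'''


@dataclass
class Element:
    id: int
    role: str
    text: str
    importance: int
    bg: str
    clickable: bool
    nearby_text: str


def parse_snapshot(table: str) -> List[Element]:
    """Turn the compact table into typed records, using the header row as keys."""
    lines = [line for line in table.strip().splitlines() if line]
    header = lines[0].split("|")
    elements = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")))
        elements.append(
            Element(
                id=int(row["id"]),
                role=row["role"],
                text=row["text"],
                importance=int(row["importance"]),
                bg=row["bg"],
                clickable=row["clickable"] == "1",
                nearby_text=row["nearby_text"],
            )
        )
    return elements


def find_by_text(elements: List[Element], query: str) -> Optional[Element]:
    """Return the clickable element whose text contains the query.

    Ties are broken by the highest importance value (an assumption).
    """
    needle = query.lower()
    matches = [e for e in elements if e.clickable and needle in e.text.lower()]
    return max(matches, key=lambda e: e.importance, default=None)


elements = parse_snapshot(SNAPSHOT)
target = find_by_text(elements, "add to cart")
print(target.id if target else None)
```

In the setup described, the executor model makes this choice itself; the lookup here only shows how little structure it has to read to do so.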

Modal Handling Critical for Success

After each click, if the DOM suddenly grows, the agent scans for dismiss patterns (close, ×, no thanks, etc.) before planning again. This fixed many failures that appeared to be "bad reasoning" but were actually hidden overlays.
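
A rough sketch of that check follows, assuming each snapshot is a list of `(id, text)` pairs. The growth threshold of 1.3x and the function names are hypothetical; the source only says the DOM "suddenly grows" and lists close, ×, and no thanks as example patterns.

```python
import re
from typing import List, Optional, Tuple

# Dismiss labels named in the article; a real agent would likely need more.
DISMISS_PATTERN = re.compile(r"^(close|×|no,? thanks)$", re.IGNORECASE)

# Assumed threshold: treat a 30% jump in element count as "suddenly grew".
GROWTH_RATIO = 1.3

Snapshot = List[Tuple[int, str]]  # (element id, visible text)


def dom_grew(before: int, after: int, ratio: float = GROWTH_RATIO) -> bool:
    """True when the element count jumped by at least the given ratio."""
    return before > 0 and after >= before * ratio


def find_dismiss_target(elements: Snapshot) -> Optional[int]:
    """Return the id of the first element whose text looks like a dismiss control."""
    for element_id, text in elements:
        if DISMISS_PATTERN.match(text.strip()):
            return element_id
    return None


def overlay_dismiss_target(before: Snapshot, after: Snapshot) -> Optional[int]:
    """After a click, look for a dismiss control only if the DOM grew."""
    if not dom_grew(len(before), len(after)):
        return None
    return find_dismiss_target(after)


before_click = [(1, "Add to cart"), (2, "View Cart")]
after_click = before_click + [(3, "Join our newsletter"), (4, "No thanks"), (5, "×")]
print(overlay_dismiss_target(before_click, after_click))
```

Running this check before the next planning call keeps the planner from reasoning about a page whose real controls are hidden behind an overlay.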

The developer asks whether others are also seeing stepwise planning beat upfront planning once sites become unfamiliar.

📖 Read the full source: r/LocalLLaMA
