LumaBrowser: Offload DOM Parsing to Local LLMs for AI Agents

What LumaBrowser Does

LumaBrowser is an Electron-based browser built specifically for autonomous AI agents that need to interact with web pages. The core problem it solves: agents were previously forced to process megabytes of raw HTML just to find simple UI elements like login buttons, wasting valuable context window space and computational resources.

How It Works

The browser connects to any OpenAI-compatible endpoint (the creator uses LM Studio) to handle DOM parsing. When an agent needs to interact with a page element, the local model analyzes the DOM structure, identifies the target element (like "the login button"), and returns the appropriate CSS selector. This keeps the main agent models focused on their actual tasks instead of parsing HTML.

Technical Implementation

Architecture: Electron browser with MCP server over stdio and REST API
Model Integration: Works with any OpenAI-compatible endpoint
Model Used: Creator reports using Qwen 2.5 variants, specifically 35B-A3B through LM Studio
Sharing Mechanism: When an LLM successfully resolves a selector, it shares an anonymized mapping to a public database to improve fallback performance over time
Experimental Feature: WebGPU mode to run small models directly in the browser (creator notes results are "hit or miss so far")

Creator's Use Case

The developer runs autonomous agents on a 5090/3090 setup doing scheduled tasks. Browser access was previously the weakest link because agents had to process entire HTML documents just to find simple elements. With LumaBrowser, the DOM parsing is offloaded to specialized models, while the main agents stay focused on higher-level task logic.