Gemma Gem: On-Device AI Agent for Browser Automation via WebGPU

✍️ OpenClawRadar📅 Published: April 16, 2026🔗 Source
Gemma Gem: On-Device AI Agent for Browser Automation via WebGPU
Ad

Gemma Gem is a Chrome extension that loads Google's Gemma 4 model (2B or 4B) through WebGPU in an offscreen document, giving it tools to interact with webpages directly in the browser without external API calls.

Key Details

The extension provides several tools that run in different contexts:

  • read_page_content: Read text/HTML of the page or a CSS selector (Content script)
  • take_screenshot: Capture visible page as PNG (Service worker)
  • click_element: Click an element by CSS selector (Content script)
  • type_text: Type into an input by CSS selector (Content script)
  • scroll_page: Scroll up/down by pixel amount (Content script)
  • run_javascript: Execute JS in the page context with full DOM access (Service worker)

The architecture uses three main components:

  • Offscreen document: Hosts the model via @huggingface/transformers + WebGPU, runs the agent loop
  • Service worker: Routes messages between content scripts and offscreen document, handles take_screenshot and run_javascript
  • Content script: Injects gem icon + shadow DOM chat overlay, executes DOM tools

Setup and Usage

Requirements:

  • Chrome with WebGPU support
  • ~500MB disk for E2B model, ~1.5GB for E4B (cached after first run)

Setup commands:

pnpm install
pnpm build

Load the extension in chrome://extensions (developer mode) from .output/chrome-mv3-dev/.

Usage:

  1. Navigate to any page
  2. Click the gem icon (bottom-right corner) to open the chat
  3. Wait for model to load (progress shown on icon + chat)
  4. Ask questions about the page or request actions
Ad

Settings and Configuration

Available settings via gear icon in chat header:

  • Model: Switch between Gemma 4 E2B (~500MB) and E4B (~1.5GB) - selection persists across sessions
  • Thinking: Toggle native Gemma 4 thinking
  • Max iterations: Cap on tool call loops per request
  • Clear context: Reset conversation history for the current page
  • Disable on this site: Disable the extension per-hostname (persisted)

Development and Debugging

Tech stack:

  • WXT — Chrome extension framework (Vite-based)
  • @huggingface/transformers — Browser ML inference
  • marked — Markdown rendering in chat
  • Gemma 4 E2B / E4B (onnx-community/gemma-4-E2B-it-ONNX, onnx-community/gemma-4-E4B-it-ONNX) — q4f16 quantization, 128K context

Build commands:

pnpm build        # Development build (with logging, source maps)
pnpm build:prod   # Production build (logging silenced, minified)

Debugging locations:

  • Service worker logs: chrome://extensions → Gemma Gem → "Inspect views: service worker"
  • Offscreen document logs: chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"
  • Content script logs: Open DevTools on any page → Console
  • All extension pages: chrome://inspect#other lists all inspectable extension contexts

The offscreen document logs show model loading, prompt construction, token counts, raw model output, and tool execution.

Technical Notes

The agent/ directory has zero dependencies and defines interfaces (ModelBackend, ToolExecutor) that can be extracted as a standalone library. The extension includes a thinking mode that shows chain-of-thought reasoning as it works.

According to the source, the agent works for simple page questions and running JavaScript, but multi-step tool chains are unreliable and it sometimes ignores its tools entirely.

📖 Read the full source: HN AI Agents

Ad

👀 See Also