Gemma Gem: On-Device AI Agent for Browser Automation via WebGPU

Gemma Gem is a Chrome extension that loads Google's Gemma 4 model (the E2B or E4B variant) through WebGPU in an offscreen document and gives it tools to interact with webpages directly in the browser, without external API calls.
Key Details
The extension provides several tools that run in different contexts:
- read_page_content: Read text/HTML of the page or a CSS selector (Content script)
- take_screenshot: Capture visible page as PNG (Service worker)
- click_element: Click an element by CSS selector (Content script)
- type_text: Type into an input by CSS selector (Content script)
- scroll_page: Scroll up/down by pixel amount (Content script)
- run_javascript: Execute JS in the page context with full DOM access (Service worker)
The architecture uses three main components:
- Offscreen document: Hosts the model via @huggingface/transformers + WebGPU, runs the agent loop
- Service worker: Routes messages between content scripts and offscreen document, handles take_screenshot and run_javascript
- Content script: Injects gem icon + shadow DOM chat overlay, executes DOM tools
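The split between tools and execution contexts can be pictured as a lookup table that the service worker consults when routing a tool call. The following is only a sketch: the tool names and their contexts come from the list above, while `ToolContext`, `TOOL_CONTEXTS`, and `resolveToolContext` are hypothetical names, not taken from the extension's code.

```typescript
// Hypothetical sketch: only the tool names and their contexts come from the source.
type ToolContext = "content-script" | "service-worker";

// Where each tool runs, as listed in the article.
const TOOL_CONTEXTS = new Map<string, ToolContext>([
  ["read_page_content", "content-script"],
  ["take_screenshot", "service-worker"],
  ["click_element", "content-script"],
  ["type_text", "content-script"],
  ["scroll_page", "content-script"],
  ["run_javascript", "service-worker"],
]);

// Returns the context that should execute a tool, or throws for unknown names.
function resolveToolContext(toolName: string): ToolContext {
  const context = TOOL_CONTEXTS.get(toolName);
  if (context === undefined) {
    throw new Error(`Unknown tool: ${toolName}`);
  }
  return context;
}
```

A router built on this would handle "service-worker" tools itself and forward "content-script" tools to the active tab.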
Setup and Usage
Requirements:
- Chrome with WebGPU support
- ~500MB disk for E2B model, ~1.5GB for E4B (cached after first run)
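Whether a given Chrome build meets the WebGPU requirement can be checked with ordinary feature detection on `navigator.gpu`. The helper below is a minimal sketch with a hypothetical name (`supportsWebGpu`); it takes a navigator-like object as a parameter so it does not depend on WebGPU type definitions, and it is not the extension's own check.

```typescript
// Hypothetical helper: reports whether a navigator-like object exposes WebGPU.
// Presence of `gpu` only shows the API exists; a usable adapter is not guaranteed.
function supportsWebGpu(nav: { gpu?: unknown }): boolean {
  return nav.gpu !== undefined && nav.gpu !== null;
}
```

In a page or extension context this would be called with the global `navigator` object.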
Setup commands:
pnpm install
pnpm build
Load the extension in chrome://extensions (developer mode) from .output/chrome-mv3-dev/.
Usage:
- Navigate to any page
- Click the gem icon (bottom-right corner) to open the chat
- Wait for the model to load (progress is shown on the icon and in the chat)
- Ask questions about the page or request actions
Settings and Configuration
Settings are available via the gear icon in the chat header:
- Model: Switch between Gemma 4 E2B (~500MB) and E4B (~1.5GB) - selection persists across sessions
- Thinking: Toggle native Gemma 4 thinking
- Max iterations: Cap on tool call loops per request
- Clear context: Reset conversation history for the current page
- Disable on this site: Disable the extension per-hostname (persisted)
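As an illustration of how these settings might be represented, the sketch below models them as a plain object and shows a per-hostname disable check. The `GemSettings` shape, its field names, and `isDisabledFor` are assumptions made for this example; the source does not describe how the extension stores its settings.

```typescript
// Hypothetical settings shape; field names are illustrative, not from the extension.
interface GemSettings {
  model: "E2B" | "E4B";    // which Gemma 4 variant to load
  thinking: boolean;       // toggle for native Gemma 4 thinking
  maxIterations: number;   // cap on tool call loops per request
  disabledHosts: string[]; // hostnames where the extension is turned off
}

// Returns true when the page's hostname is in the disabled list.
// Matching is exact, so "sub.example.com" is distinct from "example.com".
function isDisabledFor(settings: GemSettings, pageUrl: string): boolean {
  const hostname = new URL(pageUrl).hostname;
  return settings.disabledHosts.indexOf(hostname) !== -1;
}
```

Exact hostname matching is one possible reading of "per-hostname"; the extension may treat subdomains differently.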
Development and Debugging
Tech stack:
- WXT — Chrome extension framework (Vite-based)
- @huggingface/transformers — Browser ML inference
- marked — Markdown rendering in chat
- Gemma 4 E2B / E4B (onnx-community/gemma-4-E2B-it-ONNX, onnx-community/gemma-4-E4B-it-ONNX) — q4f16 quantization, 128K context
Build commands:
pnpm build # Development build (with logging, source maps)
pnpm build:prod # Production build (logging silenced, minified)
Debugging locations:
- Service worker logs: chrome://extensions → Gemma Gem → "Inspect views: service worker"
- Offscreen document logs: chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"
- Content script logs: Open DevTools on any page → Console
- All extension pages: chrome://inspect#other lists all inspectable extension contexts
The offscreen document logs show model loading, prompt construction, token counts, raw model output, and tool execution.
Technical Notes
The agent/ directory has zero dependencies and defines interfaces (ModelBackend, ToolExecutor) that can be extracted as a standalone library. The extension includes a thinking mode that shows chain-of-thought reasoning as it works.
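To make the role of the two interfaces concrete, here is a rough sketch of an agent loop bounded by a maximum iteration count. The interface names `ModelBackend` and `ToolExecutor` come from the source, but their members, the `ToolCall` and `ModelReply` types, and the `runAgent` function are assumptions for illustration; the real definitions in agent/ may look quite different.

```typescript
// Assumed shapes: the source names these interfaces but does not show their members.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ModelReply {
  text: string;
  toolCall?: ToolCall; // absent when the model gives a final answer
}

interface ModelBackend {
  generate(history: string[]): Promise<ModelReply>;
}

interface ToolExecutor {
  execute(call: ToolCall): Promise<string>;
}

// Alternates between model turns and tool executions until the model
// answers without a tool call or the iteration cap is reached.
async function runAgent(
  model: ModelBackend,
  tools: ToolExecutor,
  userMessage: string,
  maxIterations: number,
): Promise<string> {
  const history: string[] = [`user: ${userMessage}`];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await model.generate(history);
    if (!reply.toolCall) {
      return reply.text;
    }
    const result = await tools.execute(reply.toolCall);
    history.push(`assistant: ${reply.text}`);
    history.push(`tool(${reply.toolCall.name}): ${result}`);
  }
  return "Stopped: reached the maximum number of tool iterations.";
}
```

Because the loop depends only on the two interfaces, the model and the tools can be swapped for test doubles or for a different runtime.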
According to the source, the agent works for simple page questions and running JavaScript, but multi-step tool chains are unreliable and it sometimes ignores its tools entirely.
📖 Read the full source: HN AI Agents