Benchmark shows AI browser automation tools vary 2.6x in token costs despite identical accuracy

Benchmark results: Same accuracy, different costs
A benchmark tested 4 CLI browser automation tools using the same model (Claude Sonnet 4.6) on 6 real-world tasks against live websites. All tools scored 100% accuracy across 18 task executions, but token usage varied dramatically:
- openbrowser-ai: 36,010 tokens / 84.8s / 15.3 tool calls
- browser-use: 77,123 tokens / 106.0s / 20.7 tool calls
- playwright-cli (Microsoft): 94,130 tokens / 118.3s / 25.7 tool calls
- agent-browser (Vercel): 90,107 tokens / 99.0s / 25.0 tool calls
openbrowser-ai used 2.1x to 2.6x fewer tokens than the other three tools. The benchmark found that tool call count is the strongest predictor of token cost, because every call forces the LLM to re-process the entire conversation history.
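The ratios quoted above can be checked directly from the reported figures. The sketch below is only arithmetic on the numbers in the list; the function names are illustrative and not part of the benchmark.

```python
# Figures as reported in the benchmark: (tokens, seconds, tool calls).
RESULTS = {
    "openbrowser-ai": (36_010, 84.8, 15.3),
    "browser-use": (77_123, 106.0, 20.7),
    "playwright-cli": (94_130, 118.3, 25.7),
    "agent-browser": (90_107, 99.0, 25.0),
}


def token_ratios(results, baseline="openbrowser-ai"):
    """Token usage of each tool relative to the baseline tool."""
    base_tokens = results[baseline][0]
    return {
        name: round(tokens / base_tokens, 2)
        for name, (tokens, _seconds, _calls) in results.items()
        if name != baseline
    }


def tokens_per_call(results):
    """Reported tokens divided by reported tool calls, per tool."""
    return {
        name: round(tokens / calls)
        for name, (tokens, _seconds, calls) in results.items()
    }


if __name__ == "__main__":
    print(token_ratios(RESULTS))
    print(tokens_per_call(RESULTS))
```

The ratios come out at 2.14, 2.61, and 2.5, which matches the "2.1x to 2.6x" range. Tokens per call also differ (about 2,350 for openbrowser-ai versus roughly 3,600 to 3,730 for the others), so on these figures the gap reflects both fewer calls and cheaper calls.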
How the tools differ in implementation
All four tools share a common base: they maintain persistent browser sessions via background daemons, can execute JavaScript in the browser and return only the result, try to keep page state compact, and support some form of code execution.
browser-use exposes individual CLI commands: open, click, input, scroll, state, eval. The LLM issues one command per tool call. eval runs JavaScript in the page context. Page state is an enhanced DOM tree with [N] indices at roughly 880 characters per page. It communicates with Chrome via direct CDP through their cdp-use library.
agent-browser follows a similar pattern: open, click, fill, snapshot, eval. It is a native Rust binary that talks CDP directly to Chrome. Page state is an accessibility tree with @eN refs. The -i flag produces compact, interactive-only output at around 590 characters. Commands can be chained with &&, but each is still a separate daemon request.
playwright-cli offers individual commands plus run-code, which accepts arbitrary Playwright JavaScript with full API access. The LLM can write code like run-code "async page => { await page.goto('url'); await page.click('.btn'); return await page.title(); }" and execute multiple operations in one call. Page state is an accessibility tree saved to .yml files at roughly 1,420 characters, with incremental snapshots that send only diffs after the first read.
openbrowser-ai has no individual commands at all. The only interface is Python code via -c:
openbrowser-ai -c '
await navigate("https://en.wikipedia.org/wiki/Python")
info = await evaluate("document.querySelector(\".infobox\")?.innerText")
print(info)
'

navigate, click, input_text, evaluate, and scroll are async Python functions in a persistent namespace. Page state is a DOM with [i_N] indices at roughly 450 characters. Variables persist across calls, as in a Jupyter notebook.
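To make the "persistent namespace" idea concrete, here is a minimal sketch of how code snippets with top-level await can share variables across calls. It is not openbrowser-ai's implementation: PersistentSession and fake_navigate are hypothetical names, and the browser helper is a stub that touches no real browser.

```python
import ast
import asyncio
import inspect


class PersistentSession:
    """Runs code snippets against one shared namespace (hypothetical sketch)."""

    def __init__(self, helpers):
        # Helper functions and user variables live in the same dict.
        self.namespace = dict(helpers)

    async def run(self, source):
        # Allow `await` at the top level of the snippet.
        code = compile(
            source, "<call>", "exec", flags=ast.PyCF_ALLOW_TOP_LEVEL_AWAIT
        )
        result = eval(code, self.namespace)
        # Snippets that use `await` produce a coroutine that must be awaited.
        if inspect.iscoroutine(result):
            await result


async def fake_navigate(url):
    # Stand-in for a real browser helper; it only echoes the URL.
    return f"loaded {url}"


async def demo():
    session = PersistentSession({"navigate": fake_navigate})
    # First call stores a variable in the shared namespace.
    await session.run('status = await navigate("https://example.com")')
    # Second call reuses it without repeating the navigation.
    await session.run("summary = status.upper()")
    return session.namespace["summary"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the second snippet can read status directly, a later tool call does not need to repeat earlier work or have earlier results pasted back into it.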
The benchmark observed that the LLM made fewer tool calls with openbrowser-ai (15.3, versus 20.7 to 25.7 for the other tools), which the authors attribute to the code-only interface naturally encouraging the model to batch operations into a single call.
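A toy cost model illustrates why call count matters so much when each call re-sends the whole conversation. The constants below (base_context and tokens_per_turn) are made-up assumptions, not measurements from the benchmark, and the model ignores prompt caching.

```python
def cumulative_input_tokens(num_calls, base_context=1_000, tokens_per_turn=500):
    """Total input tokens if every call re-reads the full history so far.

    base_context and tokens_per_turn are illustrative assumptions.
    """
    total = 0
    history = base_context
    for _ in range(num_calls):
        total += history            # the model re-processes everything so far
        history += tokens_per_turn  # this call's request and result are appended
    return total


if __name__ == "__main__":
    for calls in (15, 25):
        print(calls, cumulative_input_tokens(calls))
```

Under these assumptions, 15 calls cost 67,500 input tokens and 25 calls cost 175,000. Cost grows faster than linearly with call count, which is why merging several operations into one call pays off.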
📖 Read the full source: r/ClaudeAI