Agent-Desktop: Structured Desktop Automation via OS Accessibility Trees

By OpenClawRadar · Published: May 2, 2026

Agent-desktop is a native desktop automation CLI built with Rust, designed for AI agents that need to control desktop applications programmatically. Instead of the common screenshot-based approach (take screenshot, predict pixel coordinates, click, repeat), it interacts through operating system accessibility trees — the same structured data screen readers use. This means the model sees element roles, names, hierarchy, and state directly, making interactions faster, cheaper, and more robust to UI shifts.

Key Features

  • Single Rust binary (~15 MB), no runtime dependencies
  • 53 commands covering observation, interaction, keyboard, mouse, notifications, clipboard, and window management
  • JSON output — machine-readable with error codes and recovery hints
  • Accessibility-first activation chain: uses pure accessibility API strategies before falling back to mouse events
  • Deterministic element references (e.g., @e1, @e2) with optimistic re-identification across UI shifts
  • Progressive skeleton traversal: shallow tree first (depth ~3), annotated with children_count, then drill-down into specific regions
  • Support for windows, menus, sheets, popovers, alerts, and notifications
  • Special handling for Chromium/Electron accessibility trees to reduce noise
  • C ABI via cdylib — can be loaded directly from Python, Swift, Go, Node, Ruby, or C without shelling out per command
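
To make the JSON-output point concrete, here is a minimal Python sketch that shells out to the CLI and decodes its output. The helper names (build_command, parse_output, run) are hypothetical and not part of agent-desktop, and the sketch deliberately assumes nothing about the JSON schema, which this article does not document.

```python
import json
import subprocess
from typing import Any, List


def build_command(*args: str, binary: str = "agent-desktop") -> List[str]:
    """Build the argv list for one CLI invocation (hypothetical helper)."""
    return [binary, *args]


def parse_output(stdout: str) -> Any:
    """Decode the CLI's JSON output without assuming any field names."""
    return json.loads(stdout)


def run(*args: str, binary: str = "agent-desktop", timeout: float = 30.0) -> Any:
    """Run one command and return the decoded JSON payload."""
    proc = subprocess.run(
        build_command(*args, binary=binary),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # The article mentions error codes and recovery hints but not their
    # field names, so the raw payload is handed back for the caller to inspect.
    return parse_output(proc.stdout)
```

With the binary installed, run("snapshot", "--app", "Finder", "-i") would return whatever structure the tool emits. For many calls in a row, loading the C ABI directly may be preferable to spawning a process per command.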

Typical Workflow

For dense apps like Slack or VS Code, use progressive skeleton traversal to minimize token usage:

# 1. Shallow overview: depth-3 map, truncated containers show children_count
agent-desktop snapshot --skeleton --app Slack -i --compact

# 2. Drill into a region of interest (named containers get refs)
agent-desktop snapshot --root @e3 -i --compact

# 3. Act on an element found in the drill-down
agent-desktop click @e12

# 4. Re-drill the same region to verify state change
agent-desktop snapshot --root @e3 -i --compact

For simpler apps, a full snapshot works fine: agent-desktop snapshot --app Finder -i.
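
The drill-down decision in the skeleton workflow can be sketched in Python as follows. Only children_count and the @eN reference style come from the tool's description. The node layout used here (a dict with "ref", "children_count", and "children" keys) is an assumption for illustration, and the real snapshot schema may differ.

```python
from typing import Any, Dict, Iterator, List


def truncated_refs(node: Dict[str, Any]) -> Iterator[str]:
    """Yield refs of containers whose children were cut off in the skeleton.

    Assumed (hypothetical) node shape:
    {"ref": "@e3", "children_count": 40, "children": [...]}
    """
    children: List[Dict[str, Any]] = node.get("children", [])
    declared = node.get("children_count", len(children))
    ref = node.get("ref")
    # A container is a drill-down candidate when it reports more children
    # than the shallow snapshot actually included.
    if ref is not None and declared > len(children):
        yield ref
    for child in children:
        yield from truncated_refs(child)


def drill_command(ref: str) -> List[str]:
    """Build the drill-down invocation shown in step 2 of the workflow."""
    return ["agent-desktop", "snapshot", "--root", ref, "-i", "--compact"]
```

An agent would typically not drill into every candidate. It would choose the one or two regions relevant to the task, which is where the token savings come from.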

Installation

npm install -g agent-desktop
# Or use npx: npx agent-desktop snapshot --app Finder -i
# From source: cargo build --release

Performance Stats

In practice, the progressive skeleton approach reduced token usage by 78% to 96% compared to full-tree dumps in Electron apps like Slack, VS Code, and Notion. For example, Slack's full accessibility tree can exceed 50,000 tokens — impractical for most LLM contexts.
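
As a rough worked example of what those percentages mean, the snippet below applies the quoted 78% and 96% reductions to a 50,000-token tree. The 50,000 figure is taken as a round illustrative baseline (the article only says Slack's tree can exceed it), and the function name is hypothetical.

```python
def tokens_after_reduction(full_tokens: int, reduction_pct: int) -> int:
    """Tokens remaining after cutting `reduction_pct` percent (integer math)."""
    return full_tokens * (100 - reduction_pct) // 100


full_tree = 50_000  # illustrative baseline, not a measured value
low_end = tokens_after_reduction(full_tree, 78)   # 11000 tokens remain
high_end = tokens_after_reduction(full_tree, 96)  # 2000 tokens remain
print(low_end, high_end)
```

Under these assumptions a snapshot would cost roughly 2,000 to 11,000 tokens instead of 50,000 or more.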

Who It's For

Developers building desktop agents, internal automation tools, or research prototypes who want to avoid the cost and fragility of screenshot-based control loops.

📖 Read the full source: HN AI Agents

