53 Commands for Desktop Automation via OS Accessibility Trees

Agent-desktop is a native desktop automation CLI built with Rust, designed for AI agents that need to control desktop applications programmatically. Instead of the common screenshot-based approach (take screenshot, predict pixel coordinates, click, repeat), it interacts through operating system accessibility trees — the same structured data screen readers use. This means the model sees element roles, names, hierarchy, and state directly, making interactions faster, cheaper, and more robust to UI shifts.

Key Features

Single Rust binary (~15 MB), no runtime dependencies
53 commands covering observation, interaction, keyboard, mouse, notifications, clipboard, and window management
JSON output — machine-readable with error codes and recovery hints
Accessibility-first activation chain: uses pure accessibility API strategies before falling back to mouse events
Deterministic element references (e.g., @e1, @e2) with optimistic re-identification across UI shifts
Progressive skeleton traversal: shallow tree first (depth ~3), annotated with children_count, then drill-down into specific regions
Support for windows, menus, sheets, popovers, alerts, and notifications
Special handling for Chromium/Electron accessibility trees to reduce noise
C ABI via cdylib — can be loaded directly from Python, Swift, Go, Node, Ruby, or C without shelling out per command
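The progressive skeleton traversal listed above can be illustrated on a toy tree. The following is a minimal Python sketch of the idea only, not agent-desktop's implementation: the node shape (a dict with role, name, and children keys) and the skeleton function are hypothetical, and annotating every node with children_count, rather than only truncated ones, is an assumption made for simplicity.

```python
from typing import Any


def skeleton(node: dict[str, Any], max_depth: int = 3, depth: int = 0) -> dict[str, Any]:
    """Return a copy of `node` cut off below `max_depth`.

    Every node in the copy records how many children it has, so a reader
    can see which truncated containers are worth drilling into.
    The node layout here is hypothetical, not the tool's real schema.
    """
    children = node.get("children", [])
    out = {key: value for key, value in node.items() if key != "children"}
    out["children_count"] = len(children)
    if depth < max_depth:
        # Still above the cutoff: keep descending.
        out["children"] = [skeleton(child, max_depth, depth + 1) for child in children]
    return out


# Toy accessibility tree: window > group > list > row > text
tree = {
    "role": "window", "name": "Demo", "children": [
        {"role": "group", "name": "Sidebar", "children": [
            {"role": "list", "name": "Channels", "children": [
                {"role": "row", "name": "general", "children": [
                    {"role": "text", "name": "general"},
                ]},
            ]},
        ]},
    ],
}

shallow = skeleton(tree)
row = shallow["children"][0]["children"][0]["children"][0]
print(row)  # the row at depth 3 keeps its children_count but loses its children
```

A second call to skeleton on the original subtree under that row would play the role of the drill-down step: only the region of interest is expanded, so the rest of the tree never has to be serialized.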

Typical Workflow

For dense apps like Slack or VS Code, use progressive skeleton traversal to minimize token usage:

# 1. Shallow overview: depth-3 map, truncated containers show children_count
agent-desktop snapshot --skeleton --app Slack -i --compact

# 2. Drill into a region of interest (named containers get refs)
agent-desktop snapshot --root @e3 -i --compact

# 3. Act on an element found in the drill-down
agent-desktop click @e12

# 4. Re-drill the same region to verify state change
agent-desktop snapshot --root @e3 -i --compact

For simpler apps, a full snapshot works fine: agent-desktop snapshot --app Finder -i.
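An agent would typically drive the workflow above from code rather than by hand. The sketch below wraps the same four commands with Python's subprocess module. It uses only the subcommands and flags shown in this article; the helper names (build_argv, run, run_workflow) are hypothetical, and it assumes each command prints one JSON document to stdout, which the article implies but does not spell out. The refs @e3 and @e12 are the article's example values; in real use they would be read from the preceding snapshot.

```python
import json
import subprocess

# The four workflow steps, using only flags that appear in the article.
WORKFLOW = [
    ("snapshot", "--skeleton", "--app", "Slack", "-i", "--compact"),  # 1. overview
    ("snapshot", "--root", "@e3", "-i", "--compact"),                 # 2. drill down
    ("click", "@e12"),                                                # 3. act
    ("snapshot", "--root", "@e3", "-i", "--compact"),                 # 4. verify
]


def build_argv(*args: str) -> list[str]:
    """Prefix the arguments with the CLI name; no shell is involved."""
    return ["agent-desktop", *args]


def run(*args: str) -> object:
    """Run one command and parse its stdout as JSON (assumed output shape)."""
    proc = subprocess.run(build_argv(*args), capture_output=True, text=True)
    return json.loads(proc.stdout)


def run_workflow() -> list[object]:
    """Execute the steps in order and collect each parsed result."""
    return [run(*step) for step in WORKFLOW]
```

Calling run_workflow() requires agent-desktop on the PATH and a running Slack. Passing arguments as a list avoids shell quoting problems with refs such as @e3.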

Installation

npm install -g agent-desktop

# Or use npx:
npx agent-desktop snapshot --app Finder -i

# From source:
cargo build --release

Performance Stats

In practice, the progressive skeleton approach reduced token usage by 78% to 96% compared to full-tree dumps in Electron apps like Slack, VS Code, and Notion. For example, Slack's full accessibility tree can exceed 50,000 tokens — impractical for most LLM contexts.
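To put those percentages in absolute terms, the short calculation below applies the reported 78% to 96% range to a 50,000-token tree. This is an illustration only: the article does not say the range was measured on a tree of exactly that size, and tokens_after is a hypothetical helper.

```python
def tokens_after(full_tokens: int, reduction_pct: int) -> int:
    """Tokens remaining after a percentage reduction (integer arithmetic)."""
    return full_tokens * (100 - reduction_pct) // 100


full = 50_000  # the Slack figure quoted above, used here as a round example
best = tokens_after(full, 96)
worst = tokens_after(full, 78)
print(best, worst)  # 2000 11000
```

Under these assumptions a skeleton pass would cost roughly 2,000 to 11,000 tokens instead of 50,000.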

Who It's For

Developers building desktop agents, internal automation tools, or research prototypes who want to avoid the cost and fragility of screenshot-based control loops.

📖 Read the full source: HN AI Agents