Understudy Desktop Agent: Learn Tasks by Demonstration

What Understudy Does

Understudy is a teachable desktop agent that operates your computer the way a human colleague would, handling GUI, browser, shell, file system, and messaging tools in one local runtime. Its core innovation is teach-by-demonstration: you perform a task once, and the agent records screen video plus semantic events, extracts the intent (not just click coordinates), and turns it into a reusable skill.
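The difference between recording intent and recording coordinates can be illustrated with a deliberately simplified, rule-based TypeScript sketch. The source does not describe how Understudy extracts intent, so every type and function here (`RecordedEvent`, `IntentStep`, `extractIntent`) is hypothetical, and the assumption that each click arrives with a semantic `targetLabel` stands in for the "semantic events" mentioned above. The point is only that the resulting steps name what was done, not where on screen it happened.

```typescript
// Hypothetical event shape: raw input plus an assumed semantic label.
interface RecordedEvent {
  kind: "click" | "type" | "key";
  app: string;
  x?: number;           // screen coordinates, captured but not kept in the skill
  y?: number;
  targetLabel?: string; // assumed semantic label of the clicked element
  text?: string;
}

// Hypothetical intent step: describes the action by meaning, not position.
interface IntentStep {
  action: "activate" | "enterText";
  app: string;
  target?: string;
  value?: string;
}

function extractIntent(events: RecordedEvent[]): IntentStep[] {
  const steps: IntentStep[] = [];
  for (const e of events) {
    if (e.kind === "click") {
      // Keep the element's label and drop the x/y coordinates.
      steps.push({ action: "activate", app: e.app, target: e.targetLabel ?? "unknown element" });
    } else if (e.kind === "type") {
      const prev = steps[steps.length - 1];
      // Merge consecutive typing in the same app into one "enter text" step.
      if (prev && prev.action === "enterText" && prev.app === e.app) {
        prev.value = (prev.value ?? "") + (e.text ?? "");
      } else {
        steps.push({ action: "enterText", app: e.app, value: e.text ?? "" });
      }
    }
    // Other event kinds ("key") are ignored in this sketch.
  }
  return steps;
}

// Illustrative recording loosely modeled on the demo task.
const steps = extractIntent([
  { kind: "type", app: "Google Chrome", text: "Elon " },
  { kind: "type", app: "Google Chrome", text: "Musk" },
  { kind: "click", app: "Pixelmator Pro", x: 412, y: 88, targetLabel: "Remove Background" },
]);
```

Because the stored steps carry no coordinates, a replay has to re-locate "Remove Background" on the current screen, which is what makes it more robust than a coordinate macro.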

Current Implementation Status

The system is designed as five layers, each at a different stage of implementation:

Layer 1 (Operate Software Natively): Implemented today on macOS. Operates any macOS desktop app using 13 tools + screenshot grounding + native input.
Layer 2 (Learn from Demonstrations): Implemented and usable today. The user shows a task once; the agent extracts the intent, validates it, and learns from it.
Layer 3 (Crystallized Memory): Partially implemented. The agent accumulates experience from daily use and hardens successful paths.
Layer 4 (Route Optimization): Partially implemented. The agent automatically discovers and upgrades to faster execution routes.
Layer 5 (Proactive Autonomy): Still the long-term direction. The agent would notice and act in its own workspace without disrupting the user.
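"Hardening successful paths" (Layer 3) is not specified in detail. One plausible reading is sketched below in TypeScript, assuming a simple rule: a path becomes hardened after a configurable number of consecutive successes for the same intent, and is demoted on any failure. `ExperienceMemory`, `record`, `preferredPath`, and the threshold rule are all hypothetical, not Understudy's mechanism.

```typescript
// Hypothetical per-path statistics.
interface PathStats {
  streak: number;    // consecutive successes
  hardened: boolean; // promoted to the preferred path for this intent
}

class ExperienceMemory {
  private byIntent = new Map<string, Map<string, PathStats>>();

  // Assumed rule: this many consecutive successes harden a path.
  constructor(private readonly threshold: number = 3) {}

  record(intent: string, path: string, ok: boolean): void {
    let paths = this.byIntent.get(intent);
    if (!paths) {
      paths = new Map<string, PathStats>();
      this.byIntent.set(intent, paths);
    }
    const s = paths.get(path) ?? { streak: 0, hardened: false };
    if (ok) {
      s.streak += 1;
      if (s.streak >= this.threshold) s.hardened = true;
    } else {
      // Any failure resets the streak and demotes the path.
      s.streak = 0;
      s.hardened = false;
    }
    paths.set(path, s);
  }

  // Returns a hardened path for the intent, if one exists.
  preferredPath(intent: string): string | undefined {
    const paths = this.byIntent.get(intent);
    if (!paths) return undefined;
    const hit = Array.from(paths.entries()).find(([, s]) => s.hardened);
    return hit ? hit[0] : undefined;
  }
}

// Illustrative usage with hypothetical path names.
const memory = new ExperienceMemory(3);
for (let i = 0; i < 3; i++) memory.record("cut out a person photo", "gui:pixelmator", true);
memory.record("cut out a person photo", "shell:script", true);
```

Demoting on failure keeps a hardened path from masking a change in the app it depends on; a real system would likely weigh more signals than a single streak.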

Technical Capabilities

Understudy is a unified desktop runtime that mixes every execution route in one agent loop, one session, one policy pipeline:

GUI: 13 tools + screenshot grounding + native input for any macOS desktop app
Browser: Playwright managed + Chrome extension relay for any website with login sessions
Shell: bash tool with full local access for CLI tools, scripts, and the file system
Web: web_search + web_fetch for real-time information retrieval
Memory: Semantic memory across sessions for persistent context and preferences
Messaging: support for 8 channels
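What "one agent loop, one policy pipeline" means in practice can be shown with a small TypeScript sketch: tools from every route are registered in a single runtime, and every call passes through the same policy checks before its handler runs, whether it is a shell command or a web fetch. `AgentRuntime`, `Policy`, and the example policy are hypothetical; only the tool names `bash` and `web_fetch` come from the list above, and the real runtime's interfaces are not described in the source.

```typescript
// Hypothetical route names mirroring the capability list.
type Route = "gui" | "browser" | "shell" | "web" | "memory" | "messaging";

interface ToolCall {
  route: Route;
  tool: string;
  args: Record<string, unknown>;
}

type Verdict = { allow: boolean; reason?: string };
type Policy = (call: ToolCall) => Verdict;
type Handler = (args: Record<string, unknown>) => string;

class AgentRuntime {
  private handlers = new Map<string, Handler>();

  constructor(private readonly policies: Policy[]) {}

  register(route: Route, tool: string, handler: Handler): void {
    this.handlers.set(`${route}:${tool}`, handler);
  }

  // Every call, whatever its route, goes through the same policy pipeline.
  execute(call: ToolCall): string {
    for (const policy of this.policies) {
      const verdict = policy(call);
      if (!verdict.allow) return `denied: ${verdict.reason ?? "policy"}`;
    }
    const handler = this.handlers.get(`${call.route}:${call.tool}`);
    if (!handler) return `unknown tool: ${call.route}:${call.tool}`;
    return handler(call.args);
  }
}

// Example policy (assumed): block shell commands containing "rm -rf".
const noDestructiveShell: Policy = (call) =>
  call.route === "shell" && String(call.args["command"] ?? "").includes("rm -rf")
    ? { allow: false, reason: "destructive shell command" }
    : { allow: true };

// Stub handlers stand in for real tool implementations.
const runtime = new AgentRuntime([noDestructiveShell]);
runtime.register("shell", "bash", (args) => `ran: ${String(args["command"])}`);
runtime.register("web", "web_fetch", (args) => `fetched: ${String(args["url"])}`);
```

Keeping policy checks outside the individual tools means a new route inherits the same guardrails automatically instead of re-implementing them.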

How It Works in Practice

In the demo video, the creator teaches Understudy a multi-step task: Google Image search → download a photo → remove the background in Pixelmator Pro → export → send via Telegram. The creator then asks it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps and route options, and keeps GUI hints only as a fallback. This lets it prefer faster routes when available instead of repeating every GUI step.
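The replay behavior described above (try faster routes first, fall back to GUI hints) can be sketched as a per-step selection rule. This TypeScript sketch assumes each route option carries an availability check and an estimated cost used for ranking; `SkillStep`, `RouteOption`, `executeStep`, and the CLI sender in the example are hypothetical, and the actual SKILL.md format may look quite different.

```typescript
// Hypothetical non-GUI route kinds.
type RouteKind = "shell" | "browser" | "web";

interface RouteOption {
  kind: RouteKind;
  estimatedSeconds: number;            // assumed cost signal used for ranking
  available: () => boolean;            // e.g. is the CLI installed, is the user logged in
  run: (input: string) => string;
}

interface SkillStep {
  intent: string;        // what the step is for
  routes: RouteOption[]; // faster ways to satisfy the intent
  guiHint: string;       // recorded GUI description, used only as a fallback
}

type GuiRunner = (hint: string, input: string) => string;

function executeStep(step: SkillStep, input: string, runGui: GuiRunner): string {
  // Prefer the fastest route that is currently available.
  const ranked = step.routes
    .filter((r) => r.available())
    .sort((a, b) => a.estimatedSeconds - b.estimatedSeconds);
  const fastest = ranked[0];
  if (fastest !== undefined) return fastest.run(input);
  // Nothing faster is available: replay the GUI hint instead.
  return runGui(step.guiHint, input);
}

// Illustrative step: the browser route is unavailable, a hypothetical CLI sender is.
const sendStep: SkillStep = {
  intent: "send the finished image to a Telegram contact",
  routes: [
    { kind: "browser", estimatedSeconds: 8, available: () => false, run: (f) => `browser sent ${f}` },
    { kind: "shell", estimatedSeconds: 2, available: () => true, run: (f) => `cli sent ${f}` },
  ],
  guiHint: "open Telegram, choose the chat, attach the file, press Send",
};

const replayGui: GuiRunner = (hint, input) => `gui(${input}): ${hint}`;
const fastResult = executeStep(sendStep, "cutout.png", replayGui);
const fallbackResult = executeStep({ ...sendStep, routes: [] }, "cutout.png", replayGui);
```

Here `fastResult` comes from the shell route, while `fallbackResult` only replays the GUI hint because no other route exists for that step.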

Installation and Setup

Current platform: macOS only. Installation is via npm:

npm install -g @understudy-ai/understudy
understudy wizard

The published skill artifact from the showcase demo is available at examples/published-skills/taught-person-photo-cutout-bc88ec/SKILL.md for inspection.

Who It's For

Developers who work across multiple desktop applications and want to automate repetitive tasks without building custom integrations or workflow builders.

📖 Read the full source: HN AI Agents