Understudy: A Teachable Desktop Agent That Learns Tasks by Demonstration

What Understudy Does
Understudy is a teachable desktop agent that operates your computer the way a human colleague would, handling GUI, browser, shell, file system, and messaging tools in one local runtime. Its core idea is teach-by-demonstration: you perform a task once, and the agent records screen video plus semantic events, extracts the intent (not just coordinates), and turns it into a reusable skill.
Current Implementation Status
The system is designed as five layers, each at a different stage of implementation:
- Layer 1 (Operate Software Natively): Implemented today on macOS. The agent operates any macOS desktop app using 13 tools + screenshot grounding + native input.
- Layer 2 (Learn from Demonstrations): Implemented and usable today. The user shows a task once; the agent extracts the intent, validates it, and learns it.
- Layer 3 (Crystallized Memory): Partially implemented. The agent accumulates experience from daily use and hardens successful paths.
- Layer 4 (Route Optimization): Partially implemented. The agent automatically discovers faster execution routes and upgrades to them.
- Layer 5 (Proactive Autonomy): Still the long-term direction. The agent would notice and act in its own workspace without disrupting the user.
Technical Capabilities
Understudy is a unified desktop runtime that mixes every execution route in one agent loop, one session, one policy pipeline:
- GUI: 13 tools + screenshot grounding + native input for any macOS desktop app
- Browser: managed Playwright + Chrome extension relay for any website, including ones that rely on login sessions
- Shell: bash tool with full local access for CLI tools, scripts, file system
- Web: web_search + web_fetch for real-time information retrieval
- Memory: Semantic memory across sessions for persistent context and preferences
- Messaging: support for 8 channels
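The idea of "one agent loop, one session, one policy pipeline" can be illustrated with a small dispatcher. This is only a sketch under stated assumptions: the route names, the `Action` and `Session` types, and the `policy_allows` check are hypothetical and are not taken from Understudy's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical route names, loosely mirroring the capability list above.
ROUTES = ("gui", "browser", "shell", "web", "memory", "messaging")


@dataclass
class Action:
    route: str    # which execution route should handle this action
    command: str  # free-form description of what to do


@dataclass
class Session:
    """A single session shared by every route, so all results land in one log."""
    log: List[str] = field(default_factory=list)


def policy_allows(action: Action) -> bool:
    # Placeholder policy (an assumption): the same check runs for every route.
    return action.route in ROUTES and "rm -rf" not in action.command


def run(actions: List[Action],
        handlers: Dict[str, Callable[[str], str]],
        session: Session) -> Session:
    """One loop: every action passes the same policy, then goes to its route."""
    for action in actions:
        if not policy_allows(action):
            session.log.append(f"blocked:{action.route}")
            continue
        result = handlers[action.route](action.command)
        session.log.append(f"{action.route}:{result}")
    return session
```

The point of the sketch is that a shell command and a GUI click travel through the same loop and the same policy check, rather than each route having its own runner and its own rules.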
How It Works in Practice
In the demo video, the creator teaches Understudy a workflow: run a Google Image search, download a photo, remove the background in Pixelmator Pro, export the result, and send it via Telegram. The creator then asks it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps and route options, with GUI hints kept only as a fallback, so the agent can prefer faster routes when they are available instead of repeating every GUI step.
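To make the "intent steps, route options, GUI hints as fallback" idea concrete, here is a minimal sketch of how such a skill could be represented and replayed. It is not the actual SKILL.md format: the `IntentStep` and `RouteOption` types, the `cost` ranking, and the example hint text are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class RouteOption:
    kind: str  # e.g. "shell", "web", "browser", "messaging" (hypothetical names)
    cost: int  # lower means faster; an illustrative ranking, not a real metric


@dataclass
class IntentStep:
    intent: str                     # what the step achieves, not where to click
    routes: List[RouteOption]       # non-GUI ways to achieve it, if any
    gui_hint: Optional[str] = None  # consulted only when no other route works


def choose_route(step: IntentStep, available: Set[str]) -> str:
    """Pick the cheapest available route; fall back to the GUI hint if none fits."""
    candidates = [r for r in step.routes if r.kind in available]
    if candidates:
        return min(candidates, key=lambda r: r.cost).kind
    if step.gui_hint is not None and "gui" in available:
        return "gui"
    raise LookupError(f"no usable route for step: {step.intent}")


def plan(steps: List[IntentStep], available: Set[str]) -> List[Tuple[str, str]]:
    """Return (intent, chosen route) pairs for a whole skill."""
    return [(s.intent, choose_route(s, available)) for s in steps]


# A hypothetical encoding of the demo workflow described above.
demo_skill = [
    IntentStep("find a photo of the person",
               [RouteOption("browser", 2), RouteOption("web", 1)]),
    IntentStep("remove the background", [],
               gui_hint="Pixelmator Pro: Remove Background"),
    IntentStep("send the result", [RouteOption("messaging", 1)]),
]
```

Calling `plan(demo_skill, {"browser", "web", "gui", "messaging"})` would choose the web route for the search, the GUI for background removal (the only step with no other option), and the messaging route for delivery. Because the steps name intents rather than coordinates, the same plan applies when the person being searched for changes.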
Installation and Setup
Current platform: macOS only. Installation is via npm:
npm install -g @understudy-ai/understudy
understudy wizard
The published skill artifact from the showcase demo is available at examples/published-skills/taught-person-photo-cutout-bc88ec/SKILL.md for inspection.
Who It's For
Developers who work across multiple desktop applications and want to automate repetitive tasks without building custom integrations or workflow builders.
📖 Read the full source: HN AI Agents