ScreenMind: Local-First AI Memory That Indexes Your Entire Computer Activity

✍️ OpenClawRadar | 📅 Published: June 8, 2026 | 🔗 Source

ScreenMind is a local-first AI memory system that continuously captures your screen, transcribes meetings, and indexes voice notes, building a persistent, searchable timeline of everything you do on your computer. It uses perceptual hashing so that capture triggers only when the on-screen content changes, then passes each stored frame to Gemma 4 E2B, running via llama.cpp, for vision analysis. The same model also handles chat and audio processing.
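
To illustrate how perceptual hashing can gate capture, here is a minimal sketch in pure Python. It is not ScreenMind's code: the source does not say which hash or threshold the project uses, so the average-hash variant, the `ChangeDetector` class, and the threshold of 5 bits are all assumptions chosen for illustration. Frames are represented as rows of 0-255 grayscale values and must be at least 8x8 pixels.

```python
from typing import List, Optional

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Average hash: shrink to size x size cells, one bit per cell above the mean.

    Assumes the image is at least size x size pixels so no cell is empty.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for cy in range(size):
        y0, y1 = cy * height // size, (cy + 1) * height // size
        for cx in range(size):
            x0, x1 = cx * width // size, (cx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class ChangeDetector:
    """Hypothetical helper: keep a frame only if it differs enough from the last kept one."""

    def __init__(self, threshold: int = 5):  # threshold is an illustrative guess
        self.threshold = threshold
        self.last_hash: Optional[int] = None

    def should_store(self, pixels: Gray) -> bool:
        current = average_hash(pixels)
        if self.last_hash is None or hamming(current, self.last_hash) > self.threshold:
            self.last_hash = current
            return True
        return False
```

Because the hash summarizes coarse brightness structure rather than exact pixels, small changes such as a blinking cursor tend to leave it unchanged, while a window switch flips many bits. Only frames that pass `should_store` would need to be sent to the vision model.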

Key Features

  • Screen capture with perceptual hashing — only stores frames when content actually changes
  • Searchable timeline — query past activity: "that error message from earlier," "what was I working on at 3pm?"
  • Chat with your history — persistent AI context from your entire session
  • Meeting transcription — auto-detects Zoom, Teams, and Google Meet
  • Voice memos — processed via Gemma 4's audio encoder
  • Natural language automations — write them in plain English Markdown
  • MCP integration — connect to Claude and Cursor

Technical Stack

  • Models: Gemma 4 E2B (handles vision, chat, audio)
  • Backend: Python + FastAPI
  • Storage: SQLite
  • Inference: llama.cpp with Q4 quantization
  • Hardware: 4GB+ VRAM
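
As a rough picture of how a SQLite-backed timeline could support queries like "that error message from earlier," the sketch below uses Python's standard `sqlite3` module. The `events` table, its columns, and the `open_timeline`, `add_event`, and `search` helpers are hypothetical; the source does not describe ScreenMind's actual schema or retrieval method, and a simple `LIKE` match stands in for whatever search the project really performs.

```python
import sqlite3
from typing import List, Tuple


def open_timeline(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a timeline database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id      INTEGER PRIMARY KEY,
               ts      TEXT NOT NULL,  -- ISO 8601 timestamp, sorts lexicographically
               source  TEXT NOT NULL,  -- e.g. 'screen', 'meeting', 'voice'
               summary TEXT NOT NULL   -- model-generated description of the event
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_events_ts ON events (ts)")
    return conn


def add_event(conn: sqlite3.Connection, ts: str, source: str, summary: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (ts, source, summary) VALUES (?, ?, ?)",
            (ts, source, summary),
        )


def search(
    conn: sqlite3.Connection, text: str, start: str = "", end: str = "9999"
) -> List[Tuple[str, str, str]]:
    """Substring search over summaries, optionally limited to [start, end)."""
    rows = conn.execute(
        "SELECT ts, source, summary FROM events "
        "WHERE summary LIKE ? AND ts >= ? AND ts < ? ORDER BY ts",
        (f"%{text}%", start, end),
    )
    return rows.fetchall()
```

A question such as "what was I working on at 3pm?" would map to a time-window call like `search(conn, "", "2026-06-08T15:00:00", "2026-06-08T16:00:00")`. Note that `%` and `_` inside `text` act as wildcards in this simplified version.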

The author notes that GPU scheduling between vision, chat, and audio tasks is the main inference optimization challenge. The project is still workflow-driven rather than fully autonomous — retrieval quality and onboarding friction are areas needing improvement.
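
The source does not explain how the author schedules GPU work, so the following is only one possible approach: a priority queue that serializes jobs for a single shared model and lets interactive chat requests jump ahead of background work. The `GpuScheduler` class and the priority order (chat, then audio, then vision) are assumptions for illustration.

```python
import heapq
import itertools
from typing import Any, Callable, List, Optional, Tuple

# Lower number runs first. This ordering is an assumption, not the project's policy.
PRIORITY = {"chat": 0, "audio": 1, "vision": 2}


class GpuScheduler:
    """Hypothetical scheduler: one job at a time, highest priority first, FIFO within a kind."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # tie-breaker that preserves submission order

    def submit(self, kind: str, job: Callable[[], Any]) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), kind, job))

    def run_next(self) -> Optional[Tuple[str, Any]]:
        """Run the most urgent pending job, or return None if the queue is empty."""
        if not self._heap:
            return None
        _, _, kind, job = heapq.heappop(self._heap)
        return kind, job()

    def drain(self) -> List[Tuple[str, Any]]:
        """Run every pending job in priority order and collect the results."""
        results = []
        while self._heap:
            results.append(self.run_next())
        return results
```

In this sketch each job is a zero-argument callable that would wrap a single inference call. A chat request submitted while several frames are queued is served before them, which keeps the interface responsive at the cost of delaying frame analysis.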

GitHub: ayushh0110/ScreenMind

📖 Read the full source: r/LocalLLaMA
