Lemonade by AMD: Open Source Local LLM Server for GPU and NPU

✍️ OpenClawRadar📅 Published: April 5, 2026🔗 Source

What Lemonade Is

Lemonade is a local AI server built by AMD and the local AI community that runs text, image, and speech models on GPUs and NPUs. It's open source, designed to be private, and claims to be ready in minutes on any PC.

Key Features and Specifications

Native C++ Backend: Lightweight service that is only 2MB
One Minute Install: Simple installer that sets up the stack automatically
OpenAI API Compatible: Works with hundreds of apps out-of-box and integrates in minutes
Auto-configures for your hardware: Configures dependencies for your GPU and NPU
Multi-engine compatibility: Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more
Multiple Models at Once: Run more than one model at the same time
Cross-platform: A consistent experience across Windows, Linux, and macOS (beta)
Built-in app: A GUI that lets you download, try, and switch models quickly
Unified API: One local service for every modality including chat, vision, image generation, transcription, and speech generation

Model Support and Performance

The server can load models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use. For tuning, you can use --no-mmap to speed up load times and increase context size to 64 or more. The source mentions that with 128 GB of unified RAM, you can load larger models.

Ecosystem Integration

Lemonade is integrated in many apps and works out-of-box with hundreds more thanks to the OpenAI API standard. Mentioned integrations include Open WebUI, n8n, Gaia Infinity, Arcade, GitHub Copilot, OpenHands, Dify, Deep Tutor, and Iterate.ai.

Community and Development

The project has 2.1k stars on GitHub and an active Discord community with 117 online at the time of the source. It's described as being built by the local AI community for every PC, with the philosophy that local AI should be free, open, fast, and private.

📖 Read the full source: HN LLM Tools

👀 See Also

Tools

CC-Ledger: Track Claude Code Costs Per Session and Per PR with Local SQLite

CC-Ledger is a Rust binary that hooks into Claude Code, logging each turn to local SQLite. Catch runaway sessions live and see per-PR cost without an API key. Includes macOS menu bar, web dashboard, and CLI views.

May 22, 2026, 12:17 AM UTC

OpenClawRadar

Tools

Mengram adds persistent memory to OpenClaw agents

Mengram is an open-source memory system that gives OpenClaw agents long-term memory across sessions, solving the problem of agents forgetting everything when they restart. It provides episodic, entity, and procedural memory with smart archival of outdated facts.

Mar 17, 2026, 06:45 AM UTC

OpenClawRadar

Tools

PocketBot: iOS app uses Claude to generate deterministic JavaScript automations from natural language

PocketBot is an iOS mobile automation app that uses Claude via AWS Bedrock to convert plain-language requests into self-contained JavaScript scripts. The LLM writes the code once, then the deterministic scripts run on schedule in a sandboxed runtime without AI involvement.

Apr 15, 2026, 06:12 PM UTC

OpenClawRadar

Tools

Open-Sourced Claude Code Skills: A /do Pipeline That Cut Follow-Ups by 80%

A developer open-sourced 15 Claude Code skills built over 100+ freelance projects. The /do command runs a 5-step pipeline (/todo → /dev → /verify-dev → /build → /test → push) with auto-fix loops, resulting in 80% fewer follow-ups and 60-65% better code quality across 2000+ commits.

May 17, 2026, 04:15 PM UTC

OpenClawRadar