Running NemoClaw with Local vLLM: Setup Notes and Agent Engineering Observations

✍️ OpenClawRadar📅 Published: March 20, 2026🔗 Source

Local NemoClaw Setup with vLLM

A developer shared their experience running NVIDIA's NemoClaw, a sandboxed AI agent platform, with a local Nemotron 9B v2 model using vLLM on WSL2. The setup is based on jieunl24's fork of NemoClaw.

Key Technical Details

Inference Routing: NemoClaw's inference routing follows a clean path: inference.local → gateway → vLLM. However, initial onboarding bugs required a 3-layer network hack that has since been fixed via PR #412.

Parser Compatibility: The built-in vLLM parsers (qwen3_coder, nemotron_v3) are incompatible with Nemotron v2 models. You need NVIDIA's official plugin parsers from the NeMo repository instead.

Agent Engineering Gap: OpenClaw as an agent platform provides solid infrastructure but ships with minimal prompt engineering. The gap between "model serves text" and "agent does useful work" is primarily about scaffolding rather than model capability limitations.

Resources

Blog post covering architecture, vLLM parser setup, and agent engineering observations: https://github.com/soy-tuber/nemoclaw-local-inference-guide/blob/master/BLOG-openclaw-agent-engineering.md
Setup guide (V2) with inference.local routing and no network hacks: https://github.com/soy-tuber/nemoclaw-local-inference-guide
Original NemoClaw issue #315: https://github.com/NVIDIA/NemoClaw/issues/315

This setup demonstrates practical local deployment of AI agent platforms, highlighting both the technical implementation details and the ongoing challenges in agent engineering.

📖 Read the full source: r/LocalLLaMA

👀 See Also

Tools

Claude adds interactive chart and diagram creation feature

Claude can now generate interactive visuals including charts, diagrams, and explorable breakdowns directly within conversations. The feature is available in beta across all plans including free tier.

Mar 12, 2026, 07:45 PM UTC

OpenClawRadar

Tools

CRMy: Open Source CRM and Customer Context Engine for OpenClaw

CRMy is an open source CRM and Customer Context Engine built specifically for OpenClaw agents. It includes a complete CLI, OpenClaw plugin with 12 CRM tools, PostgreSQL backend, and self-hosted deployment with two commands.

Mar 20, 2026, 04:45 PM UTC

OpenClawRadar

Tools

Claude Code at Scale: How Agentic Search Avoids RAG Failure Modes in Large Codebases

Claude Code uses agentic file-system traversal instead of embedding-based RAG, eliminating stale index issues. The article details five extension points (CLAUDE.md, hooks, skills, plugins, MCP) and the harness-as-model philosophy for multi-million-line repos.

May 15, 2026, 08:16 AM UTC

OpenClawRadar

Tools

Claude Pulse Browser Extension Surfaces Token Counts, Cache Timers, and Rate Limits on Claude.ai

Claude Pulse is a client-side Chrome extension that adds a real-time dashboard to Claude.ai showing per-message token counts, total context usage, prompt cache expiry timer, and rate limit progress bar. Also includes chat export to Markdown.

May 1, 2026, 04:15 PM UTC

OpenClawRadar