Running NemoClaw with Local vLLM: Setup Notes and Agent Engineering Observations

✍️ OpenClawRadar📅 Published: March 20, 2026🔗 Source
Running NemoClaw with Local vLLM: Setup Notes and Agent Engineering Observations
Ad

Local NemoClaw Setup with vLLM

A developer shared their experience running NVIDIA's NemoClaw, a sandboxed AI agent platform, with a local Nemotron 9B v2 model using vLLM on WSL2. The setup is based on jieunl24's fork of NemoClaw.

Key Technical Details

Inference Routing: NemoClaw's inference routing follows a clean path: inference.local → gateway → vLLM. However, initial onboarding bugs required a 3-layer network hack that has since been fixed via PR #412.

Parser Compatibility: The built-in vLLM parsers (qwen3_coder, nemotron_v3) are incompatible with Nemotron v2 models. You need NVIDIA's official plugin parsers from the NeMo repository instead.

Agent Engineering Gap: OpenClaw as an agent platform provides solid infrastructure but ships with minimal prompt engineering. The gap between "model serves text" and "agent does useful work" is primarily about scaffolding rather than model capability limitations.

Ad

Resources

This setup demonstrates practical local deployment of AI agent platforms, highlighting both the technical implementation details and the ongoing challenges in agent engineering.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also