How to Integrate Local LLM Agents with ComfyUI for Natural Language Batch Image Generation

A developer on r/LocalLLaMA shared their integration between a local OpenClaw agent and ComfyUI that enables natural language batch image generation. The setup allows users to describe image requests in plain English, with the agent handling the entire ComfyUI pipeline without manual UI interaction.

How the Integration Works

The flow follows this sequence:

1. The agent receives an image request.
2. It parses the intent into structured inputs (prompt, dimensions, steps, seed).
3. It calls the comfyui skill as a tool.
4. The skill builds a ComfyUI workflow JSON from those inputs.
5. The skill POSTs the workflow to the local ComfyUI HTTP API (/prompt).
6. It polls /history every 2 seconds until the render completes.
7. It retrieves the output image through /view.
8. The skill returns the result to the agent.
9. The agent confirms with the user.
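The submit, poll, and retrieve steps above can be sketched roughly as follows. This is a minimal illustration, not the author's skill: the function name renderWorkflow, the maxPolls cap, and the injectable fetchImpl and sleep parameters are hypothetical, and the sketch assumes ComfyUI's default address and the per-job /history/{prompt_id} form of the history endpoint.

```javascript
// Minimal sketch of the submit -> poll -> collect loop.
// fetchImpl and sleep are injectable (hypothetical parameters) so the
// logic can be exercised without a running ComfyUI server.
const COMFY_URL = "http://127.0.0.1:8188"; // assumed default address

async function renderWorkflow(workflow, {
  baseUrl = COMFY_URL,
  fetchImpl = globalThis.fetch, // built into Node 18+
  pollMs = 2000,                // the 2-second interval from the post
  maxPolls = 150,               // hypothetical cap, about 5 minutes
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  // Queue the job; ComfyUI replies with a prompt_id for it.
  const res = await fetchImpl(`${baseUrl}/prompt`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: workflow }),
  });
  if (!res.ok) throw new Error(`POST /prompt failed with HTTP ${res.status}`);
  const { prompt_id: promptId } = await res.json();

  // The history entry for this prompt_id appears once the job has finished.
  for (let i = 0; i < maxPolls; i++) {
    const histRes = await fetchImpl(`${baseUrl}/history/${promptId}`);
    const entry = (await histRes.json())[promptId];
    if (entry) {
      // Build one /view URL per image the workflow produced.
      return Object.values(entry.outputs ?? {})
        .flatMap((out) => out.images ?? [])
        .map((img) => `${baseUrl}/view?` + new URLSearchParams({
          filename: img.filename,
          subfolder: img.subfolder ?? "",
          type: img.type ?? "output",
        }));
    }
    await sleep(pollMs);
  }
  throw new Error(`Render ${promptId} did not finish after ${maxPolls} polls`);
}
```

Capping the number of polls is a design choice added for the sketch; without it, a job that never completes would keep the agent's tool call hanging indefinitely.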

Technical Implementation Details

The integration uses ComfyUI's node-ID-based JSON workflow format. The skill maps agent inputs onto specific node IDs in a base workflow template (KSampler, CLIPTextEncode, etc.). The author calls this "the most fragile part of the integration since it depends on your workflow's node structure, but for standard setups it works reliably."
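To make the node-ID mapping concrete, here is a small sketch of how agent inputs might be written into a base template. The template, its node IDs ("3" through "9"), the checkpoint filename, and the names BASE_WORKFLOW, INPUT_MAP, and buildWorkflow are all assumptions for illustration; a real template would come from exporting your own workflow in ComfyUI's API format, and its IDs may differ.

```javascript
// Hypothetical base template in ComfyUI's node-ID keyed API format.
// Node IDs and the checkpoint name are assumptions, not the author's values.
const BASE_WORKFLOW = {
  "3": { class_type: "KSampler", inputs: {
    seed: 0, steps: 20, cfg: 7, sampler_name: "euler", scheduler: "normal",
    denoise: 1, model: ["4", 0], positive: ["6", 0], negative: ["7", 0],
    latent_image: ["5", 0] } },
  "4": { class_type: "CheckpointLoaderSimple", inputs: { ckpt_name: "model.safetensors" } },
  "5": { class_type: "EmptyLatentImage", inputs: { width: 512, height: 512, batch_size: 1 } },
  "6": { class_type: "CLIPTextEncode", inputs: { text: "", clip: ["4", 1] } },
  "7": { class_type: "CLIPTextEncode", inputs: { text: "", clip: ["4", 1] } },
  "8": { class_type: "VAEDecode", inputs: { samples: ["3", 0], vae: ["4", 2] } },
  "9": { class_type: "SaveImage", inputs: { filename_prefix: "agent", images: ["8", 0] } },
};

// Which node ID and input field each agent-level input is written to.
const INPUT_MAP = {
  prompt: ["6", "text"],
  width: ["5", "width"],
  height: ["5", "height"],
  steps: ["3", "steps"],
  seed: ["3", "seed"],
};

function buildWorkflow(inputs, base = BASE_WORKFLOW, map = INPUT_MAP) {
  // Deep-copy the template so repeated jobs never share state.
  const workflow = JSON.parse(JSON.stringify(base));
  for (const [name, value] of Object.entries(inputs)) {
    const target = map[name];
    if (!target) throw new Error(`Unknown input "${name}"`);
    const [nodeId, field] = target;
    const node = workflow[nodeId];
    // Fail loudly if the template's structure no longer matches the map.
    if (!node || !(field in node.inputs)) {
      throw new Error(`Template has no node ${nodeId} with input "${field}"`);
    }
    node.inputs[field] = value;
  }
  return workflow;
}

const workflow = buildWorkflow({ prompt: "a red fox in snow", width: 768, steps: 30, seed: 42 });
```

The explicit check on each node ID and field is one way to surface the fragility the author mentions: if the template is replaced with a differently structured workflow, the mapping fails immediately with a readable message.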

The skill includes startup verification by pinging /object_info to ensure ComfyUI is actually ready (not just reachable) before accepting jobs. This prevents jobs from queuing without running when checkpoints are still loading.
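A readiness check along these lines might look like the sketch below. The function name waitUntilReady, the retry counts, and the rule that a non-empty /object_info response means "ready" are assumptions; the post does not say exactly what the skill inspects in the response.

```javascript
// Sketch of a startup check: retry /object_info until it answers with a
// populated node registry. Names, retry counts, and the "non-empty means
// ready" rule are assumptions for illustration.
async function waitUntilReady({
  baseUrl = "http://127.0.0.1:8188", // assumed default address
  fetchImpl = globalThis.fetch,
  attempts = 10,
  delayMs = 1000,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchImpl(`${baseUrl}/object_info`);
      if (res.ok) {
        const info = await res.json();
        // Reachable AND reporting node definitions: treat as ready.
        if (info && Object.keys(info).length > 0) return true;
      }
    } catch {
      // Connection refused or similar: the server is not up yet, keep trying.
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  return false; // caller decides how to report "not ready" to the agent
}
```

Returning a boolean rather than throwing lets the skill refuse new jobs with its own message instead of surfacing a low-level network error.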

Error Handling Improvements

Every API call is wrapped to return agent-readable errors instead of raw HTTP failures. For example, "Connection refused at 127.0.0.1:8188" becomes "ComfyUI doesn't seem to be running. Start it with --listen and try again." This makes debugging easier, especially when working remotely.
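One way such a wrapper could be structured is sketched below. Only the "connection refused" message comes from the post; the other messages, the function names describeError and callSafely, and the { ok, value | error } result shape are hypothetical. The sketch relies on Node's built-in fetch reporting network failures with an error code on err.cause.

```javascript
// Sketch: translate low-level failures into messages an agent can relay.
// Only the ECONNREFUSED wording is from the post; the rest is illustrative.
function describeError(err, baseUrl = "http://127.0.0.1:8188") {
  // Node's fetch wraps network failures; the system code sits on err.cause.
  const code = err?.code ?? err?.cause?.code;
  if (code === "ECONNREFUSED") {
    return "ComfyUI doesn't seem to be running. Start it with --listen and try again.";
  }
  if (code === "ETIMEDOUT" || err?.name === "AbortError") {
    return `ComfyUI at ${baseUrl} did not respond in time.`;
  }
  if (typeof err?.status === "number") {
    return `ComfyUI rejected the request (HTTP ${err.status}).`;
  }
  return `Unexpected error talking to ComfyUI: ${err?.message ?? String(err)}`;
}

// Wrap any async API call so the caller always gets a plain result object.
async function callSafely(fn) {
  try {
    return { ok: true, value: await fn() };
  } catch (err) {
    return { ok: false, error: describeError(err) };
  }
}
```

A skill could then wrap each request, for example `await callSafely(() => fetch("http://127.0.0.1:8188/object_info"))`, and hand the `error` string straight back to the agent.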

Current Limitations

The integration doesn't yet support:

Advanced multi-node workflows (ControlNet, LoRA stacking)
Real-time progress streaming via WebSocket
Cross-platform testing beyond Windows

The entire stack runs locally using OpenClaw (self-hosted agent framework) + ComfyUI + a Node.js skill script, with no cloud components.

📖 Read the full source: r/LocalLLaMA
