Integrating Local LLM Agents with ComfyUI for Natural Language Batch Image Generation

✍️ OpenClawRadar · 📅 Published: April 2, 2026 · 🔗 Source

A developer on r/LocalLLaMA shared their integration between a local OpenClaw agent and ComfyUI that enables natural language batch image generation. The setup allows users to describe image requests in plain English, with the agent handling the entire ComfyUI pipeline without manual UI interaction.

How the Integration Works

The flow follows this sequence:

  • Agent receives image request
  • Parses intent into structured inputs (prompt, dimensions, steps, seed)
  • Calls comfyui skill as a tool
  • Skill builds a ComfyUI workflow JSON from inputs
  • POSTs to local ComfyUI HTTP API (/prompt)
  • Polls /history every 2 seconds until render completes
  • Retrieves the finished image through /view, using the output filenames reported by /history
  • Returns result to agent
  • Agent confirms with user
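The submit, poll, and retrieve steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the author's Node.js skill: the function names, the 127.0.0.1:8188 address, and the timeout value are assumptions made for the example. It relies on ComfyUI's /prompt endpoint returning a prompt_id, on /history/<prompt_id> staying empty until the job finishes, and on /view serving a file by filename, subfolder, and type.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8188"  # assumed local ComfyUI address


def http_json(url, payload=None):
    """GET the URL, or POST when a payload is given, and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    req = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit(workflow, request=http_json):
    """Queue a workflow via POST /prompt and return its prompt_id."""
    return request(f"{BASE_URL}/prompt", {"prompt": workflow})["prompt_id"]


def wait_for_outputs(prompt_id, request=http_json, interval=2.0,
                     timeout=600.0, sleep=time.sleep):
    """Poll /history/<prompt_id> every `interval` seconds until the entry appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        history = request(f"{BASE_URL}/history/{prompt_id}")
        if prompt_id in history:  # entry is absent while the job is queued or running
            return history[prompt_id]["outputs"]
        sleep(interval)
    raise TimeoutError(f"render {prompt_id} did not finish within {timeout}s")


def image_urls(outputs):
    """Turn the image records in a history entry into /view download URLs."""
    urls = []
    for node_output in outputs.values():
        for image in node_output.get("images", []):
            query = urllib.parse.urlencode({
                "filename": image["filename"],
                "subfolder": image.get("subfolder", ""),
                "type": image.get("type", "output"),
            })
            urls.append(f"{BASE_URL}/view?{query}")
    return urls
```

The request and sleep parameters are injectable only so the flow can be exercised without a running ComfyUI instance; a real skill would call submit(workflow), then wait_for_outputs(prompt_id), then hand the resulting URLs or file paths back to the agent.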

Technical Implementation Details

The integration uses ComfyUI's node-ID-based JSON workflow format. The skill maps agent inputs onto specific node IDs in a base workflow template (KSampler, CLIPTextEncode, etc.). This is described as "the most fragile part of the integration since it depends on your workflow's node structure, but for standard setups it works reliably."
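To illustrate the mapping step, here is a small Python sketch (the original skill is a Node.js script, so this is an analogy, not its code). The template is a trimmed, hypothetical excerpt of an API-format workflow: the node IDs "3", "5", and "6", the NODE_MAP table, and the build_workflow helper are assumptions for the example, and a real template would also contain the checkpoint loader, negative prompt, VAE decode, and save nodes.

```python
import copy

# Hypothetical excerpt of a base workflow in ComfyUI's node-ID-keyed API format.
BASE_WORKFLOW = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 20, "cfg": 7.0,
                     "positive": ["6", 0], "latent_image": ["5", 0]}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "", "clip": ["4", 1]}},
}

# Agent-facing input name -> (node ID, input field). This table is the fragile
# part: it must be updated whenever the template's node IDs change.
NODE_MAP = {
    "prompt": ("6", "text"),
    "width": ("5", "width"),
    "height": ("5", "height"),
    "steps": ("3", "steps"),
    "seed": ("3", "seed"),
}


def build_workflow(inputs, template=BASE_WORKFLOW, node_map=NODE_MAP):
    """Return a copy of the template with the agent's inputs written into it."""
    workflow = copy.deepcopy(template)  # never mutate the shared template
    for name, value in inputs.items():
        if name not in node_map:
            raise KeyError(f"unsupported input: {name!r}")
        node_id, field = node_map[name]
        node = workflow.get(node_id)
        if node is None or field not in node["inputs"]:
            # Fail loudly when the template no longer matches the mapping.
            raise KeyError(f"template has no node {node_id!r} with input {field!r}")
        node["inputs"][field] = value
    return workflow
```

Checking that each mapped node and field actually exists is one way to surface a mismatched template as a clear error before anything is sent to ComfyUI.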

The skill includes startup verification by pinging /object_info to ensure ComfyUI is actually ready (not just reachable) before accepting jobs. This prevents jobs from queuing without running when checkpoints are still loading.
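A readiness probe along these lines might look like the following Python sketch. The post does not say what the skill inspects in the /object_info response, so the check here is an assumption: it treats ComfyUI as ready only if the endpoint returns a JSON object that lists the node classes the workflow needs. The names is_ready, fetch_object_info, and REQUIRED_NODES are hypothetical.

```python
import json
import urllib.request

# Assumption: the node classes the base workflow depends on.
REQUIRED_NODES = ("KSampler", "CLIPTextEncode")


def fetch_object_info(base_url="http://127.0.0.1:8188", timeout=5):
    """GET /object_info, which returns ComfyUI's catalog of node classes."""
    with urllib.request.urlopen(f"{base_url}/object_info", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_ready(fetch=fetch_object_info, required=REQUIRED_NODES):
    """True only if /object_info answers and lists every required node class."""
    try:
        info = fetch()
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (URLError subclasses it);
        # ValueError covers a reply that is not valid JSON.
        return False
    return isinstance(info, dict) and all(name in info for name in required)
```

A skill could call is_ready() once at startup and refuse to accept jobs until it returns True.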

Error Handling Improvements

Every API call is wrapped to return agent-readable errors instead of raw HTTP failures. For example, "Connection refused at 127.0.0.1:8188" becomes "ComfyUI doesn't seem to be running. Start it with --listen and try again." This makes debugging easier, especially when working remotely.
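The wrapping pattern can be sketched in Python as below (the original is Node.js, so exception types differ there). Only the "doesn't seem to be running" message comes from the post; the SkillError class, the explain and call helpers, and the other messages are illustrative assumptions.

```python
import urllib.error


class SkillError(Exception):
    """Error whose message is written for the agent, not for a stack trace."""


def explain(exc):
    """Translate a low-level network failure into an agent-readable sentence."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"ComfyUI rejected the request (HTTP {exc.code}). Check the workflow JSON."
    # urllib wraps socket-level failures in URLError and keeps the cause in .reason.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, ConnectionRefusedError):
        return "ComfyUI doesn't seem to be running. Start it with --listen and try again."
    if isinstance(reason, TimeoutError):
        return "ComfyUI did not respond in time. It may still be loading a checkpoint."
    return f"Unexpected error talking to ComfyUI: {exc}"


def call(fn, *args, **kwargs):
    """Run one API call and re-raise any network failure as a SkillError."""
    try:
        return fn(*args, **kwargs)
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        raise SkillError(explain(exc)) from exc
```

Wrapping each request as call(fetch, url) keeps the original exception attached as the cause while giving the agent a message it can relay to the user directly.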

Current Limitations

The integration doesn't yet support:

  • Advanced multi-node workflows (ControlNet, LoRA stacking)
  • Real-time progress streaming via WebSocket
  • Platforms other than Windows (not yet tested elsewhere)

The entire stack runs locally using OpenClaw (self-hosted agent framework) + ComfyUI + a Node.js skill script, with no cloud components.

📖 Read the full source: r/LocalLLaMA
