Integrating Local LLM Agents with ComfyUI for Natural Language Batch Image Generation

A developer on r/LocalLLaMA shared their integration between a local OpenClaw agent and ComfyUI that enables natural language batch image generation. The setup allows users to describe image requests in plain English, with the agent handling the entire ComfyUI pipeline without manual UI interaction.
How the Integration Works
The flow follows this sequence:
- Agent receives image request
- Parses intent into structured inputs (prompt, dimensions, steps, seed)
- Calls comfyui skill as a tool
- Skill builds a ComfyUI workflow JSON from inputs
- POSTs to local ComfyUI HTTP API (/prompt)
- Polls /history every 2 seconds until render completes
- Retrieves the rendered output via /view
- Returns result to agent
- Agent confirms with user
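
The submit-and-poll portion of this sequence can be sketched roughly as follows. This is a minimal illustration in Python, not the author's code (the original skill is a Node.js script). It assumes ComfyUI's default local address, uses the per-job /history/{prompt_id} form of the history endpoint, and treats the appearance of a history entry as the signal that the job has finished. The names http_json, run_job, and max_polls are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8188"  # assumption: default local ComfyUI address


def http_json(url, payload=None):
    """GET the URL (or POST when a payload is given) and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    request = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


def run_job(workflow, fetch=http_json, sleep=time.sleep, interval=2.0, max_polls=300):
    """Queue a workflow, poll the history endpoint, and return /view URLs."""
    prompt_id = fetch(f"{BASE_URL}/prompt", {"prompt": workflow})["prompt_id"]
    for _ in range(max_polls):
        history = fetch(f"{BASE_URL}/history/{prompt_id}")
        if prompt_id in history:  # assumption: entry appears once the job is done
            outputs = history[prompt_id].get("outputs", {})
            return [
                f"{BASE_URL}/view?" + urllib.parse.urlencode({
                    "filename": image["filename"],
                    "subfolder": image.get("subfolder", ""),
                    "type": image.get("type", "output"),
                })
                for node in outputs.values()
                for image in node.get("images", [])
            ]
        sleep(interval)  # the post mentions a 2-second polling interval
    raise TimeoutError(f"Render {prompt_id} did not finish after {max_polls} polls")
```

Passing fetch and sleep as parameters is a design choice made for this sketch so the loop can be exercised without a running ComfyUI instance. The cap on polls is also an addition here, so a stuck job cannot block the agent indefinitely.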
Technical Implementation Details
The integration uses ComfyUI's node-ID-based JSON workflow format. The skill maps agent inputs onto specific node IDs in a base workflow template (KSampler, CLIPTextEncode, etc.). The developer describes this as "the most fragile part of the integration since it depends on your workflow's node structure, but for standard setups it works reliably."
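
To make the node-ID mapping concrete, here is a small Python sketch of how inputs might be written into a template. It is an illustration under stated assumptions, not the skill's actual code. The node IDs in NODE_MAP are placeholders that must match whatever IDs your exported API-format workflow uses, and NODE_MAP, EXPECTED_CLASS, and build_workflow are hypothetical names. The class_type check is an addition meant to fail loudly when the template's structure has changed.

```python
import copy

# Hypothetical mapping: request field -> (node ID, input name) in the template.
# These IDs are illustrative; they depend entirely on your own workflow export.
NODE_MAP = {
    "prompt": ("6", "text"),
    "width": ("5", "width"),
    "height": ("5", "height"),
    "steps": ("3", "steps"),
    "seed": ("3", "seed"),
}

# Node classes the mapping expects to find at those IDs.
EXPECTED_CLASS = {"3": "KSampler", "5": "EmptyLatentImage", "6": "CLIPTextEncode"}


def build_workflow(template, inputs):
    """Return a copy of the template with the provided inputs written in."""
    workflow = copy.deepcopy(template)  # never mutate the shared base template
    for field, (node_id, input_name) in NODE_MAP.items():
        if field not in inputs:
            continue  # keep the template's default for anything not supplied
        node = workflow.get(node_id)
        expected = EXPECTED_CLASS[node_id]
        if node is None or node.get("class_type") != expected:
            raise ValueError(
                f"Template node {node_id} is not a {expected}; update NODE_MAP"
            )
        node["inputs"][input_name] = inputs[field]
    return workflow
```

For example, build_workflow(template, {"prompt": "a red fox", "steps": 30}) would change only the prompt text and the step count, leaving dimensions and seed at the template's values.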
The skill includes startup verification by pinging /object_info to ensure ComfyUI is actually ready (not just reachable) before accepting jobs. This prevents jobs from queuing without running when checkpoints are still loading.
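
A readiness probe along these lines could look like the Python sketch below. The post does not say what the skill inspects in the /object_info response, so the check used here (the reply is a JSON object that lists a few required node classes) is an assumption, as are the names fetch_object_info and is_ready and the default address.

```python
import json
import urllib.request


def fetch_object_info(base_url="http://127.0.0.1:8188", timeout=5):
    """Fetch /object_info, which describes the node classes ComfyUI has loaded."""
    with urllib.request.urlopen(f"{base_url}/object_info", timeout=timeout) as response:
        return json.loads(response.read())


def is_ready(fetch=fetch_object_info, required=("KSampler", "CLIPTextEncode")):
    """Return True only if ComfyUI answers and reports the node classes we need."""
    try:
        info = fetch()
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (URLError subclasses it);
        # ValueError covers a reply that is not valid JSON.
        return False
    return isinstance(info, dict) and all(name in info for name in required)
```

A skill could call is_ready() in a short retry loop at startup and refuse to accept jobs until it returns True.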
Error Handling Improvements
Every API call is wrapped to return agent-readable errors instead of raw HTTP failures. For example, "Connection refused at 127.0.0.1:8188" becomes "ComfyUI doesn't seem to be running. Start it with --listen and try again." This makes debugging easier, especially when working remotely.
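
The wrapping idea can be illustrated with a short Python sketch. Only the connection-refused message comes from the post. The other messages, the function names explain_error and safe_call, and the returned dictionary shape are hypothetical choices for this example.

```python
import socket
import urllib.error


def explain_error(exc, host="127.0.0.1", port=8188):
    """Translate a low-level failure into a message an agent can relay."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"ComfyUI rejected the request (HTTP {exc.code}). Check the workflow JSON."
    reason = getattr(exc, "reason", exc)  # URLError wraps the underlying socket error
    if isinstance(reason, ConnectionRefusedError):
        return "ComfyUI doesn't seem to be running. Start it with --listen and try again."
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return f"ComfyUI at {host}:{port} timed out. It may still be loading a checkpoint."
    return f"Unexpected error talking to ComfyUI at {host}:{port}: {exc}"


def safe_call(func, *args, **kwargs):
    """Run an API call and return a structured result instead of raising."""
    try:
        return {"ok": True, "result": func(*args, **kwargs)}
    except Exception as exc:  # deliberately broad: the agent should never see a traceback
        return {"ok": False, "error": explain_error(exc)}
```

Returning a dictionary with an ok flag rather than raising keeps the tool output uniform, so the agent can decide whether to retry or report the message to the user.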
Current Limitations
The integration doesn't yet support:
- Advanced multi-node workflows (ControlNet, LoRA stacking)
- Real-time progress streaming via WebSocket
- Platforms other than Windows (not yet tested)
The entire stack runs locally using OpenClaw (self-hosted agent framework) + ComfyUI + a Node.js skill script, with no cloud components.
📖 Read the full source: r/LocalLLaMA