CLI Design Patterns for AI Agents: Misconceptions and Practical Approaches

CLI Interface Protocol Clarification
The biggest misconception from Part 1 was that "CLI" meant giving an LLM a Linux terminal. CLI is actually an interface protocol: text command in → text result out. Implementation can happen in two ways:
- As a binary or script in the shell's PATH: it becomes a CLI tool that runs in a real shell
- As a command parser inside your code: when the LLM outputs run(command="weather --city Tokyo"), you parse the string and execute it directly in your application code with no shell involved
The key is making the LLM feel like it's using a CLI. In the author's system, most commands never touch the OS — they're Go functions dispatched by a command router. Only commands that genuinely need a real OS (running scripts, installing packages) go to an isolated micro-VM. The agent doesn't know and doesn't care which layer handles its command.
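The routing idea above can be sketched in a few lines. This is a minimal illustration in Python, not the author's Go implementation: the handler `weather`, the `run_in_sandbox` placeholder, and the `NEEDS_OS` set are all hypothetical names chosen for the example.

```python
import shlex
from typing import Callable, Dict, List

# Hypothetical in-process handler; the output is stub data for illustration.
def weather(args: List[str]) -> str:
    # Expects: weather --city <name>
    if len(args) == 2 and args[0] == "--city":
        return f"weather for {args[1]}: (stub data)"
    return "[error] weather: usage: weather --city <name>"

def run_in_sandbox(argv: List[str]) -> str:
    # Placeholder for the isolated micro-VM path; nothing is executed here.
    return f"[sandbox] would execute: {' '.join(argv)}"

BUILTINS: Dict[str, Callable[[List[str]], str]] = {"weather": weather}
NEEDS_OS = {"python", "pip"}  # assumed set of commands that need a real OS

def run(command: str) -> str:
    """Text command in, text result out. No shell is involved."""
    try:
        argv = shlex.split(command)  # shell-like tokenizing, without a shell
    except ValueError as exc:  # e.g. unbalanced quotes
        return f"[error] could not parse command: {exc}"
    if not argv:
        return "[error] empty command"
    name, args = argv[0], argv[1:]
    if name in BUILTINS:
        return BUILTINS[name](args)
    if name in NEEDS_OS:
        return run_in_sandbox(argv)
    available = ", ".join(sorted(BUILTINS))
    return f"[error] {name}: command not found. Available: {available}"
```

From the agent's side, `run("weather --city Tokyo")` and `run("pip install requests")` look identical: both take a string and return text, even though only the second would ever reach an operating system.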
Agent-Friendly CLI Design Principles
Two Core Philosophies
Philosophy 1: Unix-Style Help Design
- tool --help → list of top-level commands
- tool <command> --help → specific parameters and usage for that subcommand
This allows the agent to discover capabilities on demand without stuffing all documentation into context upfront.
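One way to support this two-level discovery is a small registry that stores a one-line summary and a detailed usage string per command. The sketch below assumes a hypothetical `COMMANDS` table and `help_text` function; the usage strings are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: name -> (one-line summary, detailed usage).
COMMANDS: Dict[str, Tuple[str, str]] = {
    "cat": ("print a text file",
            "usage: cat [-b] <path>\n  -b  base64-encode binary files"),
    "see": ("view an image", "usage: see <path>"),
}

def help_text(argv: List[str]) -> str:
    """Handle `tool --help` and `tool <command> --help`."""
    if argv == ["--help"]:
        # Top level: names and summaries only, so the listing stays short.
        lines = ["Commands:"]
        for name, (summary, _usage) in sorted(COMMANDS.items()):
            lines.append(f"  {name:<6}{summary}")
        lines.append("Run: tool <command> --help for details")
        return "\n".join(lines)
    if len(argv) == 2 and argv[1] == "--help":
        # Second level: full usage for one command, fetched on demand.
        if argv[0] in COMMANDS:
            return COMMANDS[argv[0]][1]
        return f"[error] unknown command: {argv[0]}. Run: tool --help"
    return "[error] expected: tool --help or tool <command> --help"
```

The top-level listing deliberately omits flags, so the agent pays for detailed documentation only for the commands it actually intends to use.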
Philosophy 2: Tips Thinking
Every response — especially errors — should include guidance that reduces unnecessary exploration.
Bad example:
    > cat photo.png
    [error] binary file
Good example:
    > cat photo.png
    [error] cat: binary file detected (image/png, 182KB).
    Use: see photo.png (view image)
    Or: cat -b photo.png (base64 encode)
Why this matters: failed exploration wastes tokens. In multi-turn conversations the waste accumulates, because every failed attempt stays in context and consumes attention and inference resources on every subsequent turn. A single helpful hint can save significant tokens across the rest of the conversation.
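A handler that follows this principle builds its error from what it already knows about the failure. The sketch below is an assumption-laden illustration: the NUL-byte binary check is a crude heuristic, and the `ls` hint refers to a hypothetical sibling command; `see` and `cat -b` are taken from the example above.

```python
import mimetypes
import os

def cat(path: str) -> str:
    """Print a text file; on failure, say what to try next."""
    if not os.path.exists(path):
        # `ls` is a hypothetical sibling command used only as a hint.
        return f"[error] cat: {path}: no such file. Use: ls (list files)"
    with open(path, "rb") as f:
        data = f.read()
    if b"\x00" in data[:8192]:  # crude binary check: a NUL byte near the start
        mime = mimetypes.guess_type(path)[0] or "unknown type"
        size_kb = max(1, len(data) // 1024)
        hints = [f"cat -b {path} (base64 encode)"]
        if mime.startswith("image/"):
            hints.insert(0, f"see {path} (view image)")
        lines = [f"[error] cat: binary file detected ({mime}, {size_kb}KB)."]
        lines.append(f"Use: {hints[0]}")
        lines.extend(f"Or: {h}" for h in hints[1:])
        return "\n".join(lines)
    return data.decode("utf-8", errors="replace")
```

The extra lines cost a few dozen tokens once, while a bare `[error] binary file` invites several failed retries that each stay in context for the rest of the conversation.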
Safe CLI Design
When CLI commands involve dangerous or irreversible operations, the tool itself should provide safety mechanisms.
Dry-Run / Change Preview — Preventing Mistakes
Use this for operations that are within the agent's authority but have hard-to-reverse consequences. The goal is to let the agent (or a human) see what will happen before committing.
    > dns update --zone example.com --record A --value 1.2.3.4
    ⚠ DRY RUN:
      A record for example.com: 5.6.7.8 → 1.2.3.4
      Propagation: ~300s. Not instantly reversible.
      To execute: add --confirm
The preview should show the current state and the state it will change to. The agent then re-runs the command with --confirm to apply the change.
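The important property is that the preview is the default and the mutation is opt-in. A minimal sketch, assuming an in-memory `RECORDS` dictionary in place of a real DNS provider (both the dictionary and the `dns_update` signature are hypothetical):

```python
from typing import Dict, Tuple

# In-memory stand-in for a DNS provider; a real tool would call its API here.
RECORDS: Dict[Tuple[str, str], str] = {("example.com", "A"): "5.6.7.8"}

def dns_update(zone: str, record: str, value: str, confirm: bool = False) -> str:
    """Preview by default; change state only when confirm=True."""
    current = RECORDS.get((zone, record), "(none)")
    if not confirm:
        # Show current and proposed state, and how to proceed.
        return (
            f"DRY RUN: {record} record for {zone}: {current} -> {value}\n"
            "Nothing was changed. To execute: add --confirm"
        )
    RECORDS[(zone, record)] = value
    return f"Updated {record} record for {zone}: {current} -> {value}"
```

Because forgetting the flag yields a preview rather than a change, the failure mode of a careless call is harmless.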
Human Authorization — Operations Beyond the Agent's Autonomy
Use this for operations that require human judgment or approval. No matter how confident the agent is, it cannot complete these on its own.
Approach 1: Blocking Push Approval
    > pay --amount 500 --to vendor --reason "office supplies for Q2"
    ⏳ Approval required. Notification sent to your device.
    Waiting for response...
    ✓ Approved. Payment of $500 completed.
    [exit:0 | 7.2s]
This works like Apple's device login verification: the CLI sends a push notification directly to the human's device with full context (amount, recipient, reason). The CLI blocks until the human approves or rejects, then returns the result to the agent.
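The blocking behavior can be sketched with a queue that the notification channel fills in once the human responds. Everything here is an assumption for illustration: the `notify` callback stands in for a real push service, and the timeout and its message are additions that the original description does not specify.

```python
import queue
from typing import Callable

def pay(amount: int, to: str, reason: str,
        notify: Callable[[str, "queue.Queue[bool]"], None],
        timeout_s: float = 60.0) -> str:
    """Block until a human approves or rejects, or the timeout expires."""
    decision: "queue.Queue[bool]" = queue.Queue(maxsize=1)
    # Send full context so the human can decide without asking the agent.
    notify(f"Pay ${amount} to {to}. Reason: {reason}", decision)
    try:
        approved = decision.get(timeout=timeout_s)  # blocks here
    except queue.Empty:
        return "[error] pay: approval timed out. No payment was made."
    if not approved:
        return "[error] pay: rejected by approver. No payment was made."
    # A real tool would execute the payment at this point.
    return f"Approved. Payment of ${amount} completed."
```

The agent has no parameter that can supply the approval itself; the decision only arrives through the channel the human controls.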
Approach 2: Verification Code / 2FA
    > transfer --from savings --to checking --amount 10000
    ⚠ This operation requires 2FA verification.
    Reason: transferring $10,000 between accounts.
    A code has been sent to your authenticator.
    Re-run with: --otp <code>
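Unlike the blocking approach, this pattern returns immediately and expects a second invocation that carries the code. A minimal sketch under stated assumptions: `expected_otp` is a hypothetical parameter standing in for whatever verifier a real system uses, and the error wording is invented.

```python
import hmac
from typing import Optional

def transfer(src: str, dst: str, amount: int,
             otp: Optional[str] = None, expected_otp: str = "") -> str:
    """First call asks for a code; the re-run with otp performs the transfer."""
    if otp is None:
        return (
            "This operation requires 2FA verification.\n"
            f"Reason: transferring ${amount:,} from {src} to {dst}.\n"
            "Re-run with: --otp <code>"
        )
    # Constant-time comparison avoids leaking how many characters matched.
    ok = bool(expected_otp) and hmac.compare_digest(
        otp.encode("utf-8"), expected_otp.encode("utf-8"))
    if not ok:
        return ("[error] transfer: invalid code. Nothing was transferred. "
                "Re-run with: --otp <code>")
    # A real tool would move the funds at this point.
    return f"Transferred ${amount:,} from {src} to {dst}."
```

The code reaches the human's authenticator, not the agent, so the agent can only complete the operation after the human reads the code and passes it along.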
📖 Read the full source: r/LocalLLaMA