ClawCut Proxy: Optimize OpenClaw for Small LLMs on GitHub

ClawCut Proxy is now available on GitHub as an experimental tool designed to optimize OpenClaw's interaction with local LLMs, particularly smaller models that struggle with OpenClaw's default large system prompts and complex tool definitions.

What ClawCut Solves

OpenClaw sends massive system prompts (often >28,000 characters) and complex JSON tool definitions to LLMs. While large cloud models or high-end local models (14B+) handle this well, small models (7B, 8B) running on limited hardware (Mac/MLX or Raspberry Pi) suffer from "Cognitive Overload," leading to:

Extreme processing latency (slow Time To First Token)
Models forgetting their identity or available tools
Hallucinating text answers instead of executing local scripts
Connection timeouts or malformed JSON responses
Huge RAM consumption

How ClawCut Works

ClawCut acts as a "Man-in-the-Middle" between OpenClaw and your local LLM server with these optimization features:

PROMPT TRIMMING: Automatically removes unused default skills from the system prompt to keep the context window small and focused
SMART AMNESIA: Intelligently truncates chat history after successful tool executions to free up "mental space" for the model
ATTENTION FORCER: Injects a reminder at the very end of the user query to ensure the model prioritizes tool usage
TOOL FORCER: Injects keywords for tool calling and points to commands
INPUT RESCUE: Short-circuits known incoming requests (like Cron-Jobs) to bypass LLM latency and ensure 100% reliability for automated tasks
BASH-RESCUE: Detects poorly formatted script calls (e.g., naked code blocks) and converts them into valid OpenClaw tool calls on the fly
Automatically filters dynamic timestamps from system prompts to enable near-instant responses via hardware caching
Translates between OpenAI-compatible streams (MLX) and the Ollama/NDJSON format expected by OpenClaw
Real-time console output of prefill duration, token count

Performance and Debugging

ClawCut provides significantly faster response times (TTFT) as the model has less text to process upfront, improved reliability when calling scripts, and robust error handling for stream interruptions or formatting errors. With DEBUG_MODE enabled, you can inspect the full "JSON Clutter" sent by OpenClaw to understand exactly what the model is processing.

When to Use

Ideal for small models (7B-8B) running on hardware like Mac (MLX), Windows, or Linux, especially if your model "chats" too much instead of executing commands. Use with caution if you're using highly intelligent, large models (14B+) that can handle complex prompts natively. In this case, the proxy can act purely as a logger and format translator without manipulating content if PASS_THROUGH_MODE = True.

📖 Read the full source: r/openclaw