WCY format reduces LLM token overhead by 50-71% and adds structural 'I don't know' markers

✍️ OpenClawRadar · 📅 Published: March 17, 2026

WCY (Watch → Compute → Yield) is a line-oriented format designed to reduce LLM token overhead and provide structural markers for uncertainty in reasoning. It replaces JSON's brackets, quotes, and commas with one-marker-per-line syntax.
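
To make the bracket-and-quote overhead concrete, here is a rough sketch that renders the same records as JSON and as marker-prefixed key=value lines. The to_line helper, the "." marker choice, and the sample rows are illustrative assumptions, not the WCY specification, and character counts are only a crude proxy for tokens.

```python
import json


def to_line(record: dict, marker: str = ".") -> str:
    """Render one flat record as a single marker-prefixed line of key=value slots.

    Hypothetical encoding for illustration only; the real WCY grammar may differ.
    """
    slots = " ".join(f"{key}={value}" for key, value in record.items())
    return f"{marker} {slots}"


# Made-up sample rows, not data from the benchmarks
rows = [
    {"id": 1, "name": "alice", "score": 0.91},
    {"id": 2, "name": "bob", "score": 0.78},
]

as_json = json.dumps(rows)
as_lines = "\n".join(to_line(row) for row in rows)

# Fraction of characters saved; a real measurement would use the model's tokenizer
saving = 1 - len(as_lines) / len(as_json)
print(as_lines)
print(f"characters saved: {saving:.0%}")
```

Swapping len() for a real tokenizer's token count would give a figure comparable to the benchmarks in the next section.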

Token reduction benchmarks

Reported results, from tests on datasets of 10 to 500 rows and several MCP exchange types:

  • Structured data vs JSON: 50 to 54% fewer tokens
  • Tool-call schemas: 65 to 71% fewer tokens
  • Full MCP protocol exchange: 61% fewer tokens
  • Multi-agent output tokens: 40% fewer

No fine-tuning is needed: three few-shot examples are enough for models to switch formats. With this approach, the reported parse_r metric rises from 0.29 to 1.00 on complex tasks.


The ? marker for uncertainty

WCY introduces a structural way for LLMs to mark what they don't know during reasoning. The ? (void-B) slot allows models to indicate uncertainty inline:

: ?diagnosis hint=labs+imaging conf_range=0.4..0.8
    order CT_scan reason=from=3
. CT_result mass_in_RUL size=2.3cm
: diagnosis=adenocarcinoma conf=0.82 from=3,5
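
A minimal sketch of how lines like those in the trace above might be parsed, assuming each line starts with a one-character marker followed by space-separated slots. This is not the wcy_parser.py implementation; the TraceLine class and parse_line function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TraceLine:
    marker: str                                   # leading marker, e.g. "." or ":"
    unknowns: list = field(default_factory=list)  # names flagged with a leading "?"
    slots: dict = field(default_factory=dict)     # key=value pairs
    bare: list = field(default_factory=list)      # tokens without "="


def parse_line(line: str) -> TraceLine:
    """Split one trace line into its marker, ?-flagged names, and key=value slots."""
    marker, _, rest = line.strip().partition(" ")
    parsed = TraceLine(marker=marker)
    for token in rest.split():
        if token.startswith("?"):
            parsed.unknowns.append(token[1:])
        elif "=" in token:
            # Split on the first "=" only, so values may themselves contain "="
            key, _, value = token.partition("=")
            parsed.slots[key] = value
        else:
            parsed.bare.append(token)
    return parsed


opened = parse_line(": ?diagnosis hint=labs+imaging conf_range=0.4..0.8")
closed = parse_line(": diagnosis=adenocarcinoma conf=0.82 from=3,5")
print(opened.unknowns, closed.slots["from"])
```

Keeping the ?-flagged names separate from ordinary slots makes it easy to check later whether each open question was ever answered.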

Testing showed:

  • Zero-shot: models use ? markers 0% of the time, even with the spec in the prompt
  • With 3 examples: 5.4 markers per trace, 67-97% resolved
  • 48 pipeline traces across 8 domains: 95% resolution, 100% quality gate pass
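
One plausible way to compute a resolution figure like the ones above is sketched here. It assumes a ?name marker counts as resolved once some line assigns name=value; the source does not define resolution precisely, so treat this rule, the resolution_rate function, and the sample trace as assumptions.

```python
from typing import Optional


def resolution_rate(trace: str) -> Optional[float]:
    """Share of ?name markers that are later matched by a name=value slot.

    Returns None when the trace contains no ? markers at all.
    """
    opened = []        # names flagged as unknown, in order of appearance
    resolved = set()   # flagged names that were subsequently assigned a value
    for line in trace.splitlines():
        for token in line.split():
            if token.startswith("?"):
                opened.append(token[1:])
            elif "=" in token:
                key = token.partition("=")[0]
                if key in opened:
                    resolved.add(key)
    if not opened:
        return None
    return sum(name in resolved for name in opened) / len(opened)


# Made-up trace: two open questions, only one of them answered
sample = "\n".join([
    ": ?diagnosis hint=labs+imaging",
    ". CT_result size=2.3cm",
    ": ?stage hint=biopsy",
    ": diagnosis=adenocarcinoma conf=0.82",
])
print(resolution_rate(sample))
```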

The from= slot records, inline, which observations support each conclusion, making hallucination chains easier to catch.
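
A provenance check along these lines could be sketched as follows. It assumes from= values are comma-separated, 1-based numbers of earlier lines in the same trace, which the source does not confirm; dangling_refs and the sample lines are hypothetical.

```python
def dangling_refs(trace_lines):
    """Return (line_number, ref) pairs whose from= reference is not an earlier line."""
    problems = []
    for number, line in enumerate(trace_lines, start=1):
        for token in line.split():
            if not token.startswith("from="):
                continue
            for ref in token[len("from="):].split(","):
                # A valid reference is a positive integer strictly below this line's number
                if not ref.isdigit() or not 1 <= int(ref) < number:
                    problems.append((number, ref))
    return problems


# Made-up trace: line 4 cites line 7, which does not exist
lines = [
    ". labs CEA=elevated",
    ". imaging opacity_RUL",
    ": diagnosis=adenocarcinoma conf=0.82 from=1,2",
    ": stage=II from=2,7",
]
print(dangling_refs(lines))
```

A conclusion that cites a nonexistent or later line has no recorded support, which is the kind of gap a hallucination chain tends to leave behind.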

Available resources

  • wcy_parser.py — pure Python, no external dependencies
  • wcy_eval.py — 3-axis scoring (Structural / Meaning / Provenance)
  • 60 reasoning traces with void-B cycles (CC BY 4.0 license, for fine-tuning experiments)
  • Pipeline script to generate more traces

So far the format has only been tested on Claude Sonnet. The author is curious whether the jump from 0% usage to 5.4 markers per trace holds on Qwen, Llama, and Mistral with the same few-shot examples.

📖 Read the full source: r/LocalLLaMA
