Qwen Meetup Draft: Function Calling Harness 2 Boosts CoT Compliance from 9.91% to 100% via Structured Schemas

✍️ OpenClawRadar | 📅 Published: May 2, 2026

A talk at Qwen Meetup Korea (end of May) presents a second iteration of the function-calling harness pattern. The original harness pushed qwen3-coder-next from 6.75% to 100% on backend codegen using type validation and compiler feedback. This update extends the same idea to domains that lack a compiler: investment memos, legal opinions, and clinical charts.

Schema-Driven CoT Compliance

The core mechanism is a TypeScript schema (using typia tags) that forces the model's reasoning into a required form. Every field must be filled or the submission is rejected. Example schema for an investment memo:

import { tags } from "typia";

// IScenario, IValuationDriver, and IEvidenceSource are not shown in the source.
export interface IInvestmentMemo {
  recommendation: "BUY" | "HOLD" | "SELL";
  thesis: {
    consensusView: string;
    differentiatedView: string;
  };
  counterThesis: {
    bearCase: string;
    ourResponse: string;
  };
  // bull / base / bear all required: blocks submitting just the base case
  scenarios: {
    bull: IScenario;
    base: IScenario;
    bear: IScenario;
  };
  // empty arrays are rejected
  valuationDrivers: IValuationDriver[] & tags.MinItems<1>;
  killConditions: IKillCondition[] & tags.MinItems<1>;
  evidenceSources: IEvidenceSource[] & tags.MinItems<1>;
}

// Falsifiable thresholds only: blocks free-form like "trust in management"
export type IKillCondition =
  | { type: "price_drawdown"; percentBelowEntry: number }
  | { type: "metric_breach"; metric: string; below: number }
  | { type: "milestone_miss"; expectedBy: string; what: string };
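To make the rejection step concrete, here is a minimal hand-written sketch of what checking the killConditions field could look like. It is a stand-in for illustration, not typia's generated validator and not the talk's code: the names IValidationError, checkKillCondition, and checkKillConditions, and the "$input" path prefix, are all assumptions.

```typescript
// Hypothetical stand-in for a generated validator; all helper names are illustrative.
type IKillCondition =
  | { type: "price_drawdown"; percentBelowEntry: number }
  | { type: "metric_breach"; metric: string; below: number }
  | { type: "milestone_miss"; expectedBy: string; what: string };

interface IValidationError {
  path: string;
  expected: string;
}

const isNumber = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v);
const isString = (v: unknown): v is string => typeof v === "string";

// Checks one element against the discriminated union and reports each bad field.
function checkKillCondition(input: unknown, path: string): IValidationError[] {
  if (typeof input !== "object" || input === null) {
    return [{ path, expected: "IKillCondition" }];
  }
  const c = input as Record<string, unknown>;
  const errors: IValidationError[] = [];
  switch (c["type"]) {
    case "price_drawdown":
      if (!isNumber(c["percentBelowEntry"]))
        errors.push({ path: `${path}.percentBelowEntry`, expected: "number" });
      break;
    case "metric_breach":
      if (!isString(c["metric"]))
        errors.push({ path: `${path}.metric`, expected: "string" });
      if (!isNumber(c["below"]))
        errors.push({ path: `${path}.below`, expected: "number" });
      break;
    case "milestone_miss":
      if (!isString(c["expectedBy"]))
        errors.push({ path: `${path}.expectedBy`, expected: "string" });
      if (!isString(c["what"]))
        errors.push({ path: `${path}.what`, expected: "string" });
      break;
    default:
      // Free-form types such as "trust_in_management" land here.
      errors.push({
        path: `${path}.type`,
        expected: '"price_drawdown" | "metric_breach" | "milestone_miss"',
      });
  }
  return errors;
}

// Type guard built on the checker, for callers that only need a yes/no answer.
function isKillCondition(input: unknown): input is IKillCondition {
  return checkKillCondition(input, "$input").length === 0;
}

// Enforces the MinItems<1> rule first, then validates every element.
function checkKillConditions(input: unknown): IValidationError[] {
  if (!Array.isArray(input) || input.length < 1) {
    return [
      { path: "$input.killConditions", expected: "IKillCondition[] & MinItems<1>" },
    ];
  }
  return input.flatMap((item, i) =>
    checkKillCondition(item, `$input.killConditions[${i}]`),
  );
}
```

An empty array and a free-form entry such as { type: "trust_in_management" } both come back with one error. A well-formed price_drawdown entry returns an empty list. Returning paths rather than a bare boolean gives the model something specific to correct on a retry.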

The schema itself is then backtested against historical investment cases, much as a trading strategy is backtested on market data. The resulting diff shows which past calls the schema would have gotten right and which it would have missed; whatever is missing gets added to the schema.
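One way to picture the backtest, as a rough sketch under stated assumptions: tag each historical case with the factor that actually decided it, then diff those factors against the kill-condition variants the schema can express. The IHistoricalCase shape, the decisiveFactor field, and the placeholder cases are invented here for illustration and do not come from the talk.

```typescript
// Hypothetical record of a past investment call; not a type from the talk.
interface IHistoricalCase {
  name: string;
  // The factor that actually invalidated (or would have invalidated) the thesis.
  decisiveFactor: string;
}

interface IBacktestDiff {
  covered: IHistoricalCase[];
  missed: IHistoricalCase[];
}

// Kill-condition variants the current schema can express.
const SCHEMA_KILL_TYPES: readonly string[] = [
  "price_drawdown",
  "metric_breach",
  "milestone_miss",
];

// Splits cases into those the schema could have captured and those it could not.
function backtestSchema(
  cases: readonly IHistoricalCase[],
  schemaTypes: readonly string[],
): IBacktestDiff {
  const known = new Set(schemaTypes);
  const diff: IBacktestDiff = { covered: [], missed: [] };
  for (const c of cases) {
    (known.has(c.decisiveFactor) ? diff.covered : diff.missed).push(c);
  }
  return diff;
}

// Placeholder cases for illustration only; not real data.
const sampleCases: IHistoricalCase[] = [
  { name: "case-A", decisiveFactor: "price_drawdown" },
  { name: "case-B", decisiveFactor: "milestone_miss" },
  { name: "case-C", decisiveFactor: "regulatory_change" },
];

const sampleDiff = backtestSchema(sampleCases, SCHEMA_KILL_TYPES);
console.log(sampleDiff.missed.map((c) => c.name));
```

With these placeholder cases, only case-C ends up in missed, which would suggest adding a new variant to IKillCondition to cover that kind of factor.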


Measured CoT Compliance

The compliance figures were measured on AutoBE's CoT feature, not on financial investment analysis itself. On these CoT-compliance schemas, qwen3.6-27b keeps up with frontier models, and the harness raises compliance from 9.91% to 100%.
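As a toy illustration of how a reject-and-retry loop can move a compliance rate, the sketch below feeds validation errors back to a stubbed model and computes the share of runs that end compliant. It is not AutoBE's implementation and does not reproduce the reported numbers: runWithHarness, complianceRate, validateScenarios, and stubModel are all hypothetical, and a real model call would be asynchronous.

```typescript
interface IRunResult {
  compliant: boolean;
  attempts: number;
}

// Generates a draft, validates it, and retries with the errors as feedback.
function runWithHarness(
  generate: (feedback: string[]) => unknown,
  validate: (draft: unknown) => string[],
  maxAttempts: number,
): IRunResult {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const draft = generate(feedback);
    feedback = validate(draft);
    if (feedback.length === 0) return { compliant: true, attempts: attempt };
  }
  return { compliant: false, attempts: maxAttempts };
}

// Percentage of runs that ended with a schema-compliant draft.
function complianceRate(results: readonly IRunResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.compliant).length;
  return (100 * passed) / results.length;
}

// Returns one message per missing scenario; an empty list means compliant.
function validateScenarios(draft: unknown): string[] {
  const root =
    typeof draft === "object" && draft !== null
      ? (draft as Record<string, unknown>)
      : {};
  const raw = root["scenarios"];
  const scenarios =
    typeof raw === "object" && raw !== null
      ? (raw as Record<string, unknown>)
      : {};
  return ["bull", "base", "bear"]
    .filter((key) => scenarios[key] === undefined)
    .map((key) => `scenarios.${key} is required`);
}

// Toy stand-in for a model: submits only the base case until told what is missing.
function stubModel(feedback: string[]): unknown {
  const scenarios: Record<string, string> = { base: "placeholder" };
  for (const message of feedback) {
    if (message.includes("scenarios.bull")) scenarios["bull"] = "placeholder";
    if (message.includes("scenarios.bear")) scenarios["bear"] = "placeholder";
  }
  return { scenarios };
}

const withoutRetry = runWithHarness(stubModel, validateScenarios, 1);
const withRetry = runWithHarness(stubModel, validateScenarios, 3);
console.log(withoutRetry, withRetry);
```

In this toy setup the stub fails when given a single attempt and passes on its second attempt once the missing-field messages are fed back. Real models will not respond to feedback this predictably.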

Who It's For

Developers building AI agents that need structured, verifiable reasoning in domains without automatic correctness checks (e.g., finance, legal, medical).

📖 Read the full source: r/LocalLLaMA

Previous presentation: Part 1

