Qwen Meetup Draft: Function Calling Harness 2 Boosts CoT Compliance from 9.91% to 100% via Structured Schemas

A talk at Qwen Meetup Korea (end of May) presents a second iteration of the function-calling harness pattern. The original harness pushed qwen3-coder-next from 6.75% to 100% on backend codegen using type validation and compiler feedback. This update extends the same idea to domains that lack a compiler: investment memos, legal opinions, and clinical charts.
Schema-Driven CoT Compliance
The core mechanism is a TypeScript schema (using typia tags) that forces the model's reasoning into a required form. Every field must be filled or the submission is rejected. Example schema for an investment memo:
import { tags } from "typia";

// IScenario, IValuationDriver, and IEvidenceSource are not shown in this excerpt.
export interface IInvestmentMemo {
  recommendation: "BUY" | "HOLD" | "SELL";
  thesis: {
    consensusView: string;
    differentiatedView: string;
  };
  counterThesis: {
    bearCase: string;
    ourResponse: string;
  };
  // bull / base / bear are all required: blocks submitting just the base case
  scenarios: {
    bull: IScenario;
    base: IScenario;
    bear: IScenario;
  };
  // MinItems<1> rejects empty arrays
  valuationDrivers: IValuationDriver[] & tags.MinItems<1>;
  killConditions: IKillCondition[] & tags.MinItems<1>;
  evidenceSources: IEvidenceSource[] & tags.MinItems<1>;
}

// Falsifiable thresholds only: blocks free-form entries like "trust in management"
export type IKillCondition =
  | { type: "price_drawdown"; percentBelowEntry: number }
  | { type: "metric_breach"; metric: string; below: number }
  | { type: "milestone_miss"; expectedBy: string; what: string };
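To make "the submission is rejected" concrete, here is a minimal sketch of the kind of check the schema implies for killConditions. It is a hand-written stand-in, not typia's generated validator: the names checkKillCondition, checkKillConditions, and FieldError, and the "$input..." path format, are assumptions made for illustration.

```typescript
// Hypothetical stand-in for a schema validator; not typia's implementation.
interface FieldError {
  path: string; // where the problem is, e.g. "$input.killConditions[0].below"
  expected: string; // what the schema wanted at that path
}

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Checks one element against the three-variant IKillCondition union.
function checkKillCondition(input: unknown, path: string): FieldError[] {
  if (!isRecord(input)) return [{ path, expected: "IKillCondition" }];
  const obj: Record<string, unknown> = input;
  const errors: FieldError[] = [];
  const need = (key: string, type: "string" | "number"): void => {
    if (typeof obj[key] !== type) {
      errors.push({ path: `${path}.${key}`, expected: type });
    }
  };
  switch (obj.type) {
    case "price_drawdown":
      need("percentBelowEntry", "number");
      break;
    case "metric_breach":
      need("metric", "string");
      need("below", "number");
      break;
    case "milestone_miss":
      need("expectedBy", "string");
      need("what", "string");
      break;
    default:
      // Free-form condition types have no variant to land in.
      errors.push({
        path: `${path}.type`,
        expected: '"price_drawdown" | "metric_breach" | "milestone_miss"',
      });
  }
  return errors;
}

// Mirrors `IKillCondition[] & tags.MinItems<1>`: a non-empty array of valid items.
function checkKillConditions(input: unknown): FieldError[] {
  const path = "$input.killConditions";
  if (!Array.isArray(input) || input.length < 1) {
    return [{ path, expected: "Array<IKillCondition> & MinItems<1>" }];
  }
  const errors: FieldError[] = [];
  for (let i = 0; i < input.length; i++) {
    for (const e of checkKillCondition(input[i], `${path}[${i}]`)) errors.push(e);
  }
  return errors;
}
```

An empty error list means the field is accepted. Anything else gives the model a path-level reason for the rejection, which is more useful feedback than a bare pass/fail.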
The schema itself is then validated against historical investment cases, much like backtesting a trading strategy on market data. The resulting diff shows which past calls the schema would have gotten right and which it would have missed, and the missing pieces are added to the schema.
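The backtesting step could look roughly like the sketch below. The talk summary does not say how historical cases are stored or compared, so everything here is an assumption: HistoricalCase, decisiveFactor, and backtestSchema are hypothetical names, and "the schema can express this case" is reduced to a simple tag match.

```typescript
// Hypothetical data shapes; the source does not specify how cases are recorded.
interface HistoricalCase {
  id: string;
  // What actually decided the outcome, tagged by a human reviewer.
  decisiveFactor: string;
}

interface BacktestDiff {
  covered: string[]; // case ids the schema could have expressed
  missed: HistoricalCase[]; // cases the schema has no slot for
  missingTypes: string[]; // distinct factors to consider adding to the schema
}

// Compares the factor types the schema can express against past cases.
function backtestSchema(
  expressibleTypes: readonly string[],
  cases: readonly HistoricalCase[],
): BacktestDiff {
  const diff: BacktestDiff = { covered: [], missed: [], missingTypes: [] };
  for (const c of cases) {
    if (expressibleTypes.some((t) => t === c.decisiveFactor)) {
      diff.covered.push(c.id);
    } else {
      diff.missed.push(c);
      if (!diff.missingTypes.some((t) => t === c.decisiveFactor)) {
        diff.missingTypes.push(c.decisiveFactor);
      }
    }
  }
  return diff;
}
```

In this reading, each entry in missingTypes is a candidate for a new variant in a union such as IKillCondition, after which the backtest is rerun.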
Measured CoT Compliance
The compliance figures come from AutoBE's CoT feature rather than from financial investment analysis itself. On these CoT-compliance schemas, qwen3.6-27b keeps pace with frontier models, and the harness raises compliance from 9.91% to 100%.
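One way to read the jump in compliance is as the effect of a validate-and-retry loop: a rejected submission goes back to the model together with the validation errors until it passes or the attempt budget runs out. The sketch below assumes that reading. runWithFeedback, complianceRate, requireScenarios, and stubModel are hypothetical names, not AutoBE's API. The model is a synchronous stub so the example stays self-contained (a real model call would be asynchronous), and the figures in the summary are not reproduced here.

```typescript
// Hypothetical harness loop; names and shapes are illustrative, not AutoBE's API.
interface HarnessResult {
  accepted: boolean;
  attempts: number;
  lastErrors: string[];
}

// Regenerates until the validator returns no errors or attempts run out.
function runWithFeedback(
  generate: (feedback: readonly string[]) => unknown,
  validate: (candidate: unknown) => string[],
  maxAttempts: number,
): HarnessResult {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = generate(feedback);
    feedback = validate(candidate);
    if (feedback.length === 0) {
      return { accepted: true, attempts: attempt, lastErrors: [] };
    }
  }
  return { accepted: false, attempts: maxAttempts, lastErrors: feedback };
}

// Share of runs that ended with an accepted submission, as a percentage.
function complianceRate(results: readonly HarnessResult[]): number {
  if (results.length === 0) return 0;
  const accepted = results.filter((r) => r.accepted).length;
  return (accepted / results.length) * 100;
}

// Example validator: bull, base, and bear scenarios must all be present.
function requireScenarios(candidate: unknown): string[] {
  const scenarios =
    typeof candidate === "object" && candidate !== null
      ? (candidate as { scenarios?: unknown }).scenarios
      : undefined;
  const present: Record<string, unknown> =
    typeof scenarios === "object" && scenarios !== null
      ? (scenarios as Record<string, unknown>)
      : {};
  return ["bull", "base", "bear"]
    .filter((key) => present[key] === undefined)
    .map((key) => `$input.scenarios.${key}: required`);
}

// Stub model: submits only the base case first, then fixes it once given feedback.
const stubModel = (feedback: readonly string[]): unknown =>
  feedback.length === 0
    ? { scenarios: { base: {} } }
    : { scenarios: { bull: {}, base: {}, bear: {} } };

const singleShot = runWithFeedback(stubModel, requireScenarios, 1);
const withRetries = runWithFeedback(stubModel, requireScenarios, 3);
```

With this stub, singleShot is rejected for the missing bull and bear scenarios, while withRetries is accepted on its second attempt. That is the gap a validation loop is meant to close.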
Who It's For
Developers building AI agents that need structured, verifiable reasoning in domains without automatic correctness checks (e.g., finance, legal, medical).
📖 Read the full source: r/LocalLLaMA
Previous presentation: Part 1