Function Calling Harness 2: Boosts CoT Compliance to 100%

A talk at Qwen Meetup Korea (end of May) presents a second iteration of the function-calling harness pattern. The original harness pushed qwen3-coder-next from 6.75% to 100% on backend codegen using type validation and compiler feedback. This update extends the same idea to domains that lack a compiler: investment memos, legal opinions, and clinical charts.

Schema-Driven CoT Compliance

The core mechanism is a TypeScript schema (using typia tags) that forces the model's reasoning into a required form. Every field must be filled or the submission is rejected. Example schema for an investment memo:

import { tags } from "typia";

export interface IInvestmentMemo {
  recommendation: "BUY" | "HOLD" | "SELL";
  thesis: {
    consensusView: string;
    differentiatedView: string;
  };
  counterThesis: {
    bearCase: string;
    ourResponse: string;
  };
  // bull / base / bear all required — blocks submitting just the base case
  scenarios: {
    bull: IScenario;
    base: IScenario;
    bear: IScenario;
  };
  // MinItems<1> rejects empty arrays
  valuationDrivers: IValuationDriver[] & tags.MinItems<1>;
  killConditions: IKillCondition[] & tags.MinItems<1>;
  evidenceSources: IEvidenceSource[] & tags.MinItems<1>;
}

// Falsifiable thresholds only: blocks free-form entries like "trust in management"
export type IKillCondition =
  | { type: "price_drawdown"; percentBelowEntry: number }
  | { type: "metric_breach"; metric: string; below: number }
  | { type: "milestone_miss"; expectedBy: string; what: string };
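
As a rough illustration of what "rejected" means for the kill-condition union, the sketch below hand-writes a runtime check. It is a stand-in under stated assumptions: typia derives its validators from the type itself, and the names `validateKillConditions` and `ValidationError`, as well as the `$.killConditions` path format, are hypothetical and not taken from the talk.

```typescript
// Hypothetical hand-written check; a real setup would let typia derive this from the type.
interface ValidationError {
  path: string;
  expected: string;
}

const isNumber = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v);
const isString = (v: unknown): v is string => typeof v === "string";

function validateKillConditions(input: unknown): ValidationError[] {
  const errors: ValidationError[] = [];
  // Mirrors MinItems<1>: a missing or empty array is an error.
  if (!Array.isArray(input) || input.length === 0) {
    errors.push({ path: "$.killConditions", expected: "array with at least 1 item" });
    return errors;
  }
  input.forEach((item: unknown, i: number) => {
    const path = `$.killConditions[${i}]`;
    if (typeof item !== "object" || item === null) {
      errors.push({ path, expected: "IKillCondition object" });
      return;
    }
    const c = item as Record<string, unknown>;
    // Each variant of the union demands its own measurable fields.
    switch (c.type) {
      case "price_drawdown":
        if (!isNumber(c.percentBelowEntry))
          errors.push({ path: `${path}.percentBelowEntry`, expected: "number" });
        break;
      case "metric_breach":
        if (!isString(c.metric)) errors.push({ path: `${path}.metric`, expected: "string" });
        if (!isNumber(c.below)) errors.push({ path: `${path}.below`, expected: "number" });
        break;
      case "milestone_miss":
        if (!isString(c.expectedBy)) errors.push({ path: `${path}.expectedBy`, expected: "string" });
        if (!isString(c.what)) errors.push({ path: `${path}.what`, expected: "string" });
        break;
      default:
        // Free-form types such as "trust_in_management" end up here.
        errors.push({
          path: `${path}.type`,
          expected: '"price_drawdown" | "metric_breach" | "milestone_miss"',
        });
    }
  });
  return errors;
}
```

A non-empty error list is what would be handed back to the model as feedback, so a vague condition cannot pass until it is restated as one of the three measurable variants.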

The schema itself is then validated by running it against historical investment cases, the same idea as backtesting a trading strategy on market data. The resulting diff shows which past calls the schema would have gotten right and which it would have missed; whatever it missed is added to the schema as a new required field.
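
One way to picture the backtesting idea above is as a coverage diff. The sketch below is only an interpretation: the `HistoricalCase` shape, the factor labels, and the function `backtestSchema` are hypothetical, and the cases are made-up placeholders rather than real investment history.

```typescript
// All names and data here are hypothetical, for illustration only.
interface HistoricalCase {
  id: string;
  // Factors that, in hindsight, decided the outcome of the case.
  decisiveFactors: string[];
}

interface BacktestRow {
  id: string;
  covered: boolean;
  missed: string[];
}

// For each past case, list the decisive factors that no schema field forces the model to address.
function backtestSchema(
  schemaCovers: ReadonlySet<string>,
  cases: readonly HistoricalCase[],
): BacktestRow[] {
  return cases.map((c) => {
    const missed = c.decisiveFactors.filter((f) => !schemaCovers.has(f));
    return { id: c.id, covered: missed.length === 0, missed };
  });
}

// Labels standing in for what the current schema requires.
const schemaCovers = new Set(["bear_scenario", "kill_condition", "valuation_driver"]);

const cases: HistoricalCase[] = [
  { id: "case-A", decisiveFactors: ["bear_scenario"] },
  { id: "case-B", decisiveFactors: ["valuation_driver", "liquidity_risk"] },
];

const report = backtestSchema(schemaCovers, cases);
```

In this toy run, the second case would surface `liquidity_risk` as a gap, which is the kind of finding that would prompt adding a field to the schema.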

Measured CoT Compliance

The measurements come from AutoBE's CoT feature, not from financial investment analysis itself. On these CoT-compliance schemas, qwen3.6-27b keeps up with frontier models, and the harness raises its compliance from 9.91% to 100%.
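
To make the notion of a compliance rate concrete, here is a minimal sketch of a validate-and-retry loop together with a pass-rate calculation. Everything in it is an assumption: the `Generate` and `Validate` signatures, the retry limit, and the stubbed model are hypothetical, and the toy stub says nothing about how the figures above were measured.

```typescript
// Hypothetical signatures; an empty error list means the output is compliant.
type Validate = (output: unknown) => string[];
type Generate = (task: string, feedback: string[]) => unknown;

// Retry until the output validates, feeding the errors back each time.
function runWithHarness(
  task: string,
  generate: Generate,
  validate: Validate,
  maxAttempts: number,
): { compliant: boolean; attempts: number } {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = generate(task, feedback);
    feedback = validate(output);
    if (feedback.length === 0) return { compliant: true, attempts: attempt };
  }
  return { compliant: false, attempts: maxAttempts };
}

// Fraction of tasks that end with a schema-compliant output.
function complianceRate(
  tasks: readonly string[],
  generate: Generate,
  validate: Validate,
  maxAttempts: number,
): number {
  if (tasks.length === 0) return 0;
  const passed = tasks.filter(
    (t) => runWithHarness(t, generate, validate, maxAttempts).compliant,
  ).length;
  return passed / tasks.length;
}

// Toy validator: only checks that killConditions is a non-empty array.
const validate: Validate = (output) => {
  const kc = (output as { killConditions?: unknown } | null)?.killConditions;
  return Array.isArray(kc) && kc.length > 0
    ? []
    : ["$.killConditions: expected array with at least 1 item"];
};

// Toy stub standing in for a model: it complies unprompted on one task only,
// and otherwise complies once it has received validation feedback.
const generate: Generate = (task, feedback) =>
  task === "t1" || feedback.length > 0
    ? { killConditions: [{ type: "price_drawdown", percentBelowEntry: 20 }] }
    : { killConditions: [] };

const tasks = ["t1", "t2", "t3", "t4"];
const singleShot = complianceRate(tasks, generate, validate, 1);
const withRetries = complianceRate(tasks, generate, validate, 3);
```

Setting `maxAttempts` to 1 corresponds to a plain single-shot call, so comparing the two rates isolates what the feedback loop contributes.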

Who It's For

Developers building AI agents that need structured, verifiable reasoning in domains without automatic correctness checks (e.g., finance, legal, medical).

📖 Read the full source: r/LocalLLaMA

Previous presentation: Part 1