LM Studio parser bugs break Qwen3.5 tool calling and reasoning

✍️ OpenClawRadar📅 Published: March 2, 2026🔗 Source
LM Studio parser bugs break Qwen3.5 tool calling and reasoning
Ad

LM Studio parser issues affecting reasoning models

LM Studio's server parser contains multiple bugs that interfere with tool calling and reasoning in models like Qwen3.5 and DeepSeek-R1. These issues can cause models to appear broken when the problem is actually in the parser.

The bugs

1. Parser scans inside <think> blocks for tool call patterns

When reasoning models think about tool calling syntax inside their <think> blocks, LM Studio's parser treats those prose mentions as actual tool call attempts. This creates a recursive trap where the model reasons about tool calls, the parser finds tool-call-shaped tokens in the thinking, the parse fails, the error is fed back to the model, and the cycle repeats.

The model literally cannot debug a tool calling issue because describing the problem reproduces it. One model explicitly said "I'm getting caught in a loop where my thoughts about tool calling syntax are being interpreted as actual tool call markers" — and that sentence itself triggered the parser.

This was first reported as issue #453 in February 2025 and remains open over a year later.

Workaround: Disable reasoning with {%- set enable_thinking = false %}. This instantly fixes the issue, allowing 20+ consecutive tool calls to succeed.

2. Registering a second MCP server breaks tool call parsing for the first

This bug is clean and deterministic. Testing with lfm2-24b-a2b at temperature=0.0 shows:

  • Only KG server active: Model correctly calls search_nodes, parser recognizes <|tool_call_start|> tokens, tool executes, results returned. Works perfectly.
  • Add webfetch server (don't even call it): Model emits <|tool_call_start|>[web_search(...)]<|tool_call_end|> as raw text in the chat. The special tokens are no longer recognized. The tool is never executed.

The mere registration of a second MCP server — without calling it — changes how the parser handles the first server's tool calls. Same model, same prompt, same target server. Single variable changed.

Workaround: Only register the MCP server you need for each task. This is impractical for agentic workflows.

3. Server-side reasoning_content/content split produces empty responses that report success

This affects everyone using reasoning models via the API, whether using tool calling or not. When sending a simple prompt to Qwen3.5-35b-a3b via /v1/chat/completions asking it to list XML tags used for reasoning, the server returned:

{
  "content": "",
  "reasoning_content": "[3099 tokens of detailed deliberation]",
  "finish_reason": "stop"
}

The model did extensive work — 3099 tokens of reasoning — but got caught in a deliberation loop inside <think> and never produced output in the content field. The server returned finish_reason: "stop" with empty content, reporting success.

This means:

  • Every eval harness checking finish_reason == "stop" silently accepts empty responses
  • Every agentic framework propagates empty strings downstream
  • Every user sees a blank response and concludes the model is broken
  • The actual reasoning is trapped in reasoning_content — the model did real work that nobody sees unless they explicitly check that field

This is server-side, not a UI bug, confirmed by inspecting the raw API response and LM Studio server log. The reasoning_content/content split happens before the response reaches any client.

Ad

Bug interaction

These aren't independent issues. They interact to create systemic problems with tool calling and reasoning in LM Studio.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also