3 LM Studio Parser Bugs Break Qwen3.5 Tool Calling & Reasoning

LM Studio parser issues affecting reasoning models

LM Studio's server parser contains three bugs that interfere with tool calling and reasoning in models like Qwen3.5 and DeepSeek-R1. These bugs can make a model appear broken when the problem is actually in the parser.

The bugs

1. Parser scans inside <think> blocks for tool call patterns

When reasoning models think about tool calling syntax inside their <think> blocks, LM Studio's parser treats those prose mentions as actual tool call attempts. This creates a recursive trap where the model reasons about tool calls, the parser finds tool-call-shaped tokens in the thinking, the parse fails, the error is fed back to the model, and the cycle repeats.

The model cannot debug a tool calling issue, because describing the problem reproduces it. One model explicitly said "I'm getting caught in a loop where my thoughts about tool calling syntax are being interpreted as actual tool call markers", and that sentence itself triggered the parser.
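The failure mode can be illustrated with a minimal sketch. This is not LM Studio's actual parser: the regexes, the function names, and the reuse of the <|tool_call_start|> / <|tool_call_end|> markers quoted later in this post are assumptions made for illustration. It contrasts a scanner that searches the whole output with one that removes <think> blocks first.

```python
import re

# Assumed marker syntax, borrowed from the tokens quoted later in this post.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_CALL = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def naive_tool_calls(text):
    """Scan the entire output, including reasoning, for tool-call markers."""
    return TOOL_CALL.findall(text)


def think_aware_tool_calls(text):
    """Drop <think> blocks before scanning, so reasoning is never parsed."""
    visible = THINK_BLOCK.sub("", text)
    # If a <think> block was never closed, ignore everything after it.
    open_idx = visible.find("<think>")
    if open_idx != -1:
        visible = visible[:open_idx]
    return TOOL_CALL.findall(visible)


sample = (
    "<think>I should emit "
    "<|tool_call_start|>[search_nodes(query='x')]<|tool_call_end|> "
    "once I know the query.</think>"
    "<|tool_call_start|>[search_nodes(query='qwen')]<|tool_call_end|>"
)

print(len(naive_tool_calls(sample)))        # 2: the prose mention is counted too
print(len(think_aware_tool_calls(sample)))  # 1: only the real call remains
```

The naive scanner reports the call the model was only thinking about, which is the behavior described above; the think-aware variant returns only the call emitted outside the reasoning block.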

This was first reported as issue #453 in February 2025 and remains open over a year later.

Workaround: Disable reasoning by adding {%- set enable_thinking = false %} to the model's prompt template. In testing, this fixed the issue immediately and 20+ consecutive tool calls succeeded.

2. Registering a second MCP server breaks tool call parsing for the first

This bug reproduces cleanly and deterministically. Testing with lfm2-24b-a2b at temperature=0.0 shows:

Only KG server active: Model correctly calls search_nodes, parser recognizes <|tool_call_start|> tokens, tool executes, results returned. Works perfectly.
Add webfetch server (don't even call it): Model emits <|tool_call_start|>[web_search(...)]<|tool_call_end|> as raw text in the chat. The special tokens are no longer recognized. The tool is never executed.

The mere registration of a second MCP server, without ever calling it, changes how the parser handles the first server's tool calls. The model, prompt, and target server were identical in both runs; only one variable changed.

Workaround: Only register the MCP server you need for each task. This is impractical for agentic workflows.
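Until the parser is fixed, a client can at least detect when this has happened. The sketch below assumes the OpenAI-style message shape (a dict with "content" and an optional "tool_calls" list); the function name and the marker tuple are hypothetical and would need adjusting for models that use different tool-call tokens.

```python
# Markers quoted in the test above; other models may use different tokens.
LEAK_MARKERS = ("<|tool_call_start|>", "<|tool_call_end|>")


def leaked_tool_call(message):
    """Return True when tool-call markers show up as plain text in the
    content while no structured tool call was parsed from the output."""
    content = message.get("content") or ""
    parsed_calls = message.get("tool_calls") or []
    has_marker = any(marker in content for marker in LEAK_MARKERS)
    return has_marker and not parsed_calls


broken = {
    "content": "<|tool_call_start|>[web_search(query='x')]<|tool_call_end|>",
    "tool_calls": [],
}
print(leaked_tool_call(broken))  # True
```

An agent loop could call this on every assistant message and retry or raise an error instead of treating the raw marker text as a final answer.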

3. Server-side reasoning_content/content split produces empty responses that report success

This affects everyone using reasoning models via the API, whether using tool calling or not. When sending a simple prompt to Qwen3.5-35b-a3b via /v1/chat/completions asking it to list XML tags used for reasoning, the server returned:

{
  "content": "",
  "reasoning_content": "[3099 tokens of detailed deliberation]",
  "finish_reason": "stop"
}

The model did extensive work (3099 tokens of reasoning) but got caught in a deliberation loop inside <think> and never produced output in the content field. The server still returned finish_reason: "stop" with empty content, which clients read as success.

This means:

Every eval harness checking finish_reason == "stop" silently accepts empty responses
Every agentic framework propagates empty strings downstream
Every user sees a blank response and concludes the model is broken
The actual reasoning is trapped in reasoning_content: the model did real work that nobody sees unless they explicitly check that field

This is a server-side bug, not a UI bug, as confirmed by inspecting the raw API response and the LM Studio server log. The reasoning_content/content split happens before the response reaches any client.
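Because the split happens on the server, the only defense on the client side is to stop trusting finish_reason alone. The following is a minimal sketch of such a guard, assuming the usual OpenAI-compatible layout in which each choice carries a "message" dict and a "finish_reason"; the function name and the returned labels are hypothetical.

```python
def classify_choice(choice):
    """Label one chat completion choice instead of trusting finish_reason.

    Returns one of: "ok", "tool_call", "truncated", "reasoning_only", "empty".
    """
    message = choice.get("message") or {}
    content = (message.get("content") or "").strip()
    reasoning = (message.get("reasoning_content") or "").strip()

    if content:
        return "ok"
    if message.get("tool_calls"):
        return "tool_call"
    if choice.get("finish_reason") == "length":
        return "truncated"
    if reasoning:
        # The case described above: real work, but nothing in content.
        return "reasoning_only"
    return "empty"


stuck = {
    "message": {"content": "", "reasoning_content": "long deliberation..."},
    "finish_reason": "stop",
}
print(classify_choice(stuck))  # reasoning_only
```

An eval harness or agent framework could treat anything other than "ok" or "tool_call" as a failure, and log reasoning_content for the "reasoning_only" case so the model's work is not silently discarded.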