LM Studio parser bugs break Qwen3.5 tool calling and reasoning

LM Studio parser issues affecting reasoning models
LM Studio's server parser contains multiple bugs that interfere with tool calling and reasoning in models like Qwen3.5 and DeepSeek-R1. These issues can cause models to appear broken when the problem is actually in the parser.
The bugs
1. Parser scans inside <think> blocks for tool call patterns
When reasoning models think about tool calling syntax inside their <think> blocks, LM Studio's parser treats those prose mentions as actual tool call attempts. This creates a recursive trap where the model reasons about tool calls, the parser finds tool-call-shaped tokens in the thinking, the parse fails, the error is fed back to the model, and the cycle repeats.
The model literally cannot debug a tool calling issue because describing the problem reproduces it. One model explicitly said "I'm getting caught in a loop where my thoughts about tool calling syntax are being interpreted as actual tool call markers" — and that sentence itself triggered the parser.
This was first reported as issue #453 in February 2025 and remains open over a year later.
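Workaround: Disable reasoning by adding {%- set enable_thinking = false %} to the model's Jinja prompt template. With thinking off there is no <think> block for the parser to misread, and the problem stopped immediately, with 20+ consecutive tool calls succeeding.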
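To make the failure mode concrete, here is a minimal sketch of the behavior one would expect from a parser: drop everything inside <think> blocks first, then look for tool calls in what remains. This is not LM Studio's code. The function name is hypothetical, and the <|tool_call_start|> / <|tool_call_end|> markers are borrowed from the example in bug 2 purely for illustration; other models use different markers.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>. An unterminated
# <think> is treated as reasoning that runs to the end of the output.
THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|\Z)", re.DOTALL)

# Assumption: tool calls use the marker tokens quoted in bug 2 of this article.
TOOL_CALL = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def extract_tool_calls(raw_output: str) -> list[str]:
    """Return tool-call payloads that appear outside <think> blocks."""
    # Remove reasoning first so prose about tool-call syntax is never scanned.
    visible = THINK_BLOCK.sub("", raw_output)
    return [payload.strip() for payload in TOOL_CALL.findall(visible)]
```

With this ordering, a model can discuss tool-call syntax in its reasoning without that discussion being parsed as a call.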
2. Registering a second MCP server breaks tool call parsing for the first
This bug is clean and deterministic. Testing with lfm2-24b-a2b at temperature=0.0 shows:
- Only KG server active: Model correctly calls search_nodes, parser recognizes <|tool_call_start|> tokens, tool executes, results returned. Works perfectly.
- Add webfetch server (don't even call it): Model emits <|tool_call_start|>[web_search(...)]<|tool_call_end|> as raw text in the chat. The special tokens are no longer recognized. The tool is never executed.
The mere registration of a second MCP server — without calling it — changes how the parser handles the first server's tool calls. Same model, same prompt, same target server. Single variable changed.
Workaround: Only register the MCP server you need for each task. This is impractical for agentic workflows.
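When running with several servers registered is unavoidable, a client can at least detect the failure instead of silently treating the leaked markers as an answer. The sketch below assumes an OpenAI-style message dict with "content" and "tool_calls" keys and uses the marker tokens quoted above; the function name is hypothetical.

```python
# Assumption: these are the markers the model emits, as quoted in this article.
LEAKED_MARKERS = ("<|tool_call_start|>", "<|tool_call_end|>")


def has_leaked_tool_call(message: dict) -> bool:
    """True when tool-call markers show up as plain text in the content
    and the server did not report any parsed tool calls."""
    content = message.get("content") or ""
    parsed_calls = message.get("tool_calls") or []
    leaked = any(marker in content for marker in LEAKED_MARKERS)
    return leaked and not parsed_calls
```

A caller could retry, or fall back to a single-server configuration, whenever this returns True.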
3. Server-side reasoning_content/content split produces empty responses that report success
This affects everyone using reasoning models via the API, whether using tool calling or not. When sending a simple prompt to Qwen3.5-35b-a3b via /v1/chat/completions asking it to list XML tags used for reasoning, the server returned:
{
"content": "",
"reasoning_content": "[3099 tokens of detailed deliberation]",
"finish_reason": "stop"
}
The model did extensive work — 3099 tokens of reasoning — but got caught in a deliberation loop inside <think> and never produced output in the content field. The server returned finish_reason: "stop" with empty content, reporting success.
This means:
- Every eval harness checking finish_reason == "stop" silently accepts empty responses
- Every agentic framework propagates empty strings downstream
- Every user sees a blank response and concludes the model is broken
- The actual reasoning is trapped in reasoning_content: the model did real work that nobody sees unless they explicitly check that field
This is server-side, not a UI bug, confirmed by inspecting the raw API response and LM Studio server log. The reasoning_content/content split happens before the response reaches any client.
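Since finish_reason alone cannot be trusted here, a client-side check along the following lines can separate a real answer from a reasoning-only response. It is a sketch that assumes the OpenAI-style layout, where finish_reason sits on the choice and content, reasoning_content and tool_calls sit on its message; the function name and the returned labels are hypothetical.

```python
def classify_choice(choice: dict) -> str:
    """Classify one chat-completion choice as 'ok', 'reasoning_only' or 'empty'."""
    message = choice.get("message") or {}
    content = (message.get("content") or "").strip()
    reasoning = (message.get("reasoning_content") or "").strip()

    # A parsed tool call is a valid result even when content is empty.
    if message.get("tool_calls"):
        return "ok"
    if content:
        return "ok"
    # The case described above: the model deliberated but never answered.
    if reasoning:
        return "reasoning_only"
    return "empty"
```

An eval harness or agent loop could treat anything other than "ok" as a failure, and log reasoning_content for the "reasoning_only" case so the model's work is not lost.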
Bug interaction
These aren't independent issues. They interact to create systemic problems with tool calling and reasoning in LM Studio.
📖 Read the full source: r/LocalLLaMA