Qwen3.5-4B-Safety-Thinking: 4B Parameter Safety Model

Merlin Research has released Qwen3.5-4B-Safety-Thinking, a 4 billion parameter safety-aligned reasoning model built on Qwen3.5. This model is specifically designed for structured 'thinking' and safety applications in real-world scenarios, with particular focus on agent systems.

Key improvements and features

Improved ability to accurately follow strict instructions in prompts
Based on the use of Bloom and Petri methods from Anthropic
Resistant to hacking attempts
Increased resistance to 'abnormal' and adversarial prompts
Up to 1 million token context window
Uses frameworks from Anthropic - Bloom and Petri

The model is available on Hugging Face at MerlinSafety/Qwen3.5-4B-Safety-Thinking.

For developers working with AI agents, this model represents a specialized tool for safety-critical applications where structured reasoning and resistance to prompt manipulation are priorities. The integration of Anthropic's Bloom and Petri methods suggests a focus on constitutional AI approaches to alignment.

📖 Read the full source: r/LocalLLaMA

Merlin Research releases Qwen3.5-4B-Safety-Thinking model for structured reasoning

Key improvements and features

👀 See Also

Andon Labs' AI Agent Mona Runs a Real Cafe in Stockholm — Full Breakdown

Micron's $200B Investment Aimed at AI Memory Constraints

Pope Leo XIV's 'Magnifica Humanitas': A 40,000-Word Encyclical on AI Disarmament

AI's Brokenomics: Anthropic's Mythos/Fable Export Ban Chaos