Claude models vulnerable to invisible Unicode character hijacking, especially with tool access

Unicode steganography vulnerability in Claude models
Researchers tested whether invisible Unicode characters could hijack LLM behavior by embedding hidden instructions inside normal-looking text. The study evaluated 8,308 graded outputs across GPT-5.2, GPT-4o-mini, and three Claude models: Opus 4, Sonnet 4, and Haiku 4.5.
Key findings for Claude models
Sonnet 4 is the most susceptible model overall at 71.2% compliance with tools enabled. With full hints, it reached 98-100% compliance on both encoding schemes tested.
Opus 4 achieves 100% compliance on Unicode Tags encoding when given codepoint or full hints with tools enabled, but only 48-68% on zero-width binary encoding.
Haiku 4.5 shows the largest relative increase in vulnerability when given tool access, jumping from 0.8% to 49.2% compliance (odds ratio 115).
Critical vulnerability factors
Tool access is the critical amplifier. Without tools, all Claude models stay below 17% compliance. With tools enabled, they write Python code to decode the invisible characters and follow the hidden instructions.
Encoding preference patterns: Anthropic models strongly prefer Unicode Tags encoding over zero-width binary, while OpenAI models show the opposite pattern.
Injection framing effects: Adding "Ignore all previous instructions" actually reduces compliance for Opus (from 100% to lower levels) but paradoxically increases it for Sonnet (from 43.7% to 59.6%).
Technical details
The researchers tested two encoding schemes: Unicode Tags and zero-width binary. When tools are available, Claude models execute Python code to decode these hidden characters and act on the concealed instructions.
This type of attack represents a form of steganography where malicious instructions are hidden within seemingly benign text using invisible Unicode characters that are not visible to human readers but can be detected and processed by the models.
📖 Read the full source: r/ClaudeAI
👀 See Also

Proxy-layer isolation for local agent API key security
A developer shares an approach to API key isolation in local agent setups using a Rust proxy that swaps placeholder tokens for real credentials, preventing exposure in agent memory, logs, context windows, and tool environments.

Coldkey: Post-Quantum Age Key Generation and Paper Backup Tool
Coldkey generates post-quantum age keys (ML-KEM-768 + X25519) and produces single-page printable HTML backups with QR codes for offline storage.

Monitoring OpenClaw Commands with Python and Gemini Flash for Security
A user created a Python script that trails commands injected by OpenClaw, analyzes them with Gemini Flash, and sends notifications via Discord webhook for alarming or irregular activity, costing about $0.14 daily.

Roblox cheat and AI tool caused Vercel platform outage
A Roblox cheat combined with an AI tool reportedly caused a complete platform outage for Vercel, generating significant discussion on Hacker News with 66 points and 24 comments.