Domain-Camouflaged Injection Attacks Evade Detectors in Multi-Agent LLM Systems

✍️ OpenClawRadar · 📅 Published: May 23, 2026 · 🔗 Source

A new paper from Aaditya Pai identifies a critical blind spot in LLM injection detectors: domain-camouflaged injection attacks, whose payloads are generated to mimic the vocabulary and authority structures of the target document, systematically evade detection. Standard detectors flag static payloads at high rates but detect far fewer camouflaged ones.

Key Findings

Detection rate on Llama 3.1 8B: dropped from 93.8% (static) to 9.7% (camouflaged).
Detection rate on Gemini 2.0 Flash: dropped from 100% to 55.6%.
Llama Guard 3, a production safety classifier, detected zero camouflaged payloads (IDR = 0.000).
The Camouflage Detection Gap (CDG) is statistically significant across 45 tasks and three domains (Llama: χ² = 38.03, p < 0.001; Gemini: χ² = 17.05, p < 0.001).
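As a rough sanity check on the figures above, the sketch below computes the gap between static and camouflaged detection rates in percentage points, plus a Pearson chi-square statistic for a 2x2 table. It assumes the CDG is a simple difference of detection rates and that the test compares detected versus missed counts across the two payload types; this summary states neither, and the function names and example counts are hypothetical, not taken from the paper.

```python
def detection_gap(static_rate: float, camouflaged_rate: float) -> float:
    """Gap in percentage points between two detection rates (assumed CDG definition)."""
    return round(static_rate - camouflaged_rate, 1)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: payload type (static, camouflaged). Columns: detected, missed.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; statistic is undefined")
    return n * (a * d - b * c) ** 2 / denom


# Rates reported in the article (percent).
llama_gap = detection_gap(93.8, 9.7)
gemini_gap = detection_gap(100.0, 55.6)
print(f"Llama 3.1 8B gap: {llama_gap} points")
print(f"Gemini 2.0 Flash gap: {gemini_gap} points")

# HYPOTHETICAL counts, for illustration only (not from the paper):
# 30 of 40 static payloads detected, 10 of 40 camouflaged payloads detected.
example_chi2 = chi_square_2x2(30, 10, 10, 30)
print(f"Example chi-square: {example_chi2:.2f}")
```

If the reported tests have one degree of freedom, the critical value at p = 0.001 is about 10.83, so the reported statistics of 38.03 and 17.05 would both clear it under that assumption.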

Multi-Agent Debate Amplifies Attacks

Multi-agent debate architectures amplify static injection attacks by up to 9.9x on smaller models, while stronger models show collective resistance. Targeted detector augmentation only partially closes the gap: a 10.2% improvement on Llama versus 78.7% on Gemini, which suggests the vulnerability is architectural for weaker models.
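One plausible reading of an amplification figure such as 9.9x is the ratio of attack success in the debate setting to attack success with a single agent. The sketch below illustrates that ratio only; the definition, the function name, and both input rates are assumptions chosen for illustration, not values or methods confirmed by the paper.

```python
def amplification_factor(single_agent_rate: float, debate_rate: float) -> float:
    """Ratio of attack success under debate to attack success with one agent."""
    if single_agent_rate <= 0:
        raise ValueError("single-agent rate must be positive to form a ratio")
    return debate_rate / single_agent_rate


# HYPOTHETICAL success rates, picked only to show how a 9.9x ratio could arise.
hypothetical_single = 0.05
hypothetical_debate = 0.495
factor = round(amplification_factor(hypothetical_single, hypothetical_debate), 1)
print(f"Amplification: {factor}x")
```

A ratio like this grows quickly when the single-agent baseline is small, so the same multiplier can correspond to very different absolute success rates.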

Framework Released

The framework, task bank, and payload generator are publicly released. The blind spot extends beyond few-shot detectors to dedicated safety classifiers, suggesting fundamental weaknesses in current detection approaches.

📖 Read the full source: HN LLM Tools
