Domain-Camouflaged Injection Attacks Evade Detectors in Multi-Agent LLM Systems

A new paper from Aaditya Pai identifies a critical blind spot in LLM injection detectors: domain-camouflaged injection attacks, in which payloads are generated to mimic the vocabulary and authority structures of the target document, systematically evade detection. Standard detectors flag static payloads at high rates but detect far fewer camouflaged ones.
Key Findings
- Detection rate on Llama 3.1 8B: dropped from 93.8% (static) to 9.7% (camouflaged).
- Detection rate on Gemini 2.0 Flash: dropped from 100% to 55.6%.
- Llama Guard 3, a production safety classifier, detected zero camouflaged payloads (IDR = 0.000).
- The Camouflage Detection Gap (CDG) is statistically significant across 45 tasks and three domains (Llama: χ² = 38.03, p < 0.001; Gemini: χ² = 17.05, p < 0.001).
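
The gap between the two detection rates, and the kind of significance test reported, can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the gap is simply the static detection rate minus the camouflaged detection rate, and it uses a plain Pearson chi-square on a 2x2 table. The function names and the counts passed to `chi_square_2x2` are hypothetical; only the percentages come from the summary above.

```python
def detection_gap(static_rate: float, camouflaged_rate: float) -> float:
    """Difference between static and camouflaged detection rates (percentage points).

    Assumption: the gap is a simple difference; the paper may define it differently.
    """
    return static_rate - camouflaged_rate


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table

                    detected  missed
        static         a        b
        camouflaged    c        d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Detection rates quoted in the findings above (percent).
llama_gap = detection_gap(93.8, 9.7)
gemini_gap = detection_gap(100.0, 55.6)
print(f"Llama 3.1 8B gap: {llama_gap:.1f} points")       # 84.1 points
print(f"Gemini 2.0 Flash gap: {gemini_gap:.1f} points")  # 44.4 points

# HYPOTHETICAL counts for illustration only; these are not the paper's data.
chi2 = chi_square_2x2(a=40, b=5, c=10, d=35)
print(f"chi-square (illustrative counts): {chi2:.2f}")   # 40.50

# With 1 degree of freedom, a statistic above about 10.83 corresponds to p < 0.001.
print("significant at p < 0.001:", chi2 > 10.83)
```

With one degree of freedom, both reported statistics (38.03 and 17.05) exceed the 10.83 threshold, which is consistent with the stated p < 0.001.
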
Multi-Agent Debate Amplifies Attacks
Multi-agent debate architectures amplify static injection attacks by up to 9.9x on smaller models, while stronger models show collective resistance. Targeted detector augmentation only partially remediates the gap (10.2% improvement on Llama, 78.7% on Gemini), which suggests the vulnerability is architectural for weaker models.
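
One way to read an amplification figure such as 9.9x is as a ratio of attack success rates, debate setting over single-agent setting. The sketch below assumes that definition, which the summary does not confirm; the function name and the example rates are hypothetical and are not taken from the paper.

```python
def amplification_factor(single_agent_rate: float, debate_rate: float) -> float:
    """Ratio of attack success in a multi-agent debate to a single-agent baseline.

    Assumption: amplification is a simple ratio of success rates.
    """
    if single_agent_rate <= 0:
        raise ValueError("baseline success rate must be positive")
    return debate_rate / single_agent_rate


# HYPOTHETICAL success rates for illustration only.
factor = amplification_factor(single_agent_rate=0.05, debate_rate=0.40)
print(f"amplification: {factor:.1f}x")  # 8.0x
```

A factor above 1 means the debate setting made the attack more effective than the baseline; a factor below 1 would correspond to the collective resistance described for stronger models.
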
Framework Released
The framework, task bank, and payload generator are publicly released. The blind spot extends beyond few-shot detectors to dedicated safety classifiers, suggesting fundamental weaknesses in current detection approaches.
📖 Read the full source: HN LLM Tools