Threat data from 91K AI agent interactions: Tool abuse up 6.4 points, new multimodal attacks

Threat landscape from production AI agent data
Real-world threat data from 91,284 AI agent interactions across 47 deployments shows 35,711 threats detected in February 2026. The detection model is a Gemma-based 5-head multilabel classifier.
Key threats for self-hosted deployments
- Tool/command abuse: Up 6.4 percentage points to 14.5% of threats. The dominant pattern is tool chain escalation, where a harmless read call is followed by a write or execute call. Most local setups give agents tool access without sufficient safeguards.
- Agent goal hijacking: Nearly doubled to 6.9% of threats (up 3.3 percentage points). These attacks target the planning phase of autonomous agent loops, which is particularly relevant for local setups that monitor agent state less closely.
- RAG poisoning: Rose to 12.0% of threats (up from 10.0%) and shifted toward metadata attacks. The new pattern targets document metadata (titles, authors, annotations) rather than content. Most people sanitize content but pass metadata through as-is.
- Multimodal injection: New category at 2.3% of threats, in which instructions are hidden in images and PDFs. Text-only safety scanning misses these attacks.
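The tool chain escalation pattern described above (a harmless read followed by a write or execute) can be guarded against with a per-session risk tier. The following is a minimal sketch, not the detection logic from the source: the tool names, the three tiers, and the `ToolChainGuard` class are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical risk tiers; a real deployment would define its own tool catalog.
READ, WRITE, EXECUTE = 0, 1, 2

# Hypothetical tool names mapped to tiers (assumption, not from the source).
TOOL_RISK = {
    "read_file": READ,
    "list_dir": READ,
    "write_file": WRITE,
    "run_shell": EXECUTE,
}


@dataclass
class ToolChainGuard:
    """Blocks calls that escalate beyond the tier approved for a session."""

    approved_tier: int = READ
    history: List[str] = field(default_factory=list)

    def check(self, tool_name: str) -> bool:
        """Return True if the call may proceed, False if it needs approval."""
        # Unknown tools are treated as the highest risk tier.
        tier = TOOL_RISK.get(tool_name, EXECUTE)
        self.history.append(tool_name)
        return tier <= self.approved_tier

    def approve(self, tier: int) -> None:
        """Raise the approved tier, e.g. after a human confirms the step."""
        self.approved_tier = max(self.approved_tier, tier)
```

With the default settings, `ToolChainGuard().check("read_file")` passes while a following `check("run_shell")` is refused until `approve(EXECUTE)` is called. Defaulting unknown tools to the highest tier is a deliberate fail-closed choice in this sketch.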
Threat breakdown: share of threats (MoM change in percentage points)
- Data Exfiltration: 18.0% (-1.2)
- Tool/Command Abuse: 14.5% (+6.4)
- RAG/Context Attack: 12.0% (+2.0)
- Jailbreak: 11.0% (-1.3)
- Prompt Injection: 8.1% (-0.7)
- Agent Goal Hijack: 6.9% (+3.3)
- Inter-Agent Attack: 5.0% (+1.6)
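The breakdown can be turned into prior-month shares and rough incident counts with a little arithmetic. This sketch assumes the parenthesized figures are percentage-point changes in share (consistent with RAG/Context Attack being reported as 12.0%, up from 10.0%) and that each share is a fraction of the 35,711 detected threats. The function names are illustrative.

```python
# Shares and month-over-month changes as listed in the breakdown.
# Assumption: the changes are percentage points, not relative percentages.
BREAKDOWN = {
    "Data Exfiltration": (18.0, -1.2),
    "Tool/Command Abuse": (14.5, 6.4),
    "RAG/Context Attack": (12.0, 2.0),
    "Jailbreak": (11.0, -1.3),
    "Prompt Injection": (8.1, -0.7),
    "Agent Goal Hijack": (6.9, 3.3),
    "Inter-Agent Attack": (5.0, 1.6),
}

TOTAL_THREATS = 35_711  # threats detected in the reporting month


def previous_share(current: float, delta: float) -> float:
    """Prior-month share in percent, rounded to one decimal place."""
    return round(current - delta, 1)


def estimated_count(share_pct: float, total: int = TOTAL_THREATS) -> int:
    """Approximate number of threats represented by a percentage share."""
    return round(total * share_pct / 100)


def summarize(breakdown=BREAKDOWN) -> dict:
    """Derive prior share and estimated count for every category."""
    return {
        name: {
            "previous_share": previous_share(share, delta),
            "estimated_count": estimated_count(share),
        }
        for name, (share, delta) in breakdown.items()
    }
```

Under these assumptions, Tool/Command Abuse works out to a prior share of 8.1% and roughly 5,178 detections, and Agent Goal Hijack to a prior share of 3.6%, which is why its rise to 6.9% reads as a near doubling. The counts are estimates only, since the listed shares are rounded.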
Detection approach
The detection pipeline has two layers. L1 is pattern matching with 218 rules (sub-millisecond latency, runs entirely locally); L2 is the Gemma-based classifier. The full community edition is open source at github.com/raxe-ai/raxe-ce.
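A two-layer design of this kind can be sketched as a cheap rule pass followed by an optional model call. This is not the raxe-ce API: the two regex rules, the `scan` function, and the choice to consult L2 only when L1 finds nothing are all assumptions made for illustration, and the L2 classifier is represented by any callable the caller supplies.

```python
import re
from typing import Callable, List, Optional

# Hypothetical L1 rules; these are NOT taken from the project's 218 rules.
L1_RULES = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    "data_exfiltration": re.compile(
        r"(send|upload|post) .* to https?://", re.IGNORECASE
    ),
}


def l1_scan(text: str) -> List[str]:
    """Return the labels of every L1 rule whose pattern matches the text."""
    return [label for label, pattern in L1_RULES.items() if pattern.search(text)]


def scan(
    text: str,
    l2_classifier: Optional[Callable[[str], List[str]]] = None,
) -> dict:
    """Run L1 first; fall back to the L2 callable only if L1 finds nothing."""
    hits = l1_scan(text)
    if hits:
        return {"layer": "L1", "labels": hits}
    if l2_classifier is not None:
        labels = l2_classifier(text)
        if labels:
            return {"layer": "L2", "labels": labels}
    return {"layer": None, "labels": []}
```

Short-circuiting on an L1 hit keeps the common case fast, at the cost of skipping any extra labels the model might have added. A pipeline that needs full multilabel output could run both layers and merge the results instead.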
📖 Read the full source: r/LocalLLaMA