Threat data from 91K AI agent interactions: Tool abuse up 6.4 points, new multimodal attacks

By OpenClawRadar. Published: February 24, 2026

Threat landscape from production AI agent data

Real-world threat data from 91,284 AI agent interactions across 47 deployments shows 35,711 threats detected in February 2026. The detection model uses a Gemma-based 5-head multilabel classifier.

Key threats for self-hosted deployments

  • Tool/command abuse: Up 6.4 percentage points to 14.5% of threats. The dominant pattern is tool chain escalation, where a harmless read call is followed by a write or execute call. Most local setups give agents tool access without sufficient safeguards.
  • Agent goal hijacking: Nearly doubled to 6.9% of threats (up 3.3 points). These attacks target the planning phase in autonomous agent loops, which is particularly relevant for local setups with less monitoring of agent state.
  • RAG poisoning: Rose to 12.0% of threats (up from 10.0%) and shifted toward metadata attacks. The new pattern targets document metadata (titles, authors, annotations) rather than content. Most people sanitize content but pass metadata through as-is.
  • Multimodal injection: New category at 2.3% of threats, in which instructions are hidden in images and PDFs. Text-only safety scanning misses these attacks.
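
The tool chain escalation pattern described above (a read followed by a write or execute) can be guarded against with a small per-session check. The sketch below is only an illustration: the tool names, the risk tiers, and the `ToolChainGuard` class are hypothetical and do not come from the source or from any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical risk tiers; a real deployment would define its own tool names.
READ_TOOLS = {"read_file", "list_dir", "http_get"}
ESCALATING_TOOLS = {"write_file", "run_shell", "delete_file"}


@dataclass
class ToolChainGuard:
    """Flags a write/execute call made after the session has performed a read."""
    seen_read: bool = False
    history: List[str] = field(default_factory=list)

    def check(self, tool_name: str) -> str:
        """Return 'allow' or 'needs_approval' for the next tool call."""
        self.history.append(tool_name)
        if tool_name in READ_TOOLS:
            # Reads are allowed, but they mark the session as having
            # ingested content that may carry injected instructions.
            self.seen_read = True
            return "allow"
        if tool_name in ESCALATING_TOOLS and self.seen_read:
            # Read followed by write/execute: the escalation pattern.
            return "needs_approval"
        return "allow"


guard = ToolChainGuard()
for call in ["read_file", "run_shell"]:
    print(call, "->", guard.check(call))
```

One guard instance per agent session keeps the state isolated; a stricter variant could require approval for every escalating tool regardless of history.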

Threat breakdown (share of detected threats, with month-over-month change in percentage points)

  • Data Exfiltration: 18.0% (-1.2)
  • Tool/Command Abuse: 14.5% (+6.4)
  • RAG/Context Attack: 12.0% (+2.0)
  • Jailbreak: 11.0% (-1.3)
  • Prompt Injection: 8.1% (-0.7)
  • Agent Goal Hijack: 6.9% (+3.3)
  • Inter-Agent Attack: 5.0% (+1.6)
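
As a quick arithmetic check, the previous month's share for each category can be recovered by subtracting the change from the current share. This assumes the figures in parentheses are percentage-point changes, which matches the RAG figure reported as up from 10.0% to 12.0%; the function name is illustrative.

```python
# (current share in %, month-over-month change), copied from the list above.
breakdown = {
    "Data Exfiltration": (18.0, -1.2),
    "Tool/Command Abuse": (14.5, 6.4),
    "RAG/Context Attack": (12.0, 2.0),
    "Jailbreak": (11.0, -1.3),
    "Prompt Injection": (8.1, -0.7),
    "Agent Goal Hijack": (6.9, 3.3),
    "Inter-Agent Attack": (5.0, 1.6),
}


def previous_share(current: float, change: float) -> float:
    """Previous month's share, assuming `change` is in percentage points."""
    return round(current - change, 1)


for name, (current, change) in breakdown.items():
    prev = previous_share(current, change)
    print(f"{name}: {prev:.1f}% -> {current:.1f}%")
```

Under that assumption, Agent Goal Hijack moved from 3.6% to 6.9%, which is close to a doubling.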

Detection approach

The detection pipeline uses two layers: L1 is pattern matching with 218 rules (sub-millisecond latency, runs entirely locally), and L2 is the Gemma-based multilabel classifier. The full community edition is open source at github.com/raxe-ai/raxe-ce.
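
A two-layer design of this kind can be sketched as a cheap rule pass followed by a model call. The code below is not the raxe-ce API: the two regex rules, the function names, and the stub classifier are hypothetical, and it assumes L2 runs only when L1 finds nothing, which the source does not state.

```python
import re
from typing import Callable, List, Optional

# Two illustrative rules only; the 218 rules mentioned above are not reproduced.
L1_RULES = {
    "prompt_injection": re.compile(r"ignore (all )?previous instructions", re.I),
    "data_exfiltration": re.compile(r"(send|upload) .* to https?://", re.I),
}


def l1_scan(text: str) -> Optional[str]:
    """Layer 1: return the label of the first matching rule, or None."""
    for label, pattern in L1_RULES.items():
        if pattern.search(text):
            return label
    return None


def detect(text: str, l2_classifier: Callable[[str], List[str]]) -> List[str]:
    """Run the rule layer first; fall back to the classifier on no match."""
    hit = l1_scan(text)
    if hit is not None:
        return [hit]
    # Layer 2: any callable that maps text to a list of threat labels.
    return l2_classifier(text)


def stub_classifier(text: str) -> List[str]:
    """Stand-in for a real multilabel model; flags one keyword only."""
    return ["jailbreak"] if "DAN" in text else []


print(detect("Please ignore previous instructions", stub_classifier))
```

Running the rules first keeps the common case fast; running both layers and merging their labels is an equally plausible design.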

📖 Read the full source: r/LocalLLaMA
