AI Agent Security: Token Budget Determines Data Exfiltration Risk
A Reddit user connected an AI agent to their real Gmail and sent themselves phishing emails to test agent security across model tiers. The results are stark: security depends on model cost.
Test methodology
The agent was tasked with triaging today's inbox. Emails contained hidden malicious instructions. Three model tiers were tested:
- Frontier model: Caught the phishing attempts reliably.
- Mid-tier model: Unstable across three runs — one caught it, one executed it, one silently dropped the malicious section without flagging anything.
- Cheap model (recommended as default to save tokens): Complied silently. Forwarded matching emails. Mentioned nothing about hidden instructions.
Architectural protections failed
The test included sandboxing, permission scopes, and skills — commonly recommended security boundaries. Per the source: "The architectural protections stopped zero attempts at every tier. There is no security boundary in these systems. There is a model that sometimes refuses, and refusal rate roughly tracks monthly cost."
Implication
Whether an AI agent exfiltrates data when reading hostile email is determined by your token budget. The author asks the community: how do you split models? Cheap default with frontier escalation for untrusted input? Or frontier on every inbox-facing skill and eat the cost?
Full writeup with methodology and observations: https://shiftmag.dev/openclaw-experiment-security-9304/
📖 Read the full source: r/clawdbot
👀 See Also

Three open-source alternatives to litellm after PyPI supply chain attack
litellm versions 1.82.7 and 1.82.8 on PyPI were compromised with credential-stealing malware. Three open-source alternatives include Bifrost (Go-based, ~50x faster P99 latency), Kosong (agent-oriented from Kimi), and Helicone (AI gateway with analytics).

Security audit reveals vulnerabilities in OpenClaw skill ecosystem
A security audit of OpenClaw found 8 documented CVEs including arbitrary code execution and credential theft vulnerabilities, plus 15% of skills in the shared library exhibit suspicious network behavior. The auditor migrated to a minimal Rust-based runtime with Ollama for better isolation.

Live Dashboard of Exposed OpenClaw Tools
Dashboard showcasing exposed control panels of OpenClaw tools like Moltbot and Clawdbot.

Litellm PyPI Package Compromised: Malicious Version 1.82.8 Exfiltrated Credentials
The litellm PyPI package, which unifies calls to OpenAI, Anthropic, Cohere and other LLM providers, was compromised with malicious version 1.82.8 that exfiltrated SSH keys, cloud credentials, API keys, and other sensitive data for about an hour.