Open-source playground for red-teaming AI agents with published exploits

What this is
Fabraix Playground is an open-source environment for red-teaming AI agents through adversarial challenges. It started as an internal tool for testing guardrails but was open-sourced to get diverse perspectives on vulnerabilities.
How it works
Each challenge deploys a live AI agent with:
- A specific persona
- A set of real tools (web search, browsing, and more)
- Something it's been instructed to protect
- Fully visible system prompts
The objective is to find ways past the guardrails. When someone succeeds, the winning technique gets published — including approach, reasoning, and full conversation transcripts.
Project structure
/src— React frontend (TypeScript, Vite, Tailwind)/challenges— every challenge config and system prompt, versioned and open- Guardrail evaluation runs server-side to prevent client-side tampering
- The agent runtime is being open-sourced separately
Local development
To run locally:
npm install
npm run devThis connects to the live API by default. To develop against a local backend:
VITE_API_URL=http://localhost:8000/v1 npm run devChallenge examples
The first challenge was to get an agent to call a tool it's been told to never call. Someone succeeded in around 60 seconds without directly asking for the secret. The next challenge focuses on data exfiltration with harder defenses.
The community drives what gets tested: anyone can propose a challenge (scenario, agent, objective), the community votes, and the top-voted challenge goes live with a ticking clock. The fastest successful jailbreak wins.
Technical details
The project is built with TypeScript (76.5%), CSS (22.2%), and other languages (1.3%). It uses MIT license and has a Discord community for discussing techniques and sharing approaches.
📖 Read the full source: HN AI Agents
👀 See Also

Security Warning: ClawProxy Script Stole API Keys, Resulting in Significant OpenRouter Bill
A developer installed a closed-source ClawProxy script from a Reddit user on a sandboxed WSL Ubuntu 24.04 system, which stole their OpenRouter API key and used it via Google Vertex API to run up a large bill on Opus 4.6 overnight.

Claude Code source map leak reveals minified JavaScript was already public on npm
A source map file accidentally included in version 2.1.88 of the @anthropic-ai/claude-code npm package revealed internal developer comments, but the actual 13MB cli.js file containing 148,000+ plaintext strings has been publicly accessible on npm since launch.

LiteLLM v1.82.8 Compromise Uses .pth File for Persistent Execution
LiteLLM v1.82.8 was compromised on PyPI and includes a .pth file that executes arbitrary code on every Python process startup, not just when the library is imported. The payload runs even if LiteLLM is installed as a transitive dependency and never used directly.

Preventing AI Agents from Botnet Participation: Security Considerations
Community discusses how to protect autonomous AI agents from being hijacked or used in malicious botnets.