Open-source playground for red-teaming AI agents with published exploits

✍️ OpenClawRadar📅 Published: March 16, 2026🔗 Source
Open-source playground for red-teaming AI agents with published exploits
Ad

What this is

Fabraix Playground is an open-source environment for red-teaming AI agents through adversarial challenges. It started as an internal tool for testing guardrails but was open-sourced to get diverse perspectives on vulnerabilities.

How it works

Each challenge deploys a live AI agent with:

  • A specific persona
  • A set of real tools (web search, browsing, and more)
  • Something it's been instructed to protect
  • Fully visible system prompts

The objective is to find ways past the guardrails. When someone succeeds, the winning technique gets published — including approach, reasoning, and full conversation transcripts.

Project structure

  • /src — React frontend (TypeScript, Vite, Tailwind)
  • /challenges — every challenge config and system prompt, versioned and open
  • Guardrail evaluation runs server-side to prevent client-side tampering
  • The agent runtime is being open-sourced separately
Ad

Local development

To run locally:

npm install
npm run dev

This connects to the live API by default. To develop against a local backend:

VITE_API_URL=http://localhost:8000/v1 npm run dev

Challenge examples

The first challenge was to get an agent to call a tool it's been told to never call. Someone succeeded in around 60 seconds without directly asking for the secret. The next challenge focuses on data exfiltration with harder defenses.

The community drives what gets tested: anyone can propose a challenge (scenario, agent, objective), the community votes, and the top-voted challenge goes live with a ticking clock. The fastest successful jailbreak wins.

Technical details

The project is built with TypeScript (76.5%), CSS (22.2%), and other languages (1.3%). It uses MIT license and has a Discord community for discussing techniques and sharing approaches.

📖 Read the full source: HN AI Agents

Ad

👀 See Also