Open-source playground for red-teaming AI agents with published exploits

✍️ OpenClawRadar📅 Published: March 16, 2026🔗 Source

What this is

Fabraix Playground is an open-source environment for red-teaming AI agents through adversarial challenges. It started as an internal tool for testing guardrails but was open-sourced to get diverse perspectives on vulnerabilities.

How it works

Each challenge deploys a live AI agent with:

A specific persona
A set of real tools (web search, browsing, and more)
Something it's been instructed to protect
Fully visible system prompts

The objective is to find ways past the guardrails. When someone succeeds, the winning technique gets published — including approach, reasoning, and full conversation transcripts.

Project structure

/src — React frontend (TypeScript, Vite, Tailwind)
/challenges — every challenge config and system prompt, versioned and open
Guardrail evaluation runs server-side to prevent client-side tampering
The agent runtime is being open-sourced separately

Local development

To run locally:

npm install
npm run dev

This connects to the live API by default. To develop against a local backend:

VITE_API_URL=http://localhost:8000/v1 npm run dev

Challenge examples

The first challenge was to get an agent to call a tool it's been told to never call. Someone succeeded in around 60 seconds without directly asking for the secret. The next challenge focuses on data exfiltration with harder defenses.

The community drives what gets tested: anyone can propose a challenge (scenario, agent, objective), the community votes, and the top-voted challenge goes live with a ticking clock. The fastest successful jailbreak wins.

Technical details

The project is built with TypeScript (76.5%), CSS (22.2%), and other languages (1.3%). It uses MIT license and has a Discord community for discussing techniques and sharing approaches.

📖 Read the full source: HN AI Agents

👀 See Also

Security

OpenClaw's 'Allow Always' Feature Security Flaws and Safer Alternatives

OpenClaw's 'allow always' approval feature has been the subject of two CVEs this month, allowing unauthorized command execution through wrapper command binding and shell line-continuation bypasses. The deeper issue is how the feature trains users to stop paying attention to security prompts.

Apr 2, 2026, 07:45 AM UTC

OpenClawRadar

Security

Coldkey: Post-Quantum Age Key Generation and Paper Backup Tool

Coldkey generates post-quantum age keys (ML-KEM-768 + X25519) and produces single-page printable HTML backups with QR codes for offline storage.

May 15, 2026, 10:17 AM UTC

OpenClawRadar

Security

Microsoft Hacked: Malware Planted in GitHub Repos Targets Claude and Gemini Users

Microsoft shut down 70+ GitHub repositories after hackers planted credential-stealing malware targeting AI coding agents like Claude Code and Gemini CLI.

Jun 9, 2026, 12:15 AM UTC

OpenClawRadar

Security

Security scan reveals high severity finding in AI agent find-skills tool

A developer running a security scan on their AI agent setup discovered a high severity vulnerability in the find-skills tool they used to install additional skills, raising concerns about ecosystem safety.

Mar 11, 2026, 11:45 PM UTC

OpenClawRadar