AISI Evaluation Shows Claude Mythos Preview's Cyber Capabilities in CTF and Multi-Step Attacks

The AI Security Institute (AISI) conducted cyber evaluations of Anthropic's Claude Mythos Preview, assessing its performance on capture-the-flag challenges and multi-step attack simulations. The model showed significant improvement over previous frontier models in cybersecurity capabilities.
Capture-the-Flag Results
In CTF challenges, where models must identify and exploit weaknesses to retrieve hidden flags, Mythos Preview achieved a 73% success rate on expert-level tasks. No model could complete these expert-level tasks before April 2025. The evaluation compared performance across difficulty levels ranging from technical non-expert to expert, with models tested using token budgets of up to 50M tokens.
Cyber Range Results
AISI built "The Last Ones" (TLO), a 32-step corporate network attack simulation that spans initial reconnaissance through full network takeover and is estimated to take a human 20 hours to complete. Claude Mythos Preview was the first model to solve TLO from start to finish, succeeding in 3 out of 10 attempts. Across all attempts, the model completed an average of 22 out of 32 steps.
Claude Opus 4.6 was the next-best-performing model, completing an average of 16 steps. The evaluation used token budgets of up to 100M tokens, and performance continued to scale up to this limit.
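To put the TLO figures on a common scale, the step counts and attempt counts above can be converted into fractions. The following is a minimal sketch of that arithmetic only; the function and variable names are hypothetical and are not part of AISI's evaluation tooling.

```python
# Illustrative arithmetic on the figures reported above.
# All names here are hypothetical; AISI's own tooling is not shown in the source.

TOTAL_STEPS = 32  # number of steps in "The Last Ones" (TLO)


def completion_fraction(avg_steps: float, total_steps: int = TOTAL_STEPS) -> float:
    """Return the average share of range steps completed."""
    return avg_steps / total_steps


def success_rate(successes: int, attempts: int) -> float:
    """Return the share of attempts that solved the range end to end."""
    return successes / attempts


mythos_steps = completion_fraction(22)   # 22 of 32 steps on average
opus_steps = completion_fraction(16)     # 16 of 32 steps on average
mythos_full_solves = success_rate(3, 10)  # 3 full solves in 10 attempts

print(f"Mythos Preview average step completion: {mythos_steps:.1%}")
print(f"Opus 4.6 average step completion: {opus_steps:.1%}")
print(f"Mythos Preview full-solve rate: {mythos_full_solves:.0%}")
```

Read this way, Mythos Preview completed a little over two thirds of the range on average, compared with half for Claude Opus 4.6, and solved the range end to end in roughly a third of its attempts.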
Limitations and Context
The model could not complete 'Cooling Tower', a cyber range focused on operational technology (OT), though it got stuck on the IT sections rather than the OT-specific parts. AISI notes that two years ago, the best available models could barely complete beginner-level cyber tasks. Now, in controlled evaluations where Mythos Preview was explicitly directed and given network access, it could execute multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously.
📖 Read the full source: HN AI Agents