KnightClaw: Local Security Extension for OpenClaw Agents

✍️ OpenClawRadar | 📅 Published: February 23, 2026

KnightClaw is a security extension designed to protect OpenClaw AI coding agents from adversarial prompts. The tool addresses a specific threat model where a single malicious message in the context window can cause an agent to follow attacker instructions instead of user commands.

Core Features

KnightClaw operates as a drop-in extension with no configuration required, no API keys, and no cloud dependency. It intercepts every message before it reaches the agent.

Detection System

The guard is described as using an 8-layer hybrid detection approach. The source names seven of those layers:

Regex patterns
Homoglyph detection
Boundary token analysis
Perplexity scoring
Entropy analysis
Heuristics
Semantic embeddings (using a local, quantized BGE model)

According to the source, blocks occur in microseconds.
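
The article does not show how any layer is implemented. As a rough illustration only, the Python sketch below chains three of the named layers (regex patterns, homoglyph detection, entropy analysis) into a single check that runs before a message reaches the agent. The patterns, the 5.0-bit entropy threshold, and every function name here are assumptions made for this example, not KnightClaw's actual rules or API.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical patterns; KnightClaw's real rule set is not given in the article.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+in\s+developer\s+mode", re.IGNORECASE),
]


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def has_mixed_script_word(text: str) -> bool:
    """Flag any word that mixes Latin letters with Cyrillic or Greek letters."""
    for word in re.findall(r"\w+", text):
        scripts = set()
        for ch in word:
            if ch.isalpha():
                name = unicodedata.name(ch, "")
                for script in ("LATIN", "CYRILLIC", "GREEK"):
                    if name.startswith(script):
                        scripts.add(script)
        if "LATIN" in scripts and len(scripts) > 1:
            return True
    return False


def check_message(message: str, entropy_threshold: float = 5.0):
    """Return (allowed, reason). Cheap checks run first (assumed ordering)."""
    if any(p.search(message) for p in INJECTION_PATTERNS):
        return (False, "regex")
    if has_mixed_script_word(message):
        return (False, "homoglyph")
    # Ordinary English prose sits well below 5 bits per character;
    # long encoded blobs (for example base64) tend to sit above it.
    if shannon_entropy(message) > entropy_threshold:
        return (False, "entropy")
    return (True, "ok")
```

In this sketch, a benign request such as "Please refactor the parser module" passes all three checks, while a message containing "ignore all previous instructions" is stopped at the regex layer. Running the cheapest checks first is one plausible way to keep most blocks fast, though the article does not say how KnightClaw orders its layers.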

Additional Security Measures

Egress redaction: Strips secrets from outbound responses before they leave the agent
Hash-chained audit logs: Tamper-evident, append-only logs with a full timeline of every block, allow, and config change
Velocity circuit breaker: 10 blocks within 60 seconds trigger an automatic lockdown, with no manual intervention required
Kill switch: A single command, openclaw knight lockdown on, stops everything
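
Two of these measures lend themselves to a short illustration. The Python sketch below shows one common way to build a hash-chained log (each entry's SHA-256 digest covers the previous entry's digest, so a later edit breaks verification) and a sliding-window breaker using the 10-blocks-in-60-seconds figure from the list above. The class names, field names, and the choice of SHA-256 are assumptions for this example; the article does not describe KnightClaw's actual log format or breaker logic.

```python
import hashlib
import json
from collections import deque

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, event: str, detail: str, timestamp: float) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {"event": event, "detail": detail, "ts": timestamp, "prev": prev_hash}
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every digest; any edited or reordered entry fails."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: entry[k] for k in ("event", "detail", "ts", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


class VelocityBreaker:
    """Locks down once max_blocks blocks fall inside the sliding window."""

    def __init__(self, max_blocks: int = 10, window_seconds: float = 60.0):
        self.max_blocks = max_blocks
        self.window_seconds = window_seconds
        self.block_times = deque()
        self.locked_down = False

    def record_block(self, now: float) -> bool:
        self.block_times.append(now)
        # Drop blocks that have aged out of the window.
        while self.block_times and now - self.block_times[0] > self.window_seconds:
            self.block_times.popleft()
        if len(self.block_times) >= self.max_blocks:
            self.locked_down = True  # stays locked until reset elsewhere
        return self.locked_down
```

Note that a hash chain of this kind makes tampering detectable rather than impossible: anyone who can rewrite the whole file can also recompute the chain, so the latest digest would need to be stored somewhere the attacker cannot reach.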

Technical Details

The extension runs entirely locally with zero telemetry and is MIT licensed. The source code is available for testing and contribution.

📖 Read the full source: r/openclaw
