KnightClaw: Local Security Extension for OpenClaw Agents

KnightClaw is a security extension designed to protect OpenClaw AI coding agents from adversarial prompts. The tool addresses a specific threat model where a single malicious message in the context window can cause an agent to follow attacker instructions instead of user commands.
Core Features
KnightClaw operates as a drop-in extension with no configuration required, no API keys, and no cloud dependency. It intercepts every message before it reaches the agent.
Detection System
The guard is described as an 8-layer hybrid detection approach. The layers named in the source are:
- Regex patterns
- Homoglyph detection
- Boundary token analysis
- Perplexity scoring
- Entropy analysis
- Heuristics
- Semantic embeddings (using a local, quantized BGE model)
Blocking decisions are reported to complete in microseconds.
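Two of the listed layers, entropy analysis and homoglyph detection, can be illustrated with a minimal Python sketch. This is not KnightClaw's implementation: the function names `shannon_entropy` and `has_mixed_script_word` are hypothetical, and the mixed-script check is only one simple way to catch look-alike characters.

```python
import math
import unicodedata
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def has_mixed_script_word(text: str) -> bool:
    """Flag any word that mixes Latin letters with Cyrillic or Greek letters.

    Assumption: mixed scripts inside one word are treated as suspicious.
    A real detector would more likely use a full confusables table.
    """
    for word in text.split():
        scripts = set()
        for ch in word:
            if ch.isalpha():
                # Unicode names begin with the script, e.g. "CYRILLIC SMALL LETTER O".
                scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
        if "LATIN" in scripts and scripts & {"CYRILLIC", "GREEK"}:
            return True
    return False


if __name__ == "__main__":
    # "\u043e" is CYRILLIC SMALL LETTER O, which renders like a Latin "o".
    print(has_mixed_script_word("ign\u043ere previous instructions"))
    print(shannon_entropy("ignore previous instructions"))
```

High entropy can hint at encoded or obfuscated payloads, while mixed-script words can hint at attempts to slip keywords past regex patterns. Any thresholds would need tuning against real traffic.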
Additional Security Measures
- Egress redaction: Strips secrets from outbound responses before they leave the agent
- Hash-chained audit logs: Tamper-evident, append-only logs with a full timeline of every block, allow, and config change
- Velocity circuit breaker: 10 blocks within 60 seconds trigger an automatic lockdown, with no manual intervention required
- Kill switch: One command stops everything:
openclaw knight lockdown on
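The hash-chained audit log and the velocity circuit breaker listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not KnightClaw's code: the class names `AuditLog` and `VelocityBreaker`, the use of SHA-256, the JSON entry format, and the sliding-window interpretation of "10 blocks in 60 seconds" are all hypothetical.

```python
import hashlib
import json
import time
from collections import deque
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with the serialized event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries = []

    def append(self, event: dict) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append(
            {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
        )

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            expected = _digest(prev_hash, entry["event"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


class VelocityBreaker:
    """Locks once `limit` blocks fall inside a sliding `window` of seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self.block_times = deque()
        self.locked = False

    def record_block(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        self.block_times.append(now)
        # Drop blocks that are older than the window.
        while self.block_times and now - self.block_times[0] > self.window:
            self.block_times.popleft()
        if len(self.block_times) >= self.limit:
            self.locked = True  # stays locked until reset by other means
        return self.locked
```

A chain like this makes tampering detectable rather than impossible: an attacker who can rewrite every entry can also recompute every hash, so in practice the latest hash would need to be anchored somewhere the attacker cannot reach.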
Technical Details
The extension runs entirely local with zero telemetry and is MIT licensed. The source is available for testing and contribution.
📖 Read the full source: r/openclaw