Bernstein: A Kubernetes-like orchestrator for AI coding agents with verification and model policies

Bernstein is an orchestrator for AI coding agents that the creator describes as "Kubernetes for coding agents." Unlike simpler tools that spawn agents in parallel worktrees, Bernstein addresses what the developer calls "the other 95%" of the problem.
Key Features
The system includes several critical components:
- Verification: A "janitor" component independently verifies agent outputs after every task. It runs tests, checks diffs, and lints output because "agents lie": they may claim tests pass when they don't, or say they committed files when they didn't.
- Model Policy Engine: Provides allow/deny lists per provider, data residency constraints, preferred routing, and cost ceilings. The creator compares this to "K8s network policies but for LLM providers."
- Deterministic Scheduling: Uses pure Python for scheduling instead of LLMs, creating deterministic control flow with zero LLM tokens spent on coordination. An epsilon-greedy bandit learns routing over time.
- Agent-Agnostic Design: Includes 13 adapters for Claude Code, Codex, Gemini CLI, Cursor, Qwen, Aider, Amp, Roo Code, Goose, Kilo, Kiro, OpenCode, and generic agents. Claude Code has the deepest integration.
- Scale Features: At 500K+ lines and ~5000 tests, Bernstein includes circuit breakers, cost anomaly detection, loop detection, deadlock detection, PII scanning, HMAC-chained audit logs, progressive permissions, and quarantine for suspicious output.
- Self-Development: Can develop itself using bernstein --evolve.
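The Model Policy Engine bullet above can be illustrated with a minimal Python sketch of how allow/deny lists, a residency constraint, a cost ceiling, and preferred routing might combine into one deterministic check. The Policy and Candidate classes, their field names, and the routing rule are hypothetical and are not taken from Bernstein's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """A provider/model pair a task could be routed to (hypothetical shape)."""
    provider: str
    model: str
    region: str
    est_cost_usd: float


@dataclass
class Policy:
    """Illustrative policy; every field name here is an assumption."""
    allow_providers: set = field(default_factory=set)  # empty set = allow all
    deny_models: set = field(default_factory=set)
    allowed_regions: set = field(default_factory=set)  # empty set = any region
    cost_ceiling_usd: float = float("inf")
    preferred: tuple = ()  # providers in order of preference

    def permits(self, c: Candidate) -> bool:
        """Return True only if the candidate passes every constraint."""
        if self.allow_providers and c.provider not in self.allow_providers:
            return False
        if c.model in self.deny_models:
            return False
        if self.allowed_regions and c.region not in self.allowed_regions:
            return False
        return c.est_cost_usd <= self.cost_ceiling_usd

    def route(self, candidates: list) -> Optional[Candidate]:
        """Pick the permitted candidate with the best (preference, cost) rank."""
        permitted = [c for c in candidates if self.permits(c)]
        if not permitted:
            return None

        def rank(c: Candidate):
            # Providers missing from `preferred` sort after all listed ones.
            if c.provider in self.preferred:
                pref = self.preferred.index(c.provider)
            else:
                pref = len(self.preferred)
            return (pref, c.est_cost_usd)

        return min(permitted, key=rank)
```

Because the check is a pure function of the policy and the candidate list, the same inputs always produce the same routing decision, which fits the deterministic control flow the post emphasizes.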
Technical Details
The creator notes that spawning agents in worktrees is "the hello world of this space" and that most multi-agent frameworks use an LLM to schedule other LLMs, which is "slow, expensive, and non-deterministic." Bernstein instead keeps scheduling in pure Python, so the control flow is deterministic and coordination spends no LLM tokens.
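The post mentions an epsilon-greedy bandit that learns routing over time. The following is a textbook sketch of that technique in plain Python, not Bernstein's implementation. The class name, the reward signal (for example 1.0 when verification passes), and the seeded random generator are assumptions made for illustration.

```python
import random


class EpsilonGreedyRouter:
    """Textbook epsilon-greedy bandit over a fixed set of arms (e.g. agents)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        # A seeded generator keeps runs reproducible (an assumption here).
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward

    def choose(self):
        """Explore with probability epsilon, otherwise exploit the best arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        """Fold one observed reward into the arm's incremental mean."""
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n


router = EpsilonGreedyRouter(["agent_a", "agent_b"], epsilon=0.1, seed=42)
arm = router.choose()
router.update(arm, reward=1.0)  # hypothetical: 1.0 means the task verified
```

No LLM call is involved in choosing an arm, so the coordination step itself costs no tokens.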
The project is reported at 500K+ lines of code and approximately 5000 tests. The developer says features like circuit breakers and anomaly detection exist because "things broke and these were the fixes."
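For readers unfamiliar with the pattern, a circuit breaker stops calling a failing dependency for a cool-down period instead of retrying it indefinitely. Below is a minimal generic sketch in Python. The thresholds, method names, and injectable clock are assumptions, and nothing here reflects how Bernstein actually implements its breakers.

```python
import time


class CircuitBreaker:
    """Generic circuit breaker: opens after repeated consecutive failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def allow(self):
        """Return True if a call may proceed right now."""
        if self.opened_at is None:
            return True
        # After the cool-down, permit trial calls (half-open state);
        # a failed trial re-opens the circuit via record_failure().
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        """A success closes the circuit and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        """Count a failure; open the circuit once the threshold is reached."""
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A caller would check allow() before dispatching work to an agent or provider, then report the outcome with record_success() or record_failure().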
The creator is a solo developer from Israel who mentions "building under rockets (literally)" and says the project has outgrown them, so they are seeking contributors.
📖 Read the full source: r/ClaudeAI