Be My Butler: Multi-Agent Pipeline for AI Code Verification

OpenClawRadar | Published: March 14, 2026

What Be My Butler Does

Be My Butler (BMB) is a multi-agent pipeline designed to solve a specific problem in AI-assisted coding: AI coding agents that incorrectly report their own code as working. The creator, a materials/mechanical engineer with no programming background, built it after seeing Claude Code agents write code that passed its tests but did not work in practice.

Core Concept

The system implements a peer review model for AI-generated code:

  • One model writes the code
  • A different model reviews it without knowing who wrote it (blind verification)
  • A cross-model council (Claude + GPT + Gemini) votes on whether it actually works
  • An analyst agent tracks patterns in what goes wrong
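
The review flow above can be sketched roughly as follows. This is a minimal illustration, not BMB's actual implementation: the names `Submission`, `blind_review`, `council_vote`, and `Analyst`, the strict-majority rule, and the toy reviewer functions are all assumptions made for the example. The source does not describe how votes are tallied or how the analyst stores its findings.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# All names here are hypothetical; the source does not describe BMB's internals.


@dataclass(frozen=True)
class Submission:
    author: str  # model that wrote the code (hidden from reviewers)
    code: str


# A reviewer sees only the code text and returns True to approve it.
Reviewer = Callable[[str], bool]


def blind_review(submission: Submission, reviewer: Reviewer) -> bool:
    """Hand the reviewer the code alone, never the author field."""
    return reviewer(submission.code)


def council_vote(
    submission: Submission, council: Dict[str, Reviewer]
) -> Tuple[bool, Dict[str, bool]]:
    """Collect one blind verdict per member; approve on a strict majority (assumed rule)."""
    verdicts = {name: blind_review(submission, r) for name, r in council.items()}
    approvals = sum(verdicts.values())
    return approvals * 2 > len(verdicts), verdicts


class Analyst:
    """Tallies which council members rejected, a crude stand-in for pattern tracking."""

    def __init__(self) -> None:
        self.rejections: Counter = Counter()

    def observe(self, verdicts: Dict[str, bool]) -> None:
        self.rejections.update(name for name, ok in verdicts.items() if not ok)


# Toy reviewers standing in for real model calls.
council = {
    "claude": lambda code: "TODO" not in code,
    "gpt": lambda code: "return" in code,
    "gemini": lambda code: bool(code.strip()),
}

draft = Submission(author="model-a", code="def add(a, b):\n    # TODO\n    pass")
approved, verdicts = council_vote(draft, council)

analyst = Analyst()
analyst.observe(verdicts)
print(approved, dict(analyst.rejections))  # False {'claude': 1, 'gpt': 1}
```

Passing only `submission.code` to each reviewer is what makes the review blind in this sketch. In a real pipeline the reviewers would be calls to separate models rather than string checks.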

Performance Metrics

From testing:

  • Single-agent self-review catches ~40% of real issues
  • Cross-model blind review catches ~85%
  • Cost overhead: 15-20% more tokens
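
To make these figures concrete, the arithmetic below applies them to a hypothetical run. The 20 real issues and the 100,000-token baseline are invented inputs for illustration. Only the ~40%, ~85%, and 15-20% figures come from the reported testing, and they are approximate.

```python
def issues_caught(real_issues: int, catch_rate_pct: int) -> int:
    """Expected issues caught (rounded down) for a catch rate given in percent."""
    return real_issues * catch_rate_pct // 100


def tokens_with_overhead(baseline_tokens: int, overhead_pct: int) -> int:
    """Token count after adding a percentage overhead to a baseline."""
    return baseline_tokens * (100 + overhead_pct) // 100


# Hypothetical inputs, not taken from the source.
REAL_ISSUES = 20
BASELINE_TOKENS = 100_000

self_review = issues_caught(REAL_ISSUES, 40)       # 8
cross_review = issues_caught(REAL_ISSUES, 85)      # 17
low = tokens_with_overhead(BASELINE_TOKENS, 15)    # 115000
high = tokens_with_overhead(BASELINE_TOKENS, 20)   # 120000

print(f"self-review catches {self_review}, cross-model review catches {cross_review}")
print(f"token budget grows from {BASELINE_TOKENS} to between {low} and {high}")
```

Under these assumed inputs, cross-model review surfaces 9 more issues than self-review for 15,000 to 20,000 extra tokens.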

v0.2 Features

  • Analytics dashboard to track token usage and costs
  • Analyst agent that automatically tracks recurring patterns in review findings
  • Consultant agent for architecture decisions
  • Improved tmux-based orchestration
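
The kind of bookkeeping a token and cost dashboard depends on can be sketched as below. This is not BMB's dashboard code: the `UsageLedger` class, the agent and model names, and the per-1,000-token prices are all hypothetical placeholders, and real provider pricing differs.

```python
from collections import defaultdict
from typing import Dict


class UsageLedger:
    """Accumulates token counts per agent and per model (hypothetical helper)."""

    def __init__(self, usd_per_1k_tokens: Dict[str, float]) -> None:
        # Made-up prices keyed by model name; substitute real rates as needed.
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self.tokens_by_agent: Dict[str, int] = defaultdict(int)
        self.tokens_by_model: Dict[str, int] = defaultdict(int)

    def record(self, agent: str, model: str, tokens: int) -> None:
        """Add one call's token usage to the running totals."""
        if model not in self.usd_per_1k_tokens:
            raise KeyError(f"no price configured for model {model!r}")
        self.tokens_by_agent[agent] += tokens
        self.tokens_by_model[model] += tokens

    def total_tokens(self) -> int:
        return sum(self.tokens_by_agent.values())

    def cost_by_model(self) -> Dict[str, float]:
        """Estimated USD cost per model from the configured prices."""
        return {
            model: tokens / 1000 * self.usd_per_1k_tokens[model]
            for model, tokens in self.tokens_by_model.items()
        }


ledger = UsageLedger({"model-a": 0.01, "model-b": 0.02})
ledger.record(agent="writer", model="model-a", tokens=12_000)
ledger.record(agent="reviewer", model="model-b", tokens=3_000)

print(ledger.total_tokens())
print(ledger.cost_by_model())
```

Tracking usage by agent as well as by model makes it possible to see how much of the total the review stages add on top of the writing stage.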

Installation and Usage

BMB is fully open source under the MIT license. To install it and run a first task:

git clone https://github.com/project820/be-my-butler.git
cd be-my-butler && ./install.sh
bmb "build a REST API with auth"

The tool is particularly useful for "vibe coders": people without traditional coding experience who depend on AI to assess code quality. When you cannot read code to spot issues yourself, having multiple models cross-check each other provides verification that single-agent systems lack.

📖 Read the full source: r/ClaudeAI
