Be My Butler: Multi-Agent Pipeline for AI Code Verification

What Be My Butler Does
Be My Butler (BMB) is a multi-agent pipeline that targets a specific problem in AI-assisted coding: AI coding agents that report their own code as working when it is not. The creator, a materials/mechanical engineer with no programming background, built it after seeing Claude Code agents write code that passed tests but did not work in practice.
Core Concept
The system implements a peer review model for AI-generated code:
- One model writes the code
- A different model reviews it without knowing who wrote it (blind verification)
- A cross-model council (Claude + GPT + Gemini) votes on whether it actually works
- An analyst agent tracks patterns in what goes wrong
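The blind review and council vote described above can be sketched in a few lines of Python. This is an illustrative sketch only, not BMB's actual implementation: the `Submission` type, the `blind` and `council_vote` functions, the stand-in reviewers, and the strict-majority threshold are all assumptions, since the source says only that the council votes.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical reviewer signature: receives only the code text and
# returns True if the reviewer believes the code actually works.
Reviewer = Callable[[str], bool]


@dataclass
class Submission:
    author: str  # model that wrote the code; never shown to reviewers
    code: str


def blind(submission: Submission) -> str:
    """Drop the author field so reviewers see the code and nothing else."""
    return submission.code


def council_vote(submission: Submission, council: Dict[str, Reviewer]) -> dict:
    """Collect one vote per council member and apply a strict majority.

    The majority threshold is an assumption, not something the source states.
    """
    payload = blind(submission)
    votes = {name: review(payload) for name, review in council.items()}
    approvals = sum(votes.values())
    return {"votes": votes, "approved": approvals * 2 > len(votes)}


# Stand-in reviewers for illustration; real ones would call each model's API.
council = {
    "claude": lambda code: "return" in code,
    "gpt": lambda code: "TODO" not in code,
    "gemini": lambda code: len(code) > 1000,
}

submission = Submission(author="claude", code="def add(a, b):\n    return a + b\n")
result = council_vote(submission, council)
print(result["votes"], "approved:", result["approved"])
```

Passing reviewers only the output of `blind` keeps the author's identity out of the review step, which is the property the blind-verification bullet describes.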
Performance Metrics
The creator reports the following from testing:
- Single-agent self-review catches ~40% of real issues
- Cross-model blind review catches ~85%
- Cost overhead: 15-20% more tokens
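To put these figures in concrete terms, the short sketch below applies the reported percentages to hypothetical inputs. The 100-issue workload and the 100,000-token baseline are assumptions chosen for round numbers, not measurements from the project.

```python
def issues_caught(total_issues: int, catch_rate_pct: int) -> int:
    """Number of real issues caught at a given catch rate (integer percent)."""
    return total_issues * catch_rate_pct // 100


def tokens_with_overhead(baseline_tokens: int, overhead_pct: int) -> int:
    """Total tokens after adding a percentage overhead to a baseline."""
    return baseline_tokens * (100 + overhead_pct) // 100


# Hypothetical workload: 100 real issues, 100,000 baseline tokens.
total_issues = 100
baseline_tokens = 100_000

self_review = issues_caught(total_issues, 40)    # reported ~40% catch rate
cross_review = issues_caught(total_issues, 85)   # reported ~85% catch rate
low_cost = tokens_with_overhead(baseline_tokens, 15)
high_cost = tokens_with_overhead(baseline_tokens, 20)

print(f"self-review catches {self_review}, cross-model catches {cross_review}")
print(f"token cost rises from {baseline_tokens} to between {low_cost} and {high_cost}")
```

Under these assumed inputs, the pipeline would catch 45 more issues per 100 for an extra 15,000 to 20,000 tokens.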
v0.2 Features
- Analytics dashboard to track token usage and costs
- Analyst agent for automated code review patterns
- Consultant agent for architecture decisions
- Improved tmux-based orchestration
Installation and Usage
BMB is fully open source under the MIT license. To install and run it:
git clone https://github.com/project820/be-my-butler.git
cd be-my-butler && ./install.sh
bmb "build a REST API with auth"

The tool is particularly useful for "vibe coders": people without traditional coding experience who depend on AI for code quality assessment. When you can't read code to spot issues yourself, having multiple models cross-check each other provides verification that single-agent systems lack.
📖 Read the full source: r/ClaudeAI