Benchmark Results: GitHub CLI vs MCP Approaches for AI Agents

A Reddit user conducted an independent study comparing different methods for exposing GitHub tools to AI agents. The benchmark tested four approaches: GitHub CLI, MCP (Model Context Protocol), MCP with Tool Search, and MCP with Code Mode, using real data and practical tasks.
Key Findings
- GitHub MCP is 2–3x more expensive to use than GitHub CLI. The source notes there's "almost no practical reason to use their MCP except for some of the different handling of security."
- Tool Search saves upfront tokens but spends them on extra turns. Whether this trade-off pays off depends on task complexity. Tool Search also introduces a new failure mode, because its search accuracy is imperfect.
- Code Mode is the cheapest way to use MCP, but it is still 2x more expensive than the CLI and very slow. It also introduces its own failure mode: the agent can write buggy code or handle errors poorly.
- The benchmark suggests CLIs can be pushed further, toward higher success rates at the lowest cost and latency, through a principled design approach that treats agent ergonomics as a first-class concern.
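
To make the Tool Search trade-off concrete, here is a minimal sketch of a token-cost model. It is not the author's benchmark harness, and every number in it is a made-up assumption chosen only to show the shape of the trade-off: loading all tool definitions up front costs more initially, while Tool Search adds one hypothetical search turn before each distinct tool the task needs.

```python
# Illustrative token-cost model; every number here is a made-up assumption,
# not a measurement from the benchmark.
ALL_SCHEMAS = 10_000    # tokens to load every MCP tool definition up front
SEARCH_SCHEMA = 500     # tokens to load only the search tool up front
CALL_TURN = 1_000       # tokens for one turn that calls a tool
SEARCH_TURN = 1_500     # tokens for one extra turn spent searching for a tool


def eager_tokens(tools_needed: int) -> int:
    """Plain MCP: pay for all schemas once, then one call turn per tool."""
    return ALL_SCHEMAS + tools_needed * CALL_TURN


def search_tokens(tools_needed: int) -> int:
    """Tool Search: small upfront cost, but a search turn before each call."""
    return SEARCH_SCHEMA + tools_needed * (SEARCH_TURN + CALL_TURN)


def break_even_tools() -> float:
    """Number of distinct tools at which both approaches cost the same."""
    return (ALL_SCHEMAS - SEARCH_SCHEMA) / SEARCH_TURN


if __name__ == "__main__":
    for n in (1, 6, 7, 12):
        print(n, eager_tokens(n), search_tokens(n))
    print("break-even:", round(break_even_tools(), 2))
```

Under these assumed numbers, Tool Search is cheaper for tasks that touch up to six distinct tools and more expensive from seven onward, which is one way to read "depends on task complexity". Real costs also depend on context re-sent each turn and on failed searches, which this sketch ignores.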
Open Source Resources
The author has detailed their approach at https://axi.md and open-sourced the benchmark harness, results, and reference implementation of gh-axi at https://github.com/kunchenguid/axi.
📖 Read the full source: r/ClaudeAI