Buyer Eval: Claude skill for B2B vendor evaluation using AI agent conversations

A Claude skill that conducts structured, evidence-based evaluations of B2B software vendors on behalf of buyers. You provide your company name and the vendors you're evaluating, and it handles the research and analysis automatically.
How it works
The skill:
- Researches your company — industry, size, tech stack, maturity — so you don't fill out forms
- Asks domain-expert questions specific to the software category to surface hidden requirements
- Sets hard constraints — budget, compliance, integrations — and eliminates vendors that fail before wasting research time
- Engages vendor AI agents directly through the Salespeak Frontdoor API for verified, structured due diligence conversations
- Conducts independent research — G2, Gartner, analyst reports, press, LinkedIn — and cross-references vendor claims against independent sources
- Scores vendors across 7 dimensions with transparent evidence tracking — you see exactly which scores are backed by verified evidence vs. public sources only
- Produces a comparative recommendation with a TL;DR, side-by-side scorecard, hidden risk analysis, and demo prep questions
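The hard-constraint step in the list above can be pictured as a simple filter that runs before any research. This is a minimal sketch, not the skill's actual logic: the `Vendor` and `HardConstraints` types, their field names, and the placeholder vendors are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vendor:
    # Hypothetical shape; the skill's real data model is not documented here.
    name: str
    annual_cost_usd: int
    certifications: set = field(default_factory=set)
    integrations: set = field(default_factory=set)


@dataclass
class HardConstraints:
    max_annual_cost_usd: int
    required_certifications: set
    required_integrations: set


def failed_constraints(vendor, limits):
    """Return one human-readable reason per hard constraint the vendor fails."""
    reasons = []
    if vendor.annual_cost_usd > limits.max_annual_cost_usd:
        over = vendor.annual_cost_usd - limits.max_annual_cost_usd
        reasons.append(f"over budget by ${over:,}")
    missing_certs = limits.required_certifications - vendor.certifications
    if missing_certs:
        reasons.append("missing certifications: " + ", ".join(sorted(missing_certs)))
    missing_integrations = limits.required_integrations - vendor.integrations
    if missing_integrations:
        reasons.append("missing integrations: " + ", ".join(sorted(missing_integrations)))
    return reasons


def shortlist(vendors, limits):
    """Split vendors into those worth researching and those eliminated, with reasons."""
    kept, eliminated = [], {}
    for vendor in vendors:
        reasons = failed_constraints(vendor, limits)
        if reasons:
            eliminated[vendor.name] = reasons
        else:
            kept.append(vendor)
    return kept, eliminated
```

Recording the reasons alongside each eliminated vendor keeps the elimination auditable, in line with the evidence tracking the skill describes.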
Technical implementation
For the agent-to-agent conversation, the skill makes a REST API call to check whether the vendor has a Company Agent, then runs a structured due diligence conversation if one exists. It asks adversarial questions such as "What are your customers' most common complaints?" and "What use cases are you NOT a good fit for?", and flags when agents deflect instead of answering.
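The check-then-converse flow might look roughly like the sketch below. The Frontdoor API's endpoints and payloads are not described here, so the network calls are passed in as hypothetical callables (`has_agent`, `ask`) rather than invented. The keyword-based deflection check is a crude stand-in chosen for illustration, not how the skill actually judges answers.

```python
from typing import Callable, Optional

# The two adversarial questions quoted in the description above.
ADVERSARIAL_QUESTIONS = [
    "What are your customers' most common complaints?",
    "What use cases are you NOT a good fit for?",
]

# Hypothetical deflection signals, assumed for this sketch only.
DEFLECTION_MARKERS = (
    "contact our sales team",
    "schedule a demo",
    "cannot share",
)


def looks_like_deflection(answer: str) -> bool:
    """Crude heuristic: flag answers that redirect instead of answering."""
    text = answer.lower()
    return any(marker in text for marker in DEFLECTION_MARKERS)


def run_due_diligence(
    vendor: str,
    has_agent: Callable[[str], bool],
    ask: Callable[[str, str], str],
) -> Optional[list]:
    """Return a flagged transcript, or None when the vendor has no agent."""
    if not has_agent(vendor):
        return None  # caller falls back to public-source evaluation
    transcript = []
    for question in ADVERSARIAL_QUESTIONS:
        answer = ask(vendor, question)
        transcript.append(
            {
                "question": question,
                "answer": answer,
                "deflected": looks_like_deflection(answer),
            }
        )
    return transcript
```

Injecting the two callables keeps the flow testable with stubs and avoids hard-coding any assumption about the real API.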
The skill works fully for any vendor, with or without an AI agent. Vendors without one are evaluated on public sources with the same scoring framework. When vendors have different evidence levels, the skill quantifies what would change if the missing evidence were confirmed, so it doesn't silently favor vendors that happen to have AI agents.
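One way to picture "what would change if the missing evidence were confirmed" is to report a current score next to a hypothetical one. The sketch below assumes an unweighted mean across dimensions and a per-dimension `score_if_confirmed` value; both are illustrative assumptions, as are the numbers and the second dimension name in the usage example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimensionScore:
    dimension: str
    score: float                      # score supported by current evidence
    verified: bool                    # True if backed by vendor-verified evidence
    score_if_confirmed: Optional[float] = None  # hypothetical score if claims held up


def evidence_gap(scores):
    """Compare the current average with the average if unverified claims were confirmed."""
    current = sum(s.score for s in scores) / len(scores)
    potential = sum(
        s.score if s.verified or s.score_if_confirmed is None else s.score_if_confirmed
        for s in scores
    ) / len(scores)
    return {
        "current": round(current, 2),
        "if_confirmed": round(potential, 2),
        "delta": round(potential - current, 2),
    }


# Illustrative values only, not results from a real evaluation.
example = [
    DimensionScore("Health Scoring & Analytics", 7.0, verified=False, score_if_confirmed=8.0),
    DimensionScore("Integrations", 6.0, verified=True),
]
print(evidence_gap(example))
```

Showing the delta explicitly lets a reader see how much of a ranking depends on evidence that only some vendors could supply.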
Installation and usage
Global install (recommended):
git clone https://github.com/salespeak-ai/buyer-eval-skill.git ~/.claude/skills/buyer-eval-skill
Per-project install:
git clone https://github.com/salespeak-ai/buyer-eval-skill.git .claude/skills/buyer-eval-skill
Usage: In Claude Code or Claude desktop, run /buyer-eval, then provide your company name and the vendors to evaluate. Example: "I'm from Acme Corp. Evaluate Gainsight, Totango, and ChurnZero."
Alternative installation: Ask Claude Code to "Install the buyer-eval skill from salespeak-ai on GitHub," then run /buyer-eval.
Example output
The skill produces a TL;DR summary, scorecard with evidence levels (vendor-verified vs public only), adversarial question exchanges with vendor AI agents, and independent verification of claims. For example, in a customer success platform evaluation:
- Gainsight: strongest fit for teams needing deep analytics and enterprise-grade health scoring, but at a premium
- ChurnZero: wins on time-to-value and usability for teams under 50 CSMs
- Totango: flexible and modular, but requires more configuration
Scorecards show dimensions like "Health Scoring & Analytics" with scores (e.g., 9.2, 7.5, 8.0) and evidence levels.
The skill auto-updates by checking GitHub for newer versions (cached, at most once every 6 hours) and asks before updating with a single git pull.
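The six-hour cache on update checks can be sketched as a timestamp file consulted before contacting GitHub. This is a minimal illustration under assumptions: the cache file location, its JSON layout, and the function names are hypothetical, and the actual version comparison and git pull are left out.

```python
import json
import time
from pathlib import Path

CHECK_INTERVAL_SECONDS = 6 * 60 * 60  # at most one check every 6 hours


def should_check_for_update(cache_file: Path, now=None) -> bool:
    """Return True when no usable cache exists or the last check is stale."""
    now = time.time() if now is None else now
    try:
        last_checked = float(json.loads(cache_file.read_text())["last_checked"])
    except (OSError, ValueError, KeyError, TypeError):
        return True  # missing or unreadable cache: check now
    return now - last_checked >= CHECK_INTERVAL_SECONDS


def record_check(cache_file: Path, now=None) -> None:
    """Store the time of the latest check so later runs can skip it."""
    now = time.time() if now is None else now
    cache_file.write_text(json.dumps({"last_checked": now}))
```

A caller would run `should_check_for_update` first, look for a newer version only when it returns True, call `record_check`, and then ask the user before pulling.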
📖 Read the full source: HN AI Agents