Benchmark Results for Small Local and OpenRouter Models on Agentic Text-to-SQL Task

✍️ OpenClawRadar📅 Published: April 17, 2026🔗 Source

A developer has published benchmark results for small local and OpenRouter models on an agentic text-to-SQL task. The benchmark takes English queries like "Show order lines, revenue, units sold, revenue per unit (total revenue ÷ total units sold), average list price per product in the subcategory, gross profit, and margin percentage for each product subcategory" and converts them to SQL that is tested against database tables.

Benchmark Details

The agent can see query results and modify SQL to fix issues, with a limit on debugging rounds. The benchmark is deliberately short with 25 questions and runs in much less than 5 minutes for most models, making it practical for testing different configurations. It's designed to be tough enough to separate the best models from others.

Key Findings

The best open models identified were kimi-k2.5, Qwen 3.5 397B-A17B, and Qwen 3.5 27B
NVIDIA Nemotron-Cascade-2-30B-A3B outscores Qwen 3.5-35B-A3B and matches Codex 5.3
Mimo v2 Flash was described as "a gem of a model"

Self-Hosted Option

The benchmark now includes the ability to run it yourself against your own server using the WASM version of Llama.cpp. The developer is seeking feedback on what to change for version 2 and wants to see scores others get with different configurations.

📖 Read the full source: r/LocalLLaMA

👀 See Also

Tools

SpecLock: Open Source Constraint Engine for AI Coding Agents

SpecLock is an MCP server that actively enforces constraints on AI coding agents like Claude Code. It blocks violations with semantic conflict warnings using synonym expansion, negation detection, and destructive action flagging.

Feb 28, 2026, 04:45 AM UTC

OpenClawRadar

Tools

Custom status line for Claude Code shows context usage, rate limits, and token counts at a glance

A custom script adds a persistent status line to Claude Code, displaying context %, 5-hour rate limit %, KV cache reads, cumulative input/output tokens, model name, and working directory — color-coded for dark terminals.

May 5, 2026, 04:24 PM UTC

OpenClawRadar

Tools

Claude Code Skills for Automated Project Scaffolding

A developer has built Claude Code skills that automate full-stack project setup with commands for React, Next.js, Node.js APIs, and Turborepo monorepos. The skills pull latest dependencies, support 50+ integrations, and are MIT licensed.

Apr 16, 2026, 10:45 AM UTC

OpenClawRadar

Tools

Skill Seekers v3.2.0 adds YouTube tutorial extraction for Claude skills

Skill Seekers v3.2.0 now extracts content from YouTube tutorials to create structured SKILL.md files for Claude. The tool uses a two-pass AI enhancement workflow to clean OCR output and generate usable documentation from video content.

Mar 1, 2026, 09:45 PM UTC

OpenClawRadar