Needle: A 26M Parameter Tool-Calling Model Built Entirely Without FFNs

✍️ OpenClawRadar · 📅 Published: May 12, 2026

Needle is a 26M parameter model designed specifically for single-shot function calling. It uses cross-attention and gating layers with no feed-forward networks (FFNs), based on the insight that tool calling is retrieval and assembly (match the query to a tool name, extract argument values, emit JSON) rather than reasoning. The model runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.
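To make the "retrieval and assembly" framing concrete, here is a toy, rule-based sketch of the three steps (match, extract, emit). It is not how Needle works internally. The tool names, keywords, and regex patterns are all hypothetical, invented only to show the shape of the task's input and output.

```python
import json
import re

# Hypothetical tool schemas, invented for illustration only.
TOOLS = {
    "set_timer": {
        "keywords": {"timer", "countdown"},
        "pattern": r"(?P<minutes>\d+)\s*min",
    },
    "send_message": {
        "keywords": {"message", "text"},
        "pattern": r"to (?P<recipient>\w+)",
    },
}


def call_tool(query):
    """Turn a natural-language query into a JSON tool call (toy version)."""
    text = query.lower()
    words = set(re.findall(r"[a-z]+", text))
    # Step 1: match the query to a tool name by keyword overlap.
    name = max(TOOLS, key=lambda t: len(TOOLS[t]["keywords"] & words))
    # Step 2: extract argument values from the query text.
    match = re.search(TOOLS[name]["pattern"], text)
    args = match.groupdict() if match else {}
    # Step 3: emit the call as JSON.
    return json.dumps({"name": name, "arguments": args})


print(call_tool("Set a timer for 10 minutes"))
# {"name": "set_timer", "arguments": {"minutes": "10"}}
```

Nothing in these steps needs stored world knowledge: the tool list and the argument values are all present in the input, which is the property the article's argument rests on.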

Training Details

  • Pretrained on 200B tokens across 16 TPU v6e (27 hours)
  • Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
  • Data synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)
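The figures above imply some rough throughput numbers, worked out in the back-of-the-envelope sketch below. It assumes "16 TPU v6e" means 16 chips and that the stated durations are wall-clock training time. The article does not say what hardware the post-training run used, so only an aggregate rate is computed for it.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_MINUTE = 60

# Figures quoted in the article.
params = 26e6
pretrain_tokens = 200e9
pretrain_hours = 27
chips = 16  # assumption: "16 TPU v6e" read as 16 chips
posttrain_tokens = 2e9
posttrain_minutes = 45

# Aggregate and per-chip pretraining throughput.
total_rate = pretrain_tokens / (pretrain_hours * SECONDS_PER_HOUR)
per_chip_rate = total_rate / chips

# Aggregate post-training throughput (hardware not stated in the article).
posttrain_rate = posttrain_tokens / (posttrain_minutes * SECONDS_PER_MINUTE)

# Pretraining tokens seen per model parameter.
tokens_per_param = pretrain_tokens / params

print(f"pretraining: {total_rate:,.0f} tok/s total, {per_chip_rate:,.0f} tok/s per chip")
print(f"post-training: {posttrain_rate:,.0f} tok/s")
print(f"tokens per parameter: {tokens_per_param:,.0f}")
```

Under these assumptions, pretraining runs at roughly 2.06M tokens per second in aggregate (about 129K per chip), and the model sees roughly 7,700 pretraining tokens per parameter.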

Architecture: Simple Attention Networks

The entire model is just attention and gating, with no MLPs (FFNs) anywhere. The authors argue that FFN parameters are wasted at this scale for tool calling, and that the 'no FFN' finding generalizes to any task where the model has access to external structured knowledge, such as tool use or retrieval-augmented generation (RAG). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input.
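The article does not spell out the layer layout, so the following is only one plausible reading of "attention and gating, no FFN": a single-head cross-attention step whose output is mixed into the residual stream through a sigmoid gate, with no feed-forward sublayer after it. The function name, weight shapes, and gating form are assumptions for illustration, not Needle's actual implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gated_cross_attention(x, ctx, wq, wk, wv, wg):
    """Hypothetical block: cross-attention plus a gated residual, no FFN.

    x:   (n, d) states of the tokens being generated
    ctx: (m, d) encoded context, e.g. the query and tool descriptions
    """
    d = len(wq[0])
    q, k, v = matmul(x, wq), matmul(ctx, wk), matmul(ctx, wv)
    k_t = [list(col) for col in zip(*k)]
    # Scaled dot-product attention from x over the context.
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Elementwise sigmoid gate decides how much retrieved content to admit.
    gate = [[sigmoid(g) for g in row] for row in matmul(x, wg)]
    out = [
        [xi + gi * ai for xi, gi, ai in zip(x_row, g_row, a_row)]
        for x_row, g_row, a_row in zip(x, gate, attended)
    ]
    return out, weights


rng = random.Random(0)
d = 4
x = rand_matrix(2, d, rng)    # two decoder positions
ctx = rand_matrix(3, d, rng)  # three context positions
wq, wk, wv, wg = (rand_matrix(d, d, rng) for _ in range(4))
out, weights = gated_cross_attention(x, ctx, wq, wk, wv, wg)
print(len(out), len(out[0]), [round(sum(r), 6) for r in weights])
```

In this sketch, the only learned parameters are the four projection matrices. The block can copy and mix information that is already in the context, but it has no separate per-token MLP in which to store facts.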

Benchmarks

Needle beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LFM2.5-350M on single-shot function calling, though those models have more capacity for conversational settings.

How to Use

# Test the model via the playground or finetune on your Mac/PC
git clone https://github.com/cactus-compute/needle

Everything is MIT licensed.

📖 Read the full source: r/LocalLLaMA
