Needle: A 26M Parameter Function-Calling Model That Runs at 6000 tok/s on Mobile
Cactus has open-sourced Needle, a 26M parameter function-calling model designed to run on budget phones, watches, and glasses. It achieves 6000 tok/s prefill and 1200 tok/s decode on consumer devices using their custom inference engine, Cactus.
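To put the two throughput figures in perspective, here is a rough latency estimate for a single function call. It is only a sketch: the prompt and output token counts are hypothetical values chosen for illustration, and the estimate ignores model loading, tokenization, and any other fixed overhead.

```python
# Throughput figures quoted in the article (tokens per second).
PREFILL_TOK_PER_S = 6000
DECODE_TOK_PER_S = 1200


def estimate_latency_ms(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency: time to prefill the prompt plus time to decode the output."""
    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return 1000 * (prefill_s + decode_s)


# Hypothetical request: 300 prompt tokens (tool list + query), 30 output tokens.
latency_ms = estimate_latency_ms(prompt_tokens=300, output_tokens=30)
print(f"{latency_ms:.0f} ms")  # prints "75 ms"
```

Under these assumed sizes, prefill takes about 50 ms and decode about 25 ms, so a tool call would come back in well under a tenth of a second.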
Architecture: Simple Attention Networks
Needle uses a Simple Attention Network with no MLPs anywhere: the entire model consists of attention and gating layers. Key design parameters: model dimension d=512, 8 attention heads with 4 key/value heads (8H/4KV), a BPE vocabulary of 8192 tokens, and an encoder-decoder structure (12 encoder layers, 8 decoder layers) using cross-attention, masked self-attention with RoPE, and tied embeddings.
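As a sanity check, the hyperparameters above can be turned into a back-of-the-envelope parameter count. This is a sketch under several assumptions not stated in the source: a head dimension of d / H = 64, bias-free projections, cross-attention with the same shape as self-attention, and gating and normalization parameters ignored entirely (their sizes are not given).

```python
# Hyperparameters quoted in the article.
D_MODEL = 512
N_HEADS = 8
N_KV_HEADS = 4
VOCAB = 8192
ENC_LAYERS = 12
DEC_LAYERS = 8

# Assumption: head dimension is d / H, as in most transformer designs.
HEAD_DIM = D_MODEL // N_HEADS  # 64


def attention_params() -> int:
    """Weights in one attention block with 8 query heads and 4 key/value heads."""
    q = D_MODEL * N_HEADS * HEAD_DIM
    k = D_MODEL * N_KV_HEADS * HEAD_DIM
    v = D_MODEL * N_KV_HEADS * HEAD_DIM
    o = N_HEADS * HEAD_DIM * D_MODEL
    return q + k + v + o


# Assumption: each encoder layer has one self-attention block, and each
# decoder layer has one masked self-attention plus one cross-attention block.
n_attention_blocks = ENC_LAYERS + 2 * DEC_LAYERS

# Tied embeddings: the input and output tables share weights, so count once.
embedding_params = VOCAB * D_MODEL

total = n_attention_blocks * attention_params() + embedding_params
print(f"{total / 1e6:.1f}M parameters")  # prints "26.2M parameters"
```

The estimate lands close to the advertised 26M, which suggests the attention projections and the embedding table account for most of the weights. It does not confirm the exact layer layout.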
Training Details
- Pretrained on 200B tokens across 16 TPU v6e (27 hours)
- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
- Data synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)
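The training figures above imply some useful derived numbers, worked out in the short sketch below. It assumes "16 TPU v6e" means 16 chips and that the quoted durations are wall-clock time. Neither is confirmed by the source.

```python
# Figures quoted in the article.
PRETRAIN_TOKENS = 200e9
PRETRAIN_HOURS = 27
N_CHIPS = 16  # assumption: "16 TPU v6e" refers to 16 chips
POSTTRAIN_TOKENS = 2e9
POSTTRAIN_MINUTES = 45
N_PARAMS = 26e6

# Aggregate and per-chip pretraining throughput in tokens per second.
pretrain_tok_per_s = PRETRAIN_TOKENS / (PRETRAIN_HOURS * 3600)
per_chip_tok_per_s = pretrain_tok_per_s / N_CHIPS

# Post-training throughput in tokens per second.
posttrain_tok_per_s = POSTTRAIN_TOKENS / (POSTTRAIN_MINUTES * 60)

# Pretraining tokens seen per model parameter.
tokens_per_param = PRETRAIN_TOKENS / N_PARAMS

print(f"pretraining:   {pretrain_tok_per_s:,.0f} tok/s total")
print(f"per chip:      {per_chip_tok_per_s:,.0f} tok/s")
print(f"post-training: {posttrain_tok_per_s:,.0f} tok/s")
print(f"tokens/param:  {tokens_per_param:,.0f}")
```

Under these assumptions, pretraining ran at roughly 2.06 million tokens per second in aggregate, and the model saw on the order of 7,700 tokens per parameter.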
Benchmark Results
Needle beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LFM2.5-350M on single-shot function calling. However, those models are larger, cover a broader range of tasks, and excel in conversational settings.
Quickstart
git clone https://github.com/cactus-compute/needle.git
cd needle && source ./setup
needle playground
This opens a web UI at http://127.0.0.1:7860 for testing and fine-tuning on your own tools.
Usage (Python)
from needle import SimpleAttentionNetwork, load_checkpoint, generate, get_tokenizer

# Load the trained parameters and the model config from the checkpoint.
params, config = load_checkpoint("checkpoints/needle.pkl")
model = SimpleAttentionNetwork(config)
tokenizer = get_tokenizer()

# The available tools are passed as a JSON string alongside the user query.
result = generate(
    model, params, tokenizer,
    query="What's the weather in San Francisco?",
    tools='[{"name":"get_weather","parameters":{"location":"string"}}]',
    stream=False,
)
print(result)
# Output:
# [{"name":"get_weather","arguments":{"location":"San Francisco"}}]
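Once the model has emitted a call, the application still has to execute it. The sketch below shows one way to do that, assuming the result is a JSON string in the format printed above (the source does not say whether `generate` returns a string or an already-parsed list). The `get_weather` stub, `TOOL_REGISTRY`, and `dispatch` are hypothetical names introduced for illustration and are not part of the needle package.

```python
import json
from typing import Any, Callable


def get_weather(location: str) -> str:
    """Hypothetical local implementation of the get_weather tool (a stub)."""
    return f"(stub) weather for {location}"


# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(model_output: str) -> list[Any]:
    """Parse the model's JSON output and run each requested tool call."""
    calls = json.loads(model_output)
    results = []
    for call in calls:
        name = call["name"]
        if name not in TOOL_REGISTRY:
            # Never execute a tool the application did not register.
            raise KeyError(f"model requested unknown tool: {name!r}")
        results.append(TOOL_REGISTRY[name](**call.get("arguments", {})))
    return results


# Output string copied from the usage example above.
output = '[{"name":"get_weather","arguments":{"location":"San Francisco"}}]'
results = dispatch(output)
print(results)  # prints ['(stub) weather for San Francisco']
```

Looking tools up in an explicit registry, rather than resolving names dynamically, keeps a small model's occasional malformed or unexpected call from reaching arbitrary code.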
Fine-tuning Locally
# via playground (auto-generates data via Gemini)
needle playground
# or provide your own data
needle finetune data.jsonl
Availability
Weights are on Hugging Face: Cactus-Compute/needle. Everything is MIT licensed.
📖 Read the full source: HN AI Agents