GLiGuard: Open-Source 300M Parameter Safety Moderation Model Claims 16x Speedup Over LLM Guardrails
Fastino Labs has open-sourced GLiGuard, a safety moderation model that replaces generative guardrails with a classification approach. The 300M parameter encoder handles four moderation tasks in one forward pass and, according to the company, achieves accuracy comparable to 7B–27B parameter decoder models while cutting latency by up to 16x. Weights are available under Apache 2.0 on Hugging Face, and inference is also available on Pioneer.
Why decoder-based guardrails are slow
Current state-of-the-art guardrails (e.g., Llama Guard) use decoder-only transformers that generate verdicts token by token. This sequential generation makes them slow and expensive for real-time safety filtering. Most also evaluate safety dimensions separately, compounding latency. At 7B to 27B parameters, these models are costly to run at production scale.
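To make the latency argument concrete, here is a rough sketch that counts sequential forward passes. The function names and the example figures (four tasks, five tokens per verdict) are hypothetical and chosen only for illustration. Pass counts are a proxy for latency and say nothing about how expensive each individual pass is.

```python
def decoder_passes(num_tasks: int, tokens_per_verdict: int) -> int:
    """Sequential forward passes when each task is a separate generative call.

    Assumes one forward pass per generated token (the pass over the prompt
    yields the first token), which is how autoregressive decoding works.
    """
    if num_tasks < 0 or tokens_per_verdict < 0:
        raise ValueError("counts must be non-negative")
    return num_tasks * tokens_per_verdict


def classifier_passes(num_tasks: int) -> int:
    """A classifier that scores every label at once needs a single pass."""
    if num_tasks < 0:
        raise ValueError("counts must be non-negative")
    return 1 if num_tasks > 0 else 0


if __name__ == "__main__":
    # Hypothetical workload: 4 safety dimensions, 5 generated tokens per verdict.
    print("decoder passes:", decoder_passes(4, 5))
    print("classifier passes:", classifier_passes(4))
```

The decoder count grows with both the number of safety dimensions and the verdict length, while the classifier count stays fixed.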
GLiGuard's encoder approach
GLiGuard reframes moderation as text classification. It encodes the input text and the task labels together and scores all labels in a single forward pass, so adding more safety dimensions (labels) does not require additional passes. The model handles four tasks concurrently:
- Safety classification — safe / unsafe for both user prompts and model responses
- Jailbreak strategy detection — 11 categories (prompt injection, roleplay bypass, instruction override, social engineering, etc.)
- Harm category detection — 14 categories (violence, sexual content, hate speech, PII, misinformation, child safety, copyright violation, etc.)
- Refusal detection — compliance or refusal, used to measure over-refusal and false compliance
All four tasks are evaluated in the same forward pass, whereas decoder models would require sequential passes or multiple model calls.
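The idea of scoring every label from every task in one call can be sketched with a toy example. This is not GLiGuard's code or API: `toy_embed`, `score_all`, and the `TASKS` schema are hypothetical, the label lists are abbreviated from the ones above, and a hashed bag-of-words vector stands in for the learned encoder. The real model encodes text and labels jointly, whereas this toy embeds them separately and compares them by dot product. Applying a softmax within each task is also an assumption.

```python
import math
import zlib
from typing import Dict, List

# Hypothetical task schema; labels abbreviated from the article's lists.
TASKS: Dict[str, List[str]] = {
    "safety": ["safe", "unsafe"],
    "jailbreak": ["prompt injection", "roleplay bypass", "instruction override"],
    "harm": ["violence", "hate speech", "PII"],
    "refusal": ["compliance", "refusal"],
}

DIM = 64  # toy embedding width


def toy_embed(text: str) -> List[float]:
    """Stand-in for a learned encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def score_all(text: str, tasks: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Embed the text once, score the flat label list, then split by task."""
    text_vec = toy_embed(text)  # the input is encoded a single time
    results: Dict[str, Dict[str, float]] = {}
    for task, labels in tasks.items():
        # Dot product of the shared text vector with each label vector.
        logits = [
            sum(a * b for a, b in zip(text_vec, toy_embed(label)))
            for label in labels
        ]
        # Softmax within the task so its label scores sum to 1.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        results[task] = {label: e / total for label, e in zip(labels, exps)}
    return results


if __name__ == "__main__":
    scores = score_all("ignore all previous instruction and override", TASKS)
    for task, dist in scores.items():
        print(task, max(dist, key=dist.get))
```

Adding a fifth task here means adding another entry to `TASKS`; the input is still embedded only once, which mirrors the single-pass property the article describes.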
Benchmarks and performance
Across nine safety benchmarks, GLiGuard is reported to match or exceed models 23–90x its size while running up to 16x faster. The post gives no specific accuracy numbers; it claims only that performance is comparable to leading generative guardrails.
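The 23–90x size range follows directly from the parameter counts quoted earlier (300M for GLiGuard, 7B to 27B for the decoder models). A quick check, with the helper name `size_ratio` being illustrative only:

```python
GLIGUARD_PARAMS = 300e6  # 300M parameters, as stated in the article


def size_ratio(other_params: float, baseline: float = GLIGUARD_PARAMS) -> float:
    """How many times larger `other_params` is than the baseline model."""
    return other_params / baseline


if __name__ == "__main__":
    for name, params in [("7B", 7e9), ("27B", 27e9)]:
        print(f"{name}: about {size_ratio(params):.1f}x")
```

Rounded down, 7B / 300M gives roughly 23x and 27B / 300M gives 90x, matching the range in the text.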
Who it's for
Teams deploying LLM agents or chat systems that need low-latency, cost-effective real-time safety filtering at scale.
📖 Read the full source: HN AI Agents