GLiGuard: Open-Source 300M Parameter Safety Moderation Model Claims 16x Speedup Over LLM Guardrails

OpenClawRadar · Published: May 13, 2026

Fastino Labs has open-sourced GLiGuard, a safety moderation model that replaces generative guardrails with a classification approach. The 300M-parameter encoder handles four moderation tasks in one forward pass and reportedly achieves accuracy comparable to 7B–27B parameter decoder models while reducing latency by up to 16x. Weights are available under Apache 2.0 on Hugging Face, and hosted inference is available on Pioneer.

Why decoder-based guardrails are slow

Current state-of-the-art guardrails (e.g., Llama Guard) use decoder-only transformers that generate verdicts token by token. This sequential generation makes them slow and expensive for real-time safety filtering. Most also evaluate safety dimensions separately, compounding latency. At 7B to 27B parameters, these models are costly to run at production scale.
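
To make the sequential cost concrete, here is a minimal sketch that counts forward steps rather than measuring latency. It assumes each safety dimension is a separate generative call run one after another, and the workload values (4 dimensions, 5-token verdicts) are hypothetical, not figures from the post. The function names are illustrative only.

```python
def decoder_sequential_steps(num_dimensions: int, verdict_tokens: int) -> int:
    """Count decode steps when each dimension is its own generative call.

    A token cannot be produced until the previous one exists, so the steps
    inside one call cannot run in parallel.
    """
    return num_dimensions * verdict_tokens


def encoder_sequential_steps(num_dimensions: int) -> int:
    """A classifier that scores every label at once needs one forward pass."""
    return 1


# Hypothetical workload (assumed values, not taken from the source).
decoder_steps = decoder_sequential_steps(num_dimensions=4, verdict_tokens=5)
encoder_steps = encoder_sequential_steps(num_dimensions=4)
print(decoder_steps, encoder_steps)  # 20 1
```

Step counts are only a rough proxy: real latency also depends on model size, batching, and hardware, so this ratio should not be read as the source of the 16x figure.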

GLiGuard's encoder approach

GLiGuard reframes moderation as text classification. It encodes both input text and task labels together, scoring all labels simultaneously in a single pass. Adding more safety dimensions (labels) does not add inference time. The model handles four concurrent tasks:

  • Safety classification — safe / unsafe for both user prompts and model responses
  • Jailbreak strategy detection — 11 categories (prompt injection, roleplay bypass, instruction override, social engineering, etc.)
  • Harm category detection — 14 categories (violence, sexual content, hate speech, PII, misinformation, child safety, copyright violation, etc.)
  • Refusal detection — compliance or refusal, used to measure over-refusal and false compliance

All four tasks are evaluated in the same forward pass, whereas decoder models would require sequential passes or multiple model calls.
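
The idea of scoring every label at once can be illustrated with a toy example. This is not GLiGuard's architecture or API: the 3-dimensional embeddings are made up, the label names are a small sample, and the dot-product-plus-sigmoid scorer is an assumed stand-in for whatever the real encoder computes. It only shows that adding a label adds one more row to score, not another model call.

```python
import math

# Made-up 3-dimensional label embeddings, for illustration only.
LABEL_EMBEDDINGS = {
    "unsafe": [0.9, 0.1, 0.0],
    "prompt injection": [0.8, 0.0, 0.3],
    "violence": [0.1, 0.9, 0.0],
    "refusal": [0.0, 0.1, 0.9],
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score_all_labels(text_embedding, label_embeddings):
    """Score every label against one text embedding in a single sweep."""
    return {
        label: sigmoid(sum(t * l for t, l in zip(text_embedding, emb)))
        for label, emb in label_embeddings.items()
    }


# Hypothetical encoding of one input text (assumed values).
text_embedding = [2.0, -1.0, -1.0]
scores = score_all_labels(text_embedding, LABEL_EMBEDDINGS)

# Each label gets an independent probability, so several can fire together.
flagged = [label for label, score in scores.items() if score >= 0.5]
print(flagged)  # ['unsafe', 'prompt injection']
```

Because each label is thresholded independently, one input can be flagged for safety, jailbreak strategy, and harm category at once, which fits the multi-task framing described above.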

Benchmarks and performance

Across nine safety benchmarks, GLiGuard is reported to match or exceed models 23–90x its size while running up to 16x faster. The post gives no specific accuracy numbers; it claims only that performance is comparable to leading generative guardrails.
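
The 23–90x size range follows directly from the parameter counts quoted in the article. A quick check, using only those figures:

```python
GLIGUARD_PARAMS = 300_000_000  # 300M, as stated in the article

# Comparison sizes are the two ends of the 7B-27B range quoted in the article.
COMPARISON_PARAMS = {"7B": 7_000_000_000, "27B": 27_000_000_000}

ratios = {name: params / GLIGUARD_PARAMS for name, params in COMPARISON_PARAMS.items()}

for name, ratio in ratios.items():
    print(f"{name} model is {ratio:.1f}x the size of GLiGuard")
# 7B model is 23.3x the size of GLiGuard
# 27B model is 90.0x the size of GLiGuard
```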

Who it's for

Teams deploying LLM agents or chat systems that need low-latency, cost-effective real-time safety filtering at scale.

📖 Read the full source: HN AI Agents
