PromptForest: Local-First Prompt Injection Detection with Uncertainty

PromptForest is a new local-first library for detecting prompt injections and jailbreaks. It is designed to address two common problems with current detectors: slow inference and overconfident results. Alongside each verdict it reports a measure of uncertainty, and it aims to do so without giving up performance.
Key Details
One of the fundamental issues with existing injection detectors is their reliance on large models like Llama 2 8B and Qualifire Sentinel 0.6B. These models are slow, and their overconfidence can produce false positives that undermine trust in production. To address this, PromptForest uses a voting ensemble of three smaller, specialized models:
- Llama Prompt Guard (86M): Offers the best pre-ensemble calibration, measured by Expected Calibration Error (ECE, where lower is better), in its weight class.
- Vijil Dome (ModernBERT): Delivers the highest accuracy per parameter.
- Custom XGBoost: Trained on embeddings for architectural diversity.
The three models are combined by weighted soft voting: their predicted probabilities are averaged, with more accurate models given greater weight. This keeps the decision rule simple while maintaining high accuracy and consistency.
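Weighted soft voting can be sketched in a few lines of Python. This is an illustration of the general technique, not PromptForest's actual code: the function name, the example weights, and the use of weighted standard deviation as a disagreement score are all assumptions, since the source does not say how the library derives its weights or its uncertainty measure.

```python
from typing import Sequence, Tuple


def weighted_soft_vote(
    probs: Sequence[float], weights: Sequence[float]
) -> Tuple[float, float]:
    """Combine per-model P(malicious) values into one ensemble probability.

    Returns (ensemble_probability, spread), where spread is the weighted
    standard deviation of the individual model outputs. Using spread as an
    uncertainty signal is an assumption made for this sketch.
    """
    if len(probs) != len(weights) or not probs:
        raise ValueError("probs and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize so the weights sum to 1; more accurate models get larger weights.
    norm = [w / total for w in weights]

    # Soft voting: weighted average of probabilities, not of hard labels.
    p = sum(w * q for w, q in zip(norm, probs))

    # Disagreement between models: weighted variance around the ensemble mean.
    var = sum(w * (q - p) ** 2 for w, q in zip(norm, probs))
    return p, var ** 0.5


# Hypothetical outputs from three detectors and hypothetical accuracy-based weights.
p, spread = weighted_soft_vote([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print(f"P(malicious)={p:.2f}, spread={spread:.2f}")
```

A high spread means the models disagree, which is one plausible way to flag a verdict as uncertain rather than trusting a single confident score.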
Benchmarks put PromptForest's mean latency at ~141ms, compared to ~225ms for Qualifire Sentinel v2. Accuracy is lower, at 90% against Sentinel's 97%, but calibration is better: PromptForest's ECE is 0.070 versus Sentinel's 0.096, and a lower ECE means confidence scores track actual correctness more closely. Throughput is approximately 27 prompts per second on a consumer GPU using the pfranger CLI.
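For readers unfamiliar with ECE, the following is a minimal sketch of the standard binned definition: predictions are grouped by confidence, and the gap between average confidence and actual accuracy in each bin is averaged, weighted by bin size. The function name, the choice of 10 equal-width bins, and the sample data are assumptions; the source does not describe how its benchmark computed the metric.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: weighted mean of |accuracy - confidence| across bins.

    confidences: the model's confidence in its predicted label, in [0, 1].
    correct: whether each prediction matched the true label.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and equal length")

    n = len(confidences)
    # Per bin: [sample count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(ok)

    ece = 0.0
    for count, conf_sum, hits in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        accuracy = hits / count
        avg_conf = conf_sum / count
        ece += (count / n) * abs(accuracy - avg_conf)
    return ece


# Made-up data: a detector that is 90% confident but right 3 times out of 4.
print(round(expected_calibration_error([0.9] * 4, [True, True, True, False]), 3))
```

A perfectly calibrated model scores 0, so on this metric 0.070 is better than 0.096.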
Developers can try PromptForest on Google Colab or audit prompts with the PFRanger tool, which runs entirely locally and uses parallelization to increase speed and throughput.
📖 Read the full source: r/LocalLLaMA