llm-hasher: Local PII Detection & Tokenization for LLMs

llm-hasher addresses a specific security gap in hybrid LLM workflows: even if you run LLMs locally, any task you hand off to an external service such as OpenAI, Claude, or Gemini sends your PII out of your infrastructure in plaintext. The tool runs PII detection entirely locally using Ollama, so no data leaves your systems during the detection phase.

How It Works

The process follows three steps: detect PII locally, replace it with tokens before any external LLM call, then restore the original values after processing. The third-party service sees only the tokens, never the sensitive data itself.
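The tokenize and restore steps can be illustrated with a minimal Python sketch. This is not llm-hasher's actual code: the function names, the token format, and the plain dict standing in for the vault are all assumptions made for illustration, and the detected spans are supplied by hand rather than by a real detector.

```python
import secrets


def tokenize(text, spans, vault):
    """Replace detected PII values with opaque tokens.

    spans: list of (pii_type, value) pairs, as a detector might return them.
    vault: dict mapping token -> original value (stand-in for the real vault).
    """
    reverse = {value: token for token, value in vault.items()}
    # Longest values first, so a short value cannot split a longer one.
    for pii_type, value in sorted(spans, key=lambda s: len(s[1]), reverse=True):
        token = reverse.get(value)
        if token is None:
            # Hypothetical token format; the real tool may use another.
            token = f"[{pii_type.upper()}_{secrets.token_hex(4)}]"
            vault[token] = value
            reverse[value] = token
        text = text.replace(value, token)
    return text


def restore(text, vault):
    """Put the original values back into a response that contains tokens."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text
```

In use, only the output of `tokenize` would be sent to the external service, and `restore` would be applied to whatever comes back.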

Detection Approach

The detection system uses a hybrid approach:

Regex patterns for structured data types: credit cards, IBAN numbers, email addresses, and IPv4 addresses
Ollama with llama3.2:3b (by default) for contextual detection of unstructured PII: names, addresses, national IDs, passports, and dates of birth
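The regex half of this approach might look roughly like the sketch below. The patterns are simplified assumptions, not the ones llm-hasher ships, and the Luhn checksum filter for card numbers is an extra step added here to cut false positives; the source does not say whether the tool does the same. The contextual, model-based half is not shown.

```python
import re

# Simplified, illustrative patterns; real-world formats have more edge cases.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "IPV4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_ok(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_structured(text):
    """Return (pii_type, value) pairs for structured PII found in text."""
    found = []
    for pii_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            if pii_type == "CREDIT_CARD" and not luhn_ok(value):
                continue  # digit run that is not a plausible card number
            found.append((pii_type, value))
    return found
```

Regexes suit these types because their formats are fixed. Names and street addresses have no such shape, which is why the contextual pass is delegated to a model.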

Technical Implementation

Mappings between original PII and tokens are stored in an AES-256-GCM encrypted SQLite vault. Deployment is simplified with Docker Compose, which spins up both Ollama and the llm-hasher service with a single command.
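One way such a vault could be put together is sketched below, using Python's standard sqlite3 module and the third-party cryptography package. The table layout, the `Vault` class, and the choice to encrypt each record separately (with the token bound as associated data) are assumptions for illustration; the source states only that the vault is SQLite encrypted with AES-256-GCM.

```python
import os
import sqlite3

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class Vault:
    """Hypothetical token -> PII store with per-record AES-256-GCM encryption."""

    def __init__(self, path, key):
        self.aead = AESGCM(key)  # a 32-byte key selects AES-256
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS mappings ("
            "token TEXT PRIMARY KEY, nonce BLOB NOT NULL, ciphertext BLOB NOT NULL)"
        )

    def put(self, token, value):
        nonce = os.urandom(12)  # fresh 96-bit nonce for every record
        # The token is passed as associated data, so a ciphertext moved
        # to a different token's row fails authentication on decrypt.
        ciphertext = self.aead.encrypt(
            nonce, value.encode("utf-8"), token.encode("utf-8")
        )
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO mappings VALUES (?, ?, ?)",
                (token, nonce, ciphertext),
            )

    def get(self, token):
        row = self.db.execute(
            "SELECT nonce, ciphertext FROM mappings WHERE token = ?", (token,)
        ).fetchone()
        if row is None:
            return None
        nonce, ciphertext = row
        plaintext = self.aead.decrypt(nonce, ciphertext, token.encode("utf-8"))
        return plaintext.decode("utf-8")
```

A key for this sketch can be created with `AESGCM.generate_key(bit_length=256)`. Where the real tool keeps its key is not described in the source.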

📖 Read the full source: r/LocalLLaMA

