Practical Lessons from Deploying RAG Bots in Regulated Industries

✍️ OpenClawRadar📅 Published: March 29, 2026🔗 Source
Practical Lessons from Deploying RAG Bots in Regulated Industries
Ad

Key Implementation Details

This case study covers deployment of a RAG-powered AI assistant for Australian workplace compliance use cases across construction sites, aged care facilities, and mining operations.

Ad

Technical Lessons Learned

  • Query expansion matters more than chunk size: Instead of obsessing over chunk size (400 words? 512 tokens?), the developer found that generating 4 alternative phrasings of each query via Haiku, running all 4 against ChromaDB, then merging and deduplicating results significantly improved retrieval quality. This was particularly effective for domain-specific jargon where users phrase things differently than document authors.
  • Source boost for named documents: If a user's query contains words that match an indexed document title, force-include chunks from that document regardless of semantic similarity. For example, "What does our FIFO policy say about R&R flights?" should always pull from the FIFO policy — not just semantically similar chunks that happen to mention flights.
  • Layer your prompts — don't let clients break Layer 1: Implemented a three-layer system: core security/safety rules (immutable), vertical personality (swappable per industry), client custom instructions (additive only). Clients cannot override Layer 1 via their custom instructions. This prevented "ignore previous instructions" attacks and clients accidentally jailbreaking their own bots.
  • Local embeddings are good enough: Used sentence-transformers all-MiniLM-L6-v2 running locally on ChromaDB with no external embedding API. For document Q&A in a specific domain, it performs close enough to ada-002 that the cost and latency savings are worth it. The LLM quality (Claude Haiku) is doing more work than the embeddings anyway.
  • One droplet per client: Tried shared infrastructure first but found the operational overhead of keeping ChromaDB collections isolated, managing API keys, and preventing cross-contamination was worse than just spinning a $6/mo VM per client. Each client owns their vector store, and their documents never touch shared infrastructure.

The developer has made the RAG engine available on GitHub for others to examine.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also