Practical Lessons from Deploying RAG Bots in Regulated Industries

Key Implementation Details
This case study covers deployment of a RAG-powered AI assistant for Australian workplace compliance use cases across construction sites, aged care facilities, and mining operations.
Technical Lessons Learned
- Query expansion matters more than chunk size: Instead of obsessing over chunk size (400 words? 512 tokens?), the developer found that generating 4 alternative phrasings of each query via Haiku, running all 4 against ChromaDB, then merging and deduplicating results significantly improved retrieval quality. This was particularly effective for domain-specific jargon where users phrase things differently than document authors.
- Source boost for named documents: If a user's query contains words that match an indexed document title, force-include chunks from that document regardless of semantic similarity. For example, "What does our FIFO policy say about R&R flights?" should always pull from the FIFO policy — not just semantically similar chunks that happen to mention flights.
- Layer your prompts — don't let clients break Layer 1: Implemented a three-layer system: core security/safety rules (immutable), vertical personality (swappable per industry), client custom instructions (additive only). Clients cannot override Layer 1 via their custom instructions. This prevented "ignore previous instructions" attacks and clients accidentally jailbreaking their own bots.
- Local embeddings are good enough: Used sentence-transformers all-MiniLM-L6-v2 running locally on ChromaDB with no external embedding API. For document Q&A in a specific domain, it performs close enough to ada-002 that the cost and latency savings are worth it. The LLM quality (Claude Haiku) is doing more work than the embeddings anyway.
- One droplet per client: Tried shared infrastructure first but found the operational overhead of keeping ChromaDB collections isolated, managing API keys, and preventing cross-contamination was worse than just spinning a $6/mo VM per client. Each client owns their vector store, and their documents never touch shared infrastructure.
The developer has made the RAG engine available on GitHub for others to examine.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Steam Game Development with Claude Code: Technical Review Process and Code Restructuring
A developer used Claude Code to build and publish a Steam game, detailing how it handled Steamworks SDK integration, depot configuration, and localization for 7 languages, but struggled with image specifications and hardcoded data structures.

Building a Steam Game in 10 Days Using Claude Code: Technical Challenges and Workflow
A developer built and released a game on Steam in 10 days using Claude Code without writing any code personally, but encountered significant challenges with logic design and debugging AI-generated code.

Building Non-Coding AI Agents with Claude Code: Three Practical Examples
A Reddit user shares their personal setup for creating AI agents using Claude Code, detailing three specific implementations: an automated morning briefing agent pulling from emails, todos, and calendar; a tmux-based pipeline for capturing Substack articles; and a meeting summarization agent.

Real Estate Developer's AI Agent Makes First Phone Call with Context and Voice Style
A developer running a multi-agent operation for real estate reports their AI agent made its first successful phone call, using full context about deals and prospects while mimicking the developer's specific sales approach and voice style.