llmLibrarian: Local RAG Engine with MCP Integration for File-Based AI Search

What This Is
llmLibrarian is a local RAG (Retrieval-Augmented Generation) engine that exposes its retrieval capabilities through the Model Context Protocol (MCP). You index folders into silos (ChromaDB collections), then query them from any MCP client, including Claude, to get grounded, cited answers.
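Indexing a folder into a silo could look roughly like the sketch below. It assumes fixed-size character chunks with overlap and a `silo` field in every chunk's metadata. The chunk sizes, ID scheme, file suffixes, and function names are hypothetical, not llmLibrarian's actual code. The three returned lists have the shape that ChromaDB's `Collection.add` takes for `ids`, `documents`, and `metadatas`, but the call itself is left out so the sketch runs without dependencies.

```python
from pathlib import Path


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():
            chunks.append(chunk)
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def build_silo_records(silo: str, folder: Path, suffixes=(".md", ".txt", ".py")):
    """Hypothetical helper: walk `folder` and return (ids, documents, metadatas) for one silo."""
    ids, documents, metadatas = [], [], []
    for path in sorted(folder.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for i, chunk in enumerate(chunk_text(text)):
            ids.append(f"{silo}:{path.relative_to(folder)}:{i}")
            documents.append(chunk)
            # Tag every chunk with its silo and source file so hits can be filtered and cited.
            metadatas.append({"silo": silo, "source": str(path), "chunk": i})
    return ids, documents, metadatas
```

With ChromaDB installed, the lists could be passed as `collection.add(ids=ids, documents=documents, metadatas=metadatas)`; the overlap keeps sentences that straddle a chunk boundary retrievable from either side.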
Key Features and Architecture
The tool indexes folders into silos, each of which is a ChromaDB collection. Retrieval returns raw chunks; when you want a direct answer instead, Ollama handles the synthesis layer. Everything runs locally on your machine.
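The synthesis step can be pictured as: number the retrieved chunks, place them in a prompt that asks for citations, and send it to the local Ollama server. This sketch assumes Ollama's default endpoint (`http://localhost:11434/api/generate`) and the `llama3.1:8b` tag mentioned in the post; the prompt wording, chunk dictionary fields, and function names are hypothetical, not llmLibrarian's own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] ({c['source']})\n{c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below. Cite sources as [n]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def ask(question: str, chunks: list[dict], model: str = "llama3.1:8b") -> str:
    """Send a grounded prompt to a local Ollama server and return the answer text."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(question, chunks), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # With stream=False, Ollama returns one JSON object whose "response" field holds the text.
        return json.loads(response.read())["response"]
```

Because `stream` is false, the whole answer arrives as a single JSON object. Swapping models only means passing a different tag, such as any name listed by `ollama list`.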
The developer highlights the multi-silo capability as particularly powerful: combining silos surfaces cross-domain patterns that would be hard to catch manually. For example, a journal folder becomes a thinking partner that remembers what you've written, and a codebase becomes an agent that knows your actual files.
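Combining silos can be sketched as a fan-out: query each silo, tag every hit with the silo it came from, and merge. The per-silo search functions below are hypothetical stand-ins (in a real setup each might wrap a ChromaDB collection's `query` call), and merging by raw distance is one simple choice, not necessarily what llmLibrarian does.

```python
from typing import Callable

# Hypothetical per-silo search function: (query, n_results) -> [(chunk_text, distance), ...]
SearchFn = Callable[[str, int], list[tuple[str, float]]]


def query_silos(query: str, silos: dict[str, SearchFn], n_results: int = 5) -> list[dict]:
    """Query every silo, tag each hit with its silo name, and keep the closest overall.

    Comparing raw distances across silos only makes sense if every silo was embedded
    with the same model (the stack below uses a single sentence-transformers model).
    """
    merged = []
    for silo_name, search in silos.items():
        for text, distance in search(query, n_results):
            merged.append({"silo": silo_name, "text": text, "distance": distance})
    merged.sort(key=lambda hit: hit["distance"])  # smaller distance = closer match
    return merged[:n_results]
```

Keeping the `silo` tag on every hit is what lets a client tell, for instance, a journal entry apart from a source file when both answer the same question.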
MCP Tools Exposed
- retrieve: hybrid RRF vector search that returns raw chunks with confidence scores for Claude to reason over
- retrieve_bulk: multi-angle queries in one call, useful when aggregating across document types
- ask: Ollama-synthesized answer directly from retrieved context (defaults to llama3.1:8b, but you can swap in whatever model you have pulled)
- list_silos, inspect_silo, trigger_reindex: index management tools
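Reciprocal Rank Fusion (RRF) merges several ranked lists by giving a document 1/(k + rank) for each list it appears in and summing those contributions. This sketch assumes the two input lists come from a vector search and a keyword search and uses the conventional k = 60; the post does not say which rankers or which k llmLibrarian uses, and the fused score here is a ranking score, not necessarily the confidence value the tool reports.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs; rank 1 is the best hit in each list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c3", "c1", "c7"]   # e.g. nearest neighbours from the embedding index
keyword_hits = ["c1", "c9", "c3"]  # e.g. BM25 or substring matches
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here `c1` ends up first because it ranks well in both lists. The same function could plausibly back a `retrieve_bulk`-style call: run each query angle separately and fuse the resulting ID lists, so chunks that several angles agree on rise to the top. Whether llmLibrarian does this is not stated in the post.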
Technical Stack
- ChromaDB for vector storage
- Ollama for model synthesis
- sentence-transformers (all-mpnet-base-v2, MPS-accelerated) for embeddings
- fastmcp for the MCP layer
The developer notes that the multi-silo metadata tagging in ChromaDB took several iterations to get right, and says they are open to discussing the architecture.
This type of tool is useful for developers who want to build AI agents that can reference and reason over their local files without sending data to external services.
📖 Read the full source: r/LocalLLaMA