Caliby: Open-Source Embedded Vector Database for AI Agents with Hybrid Text+Vector Storage

Caliby is now open source. It is an embedded, in-process vector database designed for AI agent and RAG workloads, shipped as a single C++ library with Python bindings. It was developed by a team that includes a PhD from MIT’s DB Group (Michael Stonebraker’s team) and Sea-Land AI.
Why Another Vector DB?
The team found existing solutions lacking for agent/LLM use cases:
- FAISS: In-memory by design, with no built-in persistence layer; unless the index is explicitly saved to a file, a restart loses it.
- pgvector: Performance is capped by its dependence on PostgreSQL.
- Chroma / Qdrant / Milvus: Require separate services, which is too heavy for embedded scenarios.
- LanceDB: Embedded, but lacks advanced indexes such as DiskANN and has performance bottlenecks.
Caliby aims to be a lightweight, embeddable data engine like DuckDB, but for vector + text storage.
Architecture: Hybrid Text + Vector Storage
Caliby unifies text and vector data in a single system. Instead of juggling a vector DB and a relational DB, you store embeddings, raw text, and metadata in one library. The architecture uses a page-organized buffer pool for persistence.
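To illustrate what a page-organized buffer pool generally does, here is a minimal Python sketch. It is not Caliby's implementation, which is written in C++ and is not detailed in this summary. The class name BufferPool, the 4096-byte page size, the write helper, and the LRU eviction policy are all assumptions chosen for illustration.

```python
import os
import tempfile
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class BufferPool:
    """Hypothetical cache of fixed-size file pages with LRU eviction."""

    def __init__(self, path, capacity):
        mode = "r+b" if os.path.exists(path) else "w+b"
        self.file = open(path, mode)
        self.capacity = capacity      # max pages held in memory
        self.pages = OrderedDict()    # page_id -> bytearray, oldest first
        self.dirty = set()            # resident pages modified since load

    def get_page(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            self._evict()
        self.file.seek(page_id * PAGE_SIZE)
        raw = self.file.read(PAGE_SIZE)
        # Pages past the end of the file start out zero-filled.
        page = bytearray(raw.ljust(PAGE_SIZE, b"\x00"))
        self.pages[page_id] = page
        return page

    def write(self, page_id, offset, payload):
        page = self.get_page(page_id)
        page[offset:offset + len(payload)] = payload
        self.dirty.add(page_id)

    def _write_back(self, page_id):
        self.file.seek(page_id * PAGE_SIZE)
        self.file.write(self.pages[page_id])
        self.dirty.discard(page_id)

    def _evict(self):
        victim = next(iter(self.pages))  # least recently used page
        if victim in self.dirty:
            self._write_back(victim)
        del self.pages[victim]

    def flush(self):
        for page_id in list(self.dirty):
            self._write_back(page_id)
        self.file.flush()

    def close(self):
        self.flush()
        self.file.close()


# Text, vectors, and metadata all live in pages of one file.
path = os.path.join(tempfile.mkdtemp(), "demo.pages")
pool = BufferPool(path, capacity=2)
pool.write(0, 0, b"text:hello")
pool.write(1, 0, b"vec:0.1,0.2")
pool.write(2, 0, b"meta:{}")  # pool is full, so page 0 is evicted to disk
pool.close()

# Reopening the file shows the data survived the "restart".
reopened = BufferPool(path, capacity=2)
restored = bytes(reopened.get_page(0)[:10])
reopened.close()
```

The point of the design is that only a bounded number of pages sit in memory while the full dataset lives on disk, which is what lets an embedded engine persist data without a separate server process.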
Supported Indexes
- HNSW: General high-performance retrieval, CPU-optimized.
- DiskANN (Vamana graph): Designed for disk-based scenarios; the team reports that it outperforms FAISS when the index lives on disk.
- IVF+PQ: Inverted file with product quantization for compact indexes.
Caliby also supports brute-force search with SIMD (AVX-512, AVX2, SSE) distance functions (L2, InnerProduct, Cosine).
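As a rough illustration of what brute-force search over these three metrics computes, the following pure-Python sketch scores every stored vector against the query and keeps the k best. It does not use Caliby's API or SIMD; the function names and the metric keys ("l2", "ip", "cosine") are hypothetical. Inner product is negated so that a smaller score always means a closer match.

```python
import heapq
import math


def l2_squared(a, b):
    # Squared Euclidean distance; ranking is the same as with the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_inner_product(a, b):
    # Larger inner product means more similar, so negate for "smaller is better".
    return -inner_product(a, b)


def cosine_distance(a, b):
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


# Hypothetical metric names, for this sketch only.
METRICS = {
    "l2": l2_squared,
    "ip": negative_inner_product,
    "cosine": cosine_distance,
}


def brute_force_search(vectors, query, k, metric="l2"):
    """Return the k (score, index) pairs with the smallest scores."""
    dist = METRICS[metric]
    scored = ((dist(v, query), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query = [1.0, 0.0]
nearest_ids = [i for _, i in brute_force_search(vectors, query, k=2, metric="cosine")]
```

Brute force is exact but scans every vector, so its cost grows linearly with the collection size; graph and quantization indexes such as those listed above trade some accuracy for far fewer distance computations.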
Performance Claims
The team claims that Caliby outperforms pgvector by 4x and significantly surpasses FAISS in disk-storage scenarios. It is also said to handle millions to tens of millions of vectors on disk without requiring a separate service.
Getting Started
Install the package with pip:
pip install caliby
The Python API exposes HnswIndex, DiskANN, and IVFPQIndex classes via pybind11. No dependencies, no server setup, no DevOps.
Who It's For
AI Agent developers and RAG pipeline builders who want an embeddable, zero-infrastructure vector database with hybrid text+vector capabilities and production-grade performance.
📖 Read the full source: r/LocalLLaMA