Pali v0.1: Open Source Memory Infrastructure for LLMs with Reproducible Benchmarks

What Pali Is
Pali is an open source, infrastructure-first memory layer for LLMs. It is written in Go and ships as a single binary, with configs for plug-and-play integrations such as qdrant, neo4j, ollama, and openrouter. The project is MIT licensed and fully self-hostable.
Key Features
- Multi-tenant memory APIs with tenant-scoped isolation
- Hybrid retrieval across lexical, dense, fusion, reranking, and optional multi-hop expansion
- MCP server with memory-first tools and tenant-aware resolution
- REST API, with Python and JavaScript client packages already available
- Dashboard for operators inspecting tenants, memories, and system state
- Plug-and-play extension points for vector stores, embedders, entity-fact backends, and scoring/routing
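To make the "fusion" step of hybrid retrieval concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a lexical ranking with a dense ranking. The source does not say which fusion method Pali uses, so RRF, the function name, the constant k=60, and the memory IDs are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into a single ranking.

    Each list contributes 1 / (k + rank) per item, so items ranked
    highly by more than one retriever rise to the top.
    NOTE: illustrative only; not taken from Pali's code.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical result lists from a lexical and a dense retriever.
lexical = ["m3", "m1", "m7"]
dense = ["m1", "m9", "m3"]
fused = reciprocal_rank_fusion([lexical, dense])
print(fused)
```

In this example the two memories that appear in both lists end up ahead of the ones that appear in only one, which is the behavior a fusion stage is meant to provide before any reranking.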
Benchmark Approach
To address common problems with memory stack benchmarks, the creator makes every run reproducible:
- Every run stores the exact config files used (profile + rendered)
- Hardware is fully disclosed (CPU, GPU, RAM, model versions)
- Paired comparisons only: the same fixture, eval, and top_k across all profiles
- Speed lanes and retrieval quality lanes are kept separate
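The rules above can be sketched as a small run manifest plus a pairing check. This is an assumption-laden illustration, not Pali's benchmark code: the function names, the manifest fields, and the fixture and eval names are hypothetical. Only the idea (store the rendered config, record the hardware, and refuse to compare runs that differ in fixture, eval, or top_k) comes from the list above.

```python
import hashlib
import json

# Settings that must match for two runs to count as a paired comparison.
PAIRED_KEYS = ("fixture", "eval", "top_k")


def make_manifest(profile_name, rendered_config, hardware):
    """Record the exact rendered config, a digest of it, and the hardware."""
    canonical = json.dumps(rendered_config, sort_keys=True)
    return {
        "profile": profile_name,
        "rendered_config": rendered_config,
        "config_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "hardware": hardware,
    }


def assert_paired(manifests):
    """Raise ValueError if any run differs in fixture, eval, or top_k."""
    baseline = tuple(manifests[0]["rendered_config"][k] for k in PAIRED_KEYS)
    for manifest in manifests[1:]:
        current = tuple(manifest["rendered_config"][k] for k in PAIRED_KEYS)
        if current != baseline:
            raise ValueError(
                f"{manifest['profile']} is not paired with "
                f"{manifests[0]['profile']}: {current} != {baseline}"
            )


hardware = {"cpu": "Ryzen 9 7950X", "gpu": "RTX 5070"}
# Hypothetical fixture/eval names; only the backend differs between runs.
shared = {"fixture": "fixture_v1", "eval": "eval_v1", "top_k": 5}
run_a = make_manifest("sqlite-lexical", {**shared, "backend": "sqlite"}, hardware)
run_b = make_manifest("qdrant-ollama", {**shared, "backend": "qdrant"}, hardware)
assert_paired([run_a, run_b])  # passes: same fixture, eval, and top_k
```

Hashing the canonical JSON gives each run a short fingerprint that can be quoted next to its numbers, so a reader can confirm which config produced which result.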
Performance Numbers
Benchmarks from testing on a Ryzen 9 7950X + RTX 5070:
- sqlite + lexical: 208 store ops/s, Top1=0.32, Recall@5=0.54
- qdrant + ollama (all-minilm): 98 store ops/s, Top1=0.34, Recall@5=0.52
- parser+graph (structured memory stress lane): 2.4 store ops/s. This lane is slow because of structured extraction cost, but it averages ~30 on LoCoMo, with temporal highs around 40
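For readers unfamiliar with the Top1 and Recall@5 columns, the sketch below shows one common way such retrieval metrics are computed. The source does not define them, so the definitions here are assumptions: Top1 as the fraction of queries whose first result is relevant, and Recall@k as the per-query share of relevant memories found in the top k, averaged over queries. The function names and sample data are hypothetical.

```python
def top1_accuracy(results, relevant):
    """Fraction of queries whose first retrieved ID is relevant."""
    hits = sum(
        1 for ranked, gold in zip(results, relevant)
        if ranked and ranked[0] in gold
    )
    return hits / len(results)


def recall_at_k(results, relevant, k=5):
    """Mean over queries of |relevant found in top k| / |relevant|."""
    total = 0.0
    for ranked, gold in zip(results, relevant):
        found = len(set(ranked[:k]) & gold)
        total += found / len(gold)
    return total / len(results)


# Three hypothetical queries: ranked memory IDs and the relevant ID sets.
results = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
relevant = [{"a"}, {"z"}, {"k"}]

print(top1_accuracy(results, relevant))
print(recall_at_k(results, relevant, k=5))
```

When every query has exactly one relevant memory, this Recall@k reduces to a hit rate, which is why it can never fall below Top1 for the same run.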
Important Clarification
Pali is not LLM memory in the SaaS sense. It returns raw retrieval results that you optimize for your own workflow, with no black-box scoring and no locked-in provider decisions. You can swap vector backends, embedders, and scorers through config without changing your app contract.
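The idea of swapping backends through config while the app contract stays fixed can be sketched as follows. Nothing here is Pali's API: the `Retriever` protocol, the two toy backends, the `retriever` config key, and the `recall` function are hypothetical stand-ins chosen to keep the example self-contained.

```python
from typing import Dict, List, Protocol, Tuple

Hit = Tuple[str, float]  # (memory_id, raw score), returned unmodified


class Retriever(Protocol):
    def search(self, tenant_id: str, query: str, top_k: int) -> List[Hit]: ...


class WordOverlapRetriever:
    """Toy backend: score is the number of words shared with the query."""

    def __init__(self, memories: Dict[str, Dict[str, str]]):
        self.memories = memories  # tenant_id -> {memory_id: text}

    def search(self, tenant_id, query, top_k):
        words = set(query.lower().split())
        rows = self.memories.get(tenant_id, {})  # tenant-scoped lookup
        hits = [(mid, float(len(words & set(text.lower().split()))))
                for mid, text in rows.items()]
        hits.sort(key=lambda h: (-h[1], h[0]))
        return hits[:top_k]


class CharOverlapRetriever(WordOverlapRetriever):
    """Toy backend: score is Jaccard similarity of character sets."""

    def search(self, tenant_id, query, top_k):
        q = set(query.lower())
        rows = self.memories.get(tenant_id, {})
        hits = []
        for mid, text in rows.items():
            t = set(text.lower())
            union = q | t
            hits.append((mid, len(q & t) / len(union) if union else 0.0))
        hits.sort(key=lambda h: (-h[1], h[0]))
        return hits[:top_k]


BACKENDS = {"word_overlap": WordOverlapRetriever,
            "char_overlap": CharOverlapRetriever}


def build_retriever(config, memories) -> Retriever:
    """Pick the backend named in config; callers never see the class."""
    return BACKENDS[config["retriever"]](memories)


def recall(retriever: Retriever, tenant_id: str, query: str, top_k: int = 2):
    """The app contract: same call and return shape for any backend."""
    return retriever.search(tenant_id, query, top_k)


memories = {
    "tenant-a": {"m1": "likes green tea", "m2": "works in Berlin"},
    "tenant-b": {"m3": "likes green tea too"},
}
for name in BACKENDS:
    store = build_retriever({"retriever": name}, memories)
    print(name, recall(store, "tenant-a", "green tea"))
```

Only the config value changes between the two loop iterations. The calling code and the shape of the returned hits stay the same, and the raw scores are handed back for the caller to interpret.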
Project Status
Version 0.1 was pushed recently and adds a proper benchmark suite. The creator is looking for contributors.
📖 Read the full source: r/LocalLLaMA