Ollama's Technical Issues and Community Controversy

Ollama's Core Technology and Attribution Issues
Ollama's entire inference capability originally came from llama.cpp, the C++ inference engine created by Georgi Gerganov in March 2023. For over a year, Ollama's README contained no mention of llama.cpp, and its binary distributions did not include the MIT license notice that must accompany the llama.cpp code they shipped.
Community members opened GitHub issue #3185 in early 2024 requesting license compliance; it went more than 400 days without a response from the maintainers. After issue #3697 was opened in April 2024 specifically requesting acknowledgment of llama.cpp, Ollama co-founder Michael Chiang eventually added a single line to the bottom of the README: "llama.cpp project founded by Georgi Gerganov."
Technical Problems with Custom Backend
In mid-2025, Ollama stopped using llama.cpp as its inference backend and built a custom implementation directly on top of ggml. This custom backend reintroduced bugs that llama.cpp had already solved, along with new compatibility gaps, including:
- Broken structured output support
- Vision model failures
- GGML assertion crashes across multiple versions
- Failures on models that worked fine in upstream llama.cpp
- Missing support for tensor types required by new releases such as GPT-OSS 20B
Georgi Gerganov pointed out that Ollama had forked ggml and made flawed changes to it.
Performance Benchmarks
Multiple community tests show llama.cpp outperforming Ollama on the same hardware with the same model:
- 161 tokens per second for llama.cpp versus 89 tokens per second for Ollama, roughly 1.8x faster
- On CPU, the performance gap is 30-50%
- A recent comparison on Qwen-3 Coder 32B showed ~70% higher throughput with llama.cpp
The performance overhead is attributed to Ollama's daemon layer, poor GPU offloading heuristics, and a vendored backend that trails upstream.
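The ratios quoted above are simple arithmetic on throughput numbers. As a minimal sketch, the Python helpers below (the function names are hypothetical and not part of either project) reproduce the 1.8x figure from the 161 and 89 tokens-per-second measurements, and show how a tokens-per-second value can be derived from a token count and a duration in nanoseconds. Ollama's API responses are understood to report generation statistics in that form (`eval_count` and `eval_duration`), but treat those field names as an assumption and check the current API documentation before relying on them.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Throughput from a generated-token count and a duration in nanoseconds.

    The parameter names mirror the fields assumed to appear in Ollama's
    response metadata; any source of (tokens, nanoseconds) works equally well.
    """
    if eval_duration_ns <= 0:
        raise ValueError("duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times faster the first throughput is than the second."""
    if slow_tps <= 0:
        raise ValueError("throughput must be positive")
    return fast_tps / slow_tps


def percent_faster(fast_tps: float, slow_tps: float) -> float:
    """The same comparison expressed as a percentage gain over the slower run."""
    return (speedup(fast_tps, slow_tps) - 1.0) * 100.0


if __name__ == "__main__":
    # Figures quoted in the article: llama.cpp at 161 tok/s, Ollama at 89 tok/s.
    print(f"{speedup(161.0, 89.0):.2f}x")          # 1.81x
    print(f"{percent_faster(161.0, 89.0):.1f}%")   # 80.9%
```

Expressed the same way, the ~70% throughput advantage in the Qwen-3 Coder comparison corresponds to a speedup of about 1.7x.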
Model Naming Issues
When DeepSeek released its R1 model family in January 2025, Ollama listed the smaller distilled versions (models like DeepSeek-R1-Distill-Qwen-32B) under the DeepSeek-R1 name without clearly indicating that they were distillations built on other base models rather than the full R1 model.
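The upstream name already encodes what a shortened listing can hide. As an illustrative sketch only, the hypothetical helper below splits a name of the form `<family>-Distill-<base>-<size>` into its parts so that a listing could label the model accurately. The naming pattern is inferred from the single example in the text, and neither `ModelInfo` nor `parse_upstream_name` exists in Ollama or in DeepSeek's tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    family: str            # e.g. "DeepSeek-R1"
    distilled: bool        # True when the name carries a "-Distill-" marker
    base: Optional[str]    # base model the distillation was built on
    size: Optional[str]    # parameter-count suffix, e.g. "32B"


def parse_upstream_name(name: str) -> ModelInfo:
    """Split '<family>-Distill-<base>-<size>' into its components.

    Names without the '-Distill-' marker are returned unchanged as
    non-distilled models.
    """
    family, marker, rest = name.partition("-Distill-")
    if not marker:
        return ModelInfo(family=name, distilled=False, base=None, size=None)
    base, _, size = rest.rpartition("-")
    return ModelInfo(family=family, distilled=True, base=base or None, size=size)


def display_label(info: ModelInfo) -> str:
    """A listing label that makes the distillation explicit."""
    if not info.distilled:
        return info.family
    return f"{info.family} distill (base: {info.base}, {info.size})"


if __name__ == "__main__":
    info = parse_upstream_name("DeepSeek-R1-Distill-Qwen-32B")
    print(display_label(info))  # DeepSeek-R1 distill (base: Qwen, 32B)
```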
📖 Read the full source: HN LLM Tools