Replace 3GB AI Pipeline with Git Commands

From complex pipeline to simple git commands

A developer building DiffMem, a git-backed memory system for AI agents, discovered that its retrieval layer was unnecessarily complex. It used sentence-transformers for cosine similarity scoring, rank-bm25 for keyword search, and a two-pass LLM pipeline to distill queries and synthesize results, with scikit-learn and numpy as further dependencies. The result was a 3GB Docker image (sentence-transformers pulls in all of PyTorch), timeouts for heavy users roughly 10% of the time, and a cold start that rebuilt an in-memory BM25 index every time.

The realization: LLMs already know git

The key insight was that Unix commands are the densest tool-use pattern in any LLM's training data. Billions of README files, CI scripts, and Stack Overflow answers are full of grep, git log, and cat commands. The LLM doesn't need a custom retrieval pipeline built around it, because it already speaks the language of shell commands.

The single-tool solution

They replaced the entire complex system with one tool:

{
  "name": "run",
  "description": "Execute a read-only command in the memory repository",
  "parameters": {
    "command": "Shell command (supports |, &&, ||, ; chaining)"
  }
}

That's it. One function. The agent writes shell commands itself; it already knows grep, git diff, head, and the other Unix utilities without being taught.
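The source does not say how the "read-only" guarantee is enforced. One plausible first filter is an allowlist check applied before the command reaches a shell. The sketch below is an assumption, not DiffMem's code: the allowlists, the forbidden-character set, and the names split_chain and is_read_only are all hypothetical. It is also not a sandbox (some git options can still write files), so a real deployment would probably pair it with OS-level isolation such as a read-only mount.

```python
import shlex

# Hypothetical allowlists: the source only says the tool is "read-only".
READ_ONLY_BINARIES = {"cat", "grep", "head", "tail", "ls", "wc"}
READ_ONLY_GIT_SUBCOMMANDS = {"log", "diff", "show", "blame"}
CHAIN_OPERATORS = {"|", "&&", "||", ";"}
# Substitution, redirection, subshells and newlines are rejected outright.
FORBIDDEN_CHARS = set("$`<>()\n\r")


def split_chain(command):
    """Split a command line into argv lists at |, &&, || and ;."""
    # punctuation_chars combined with whitespace_split needs Python 3.8+.
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # treat '#' as an ordinary character
    segments, current = [], []
    for token in lexer:
        if token in CHAIN_OPERATORS:
            segments.append(current)
            current = []
        elif token and set(token) <= set("|&;"):
            # e.g. a lone '&' (background job) or ';;'
            raise ValueError(f"unsupported operator: {token}")
        else:
            current.append(token)
    segments.append(current)
    return [argv for argv in segments if argv]


def is_read_only(command):
    """Return True only if every chained command starts with an allowed binary."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        segments = split_chain(command)
    except ValueError:  # unbalanced quotes or unsupported operator
        return False
    if not segments:
        return False
    for argv in segments:
        if argv[0] == "git":
            # Require the subcommand directly after 'git' (no -c / -C tricks).
            if len(argv) < 2 or argv[1] not in READ_ONLY_GIT_SUBCOMMANDS:
                return False
        elif argv[0] not in READ_ONLY_BINARIES:
            return False
    return True
```

A command that passes the check could then be handed to something like subprocess.run(command, shell=True, cwd=repo_path, capture_output=True, text=True, timeout=10), where repo_path and the timeout value are again assumptions.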

How the agent works

The agent follows a fixed protocol across its turns:

1. Read the entity manifest
2. Run a temporal probe against the commit log
3. Batch its investigation into a single tool call
4. Output a retrieval plan
5. Stop

The agent returns pointers, not content. It reads lightweight signals during its turns (head -30 for structure, grep -n for keywords, git diff HEAD~3.. for recent changes), then tells code what to fetch. Code resolves the pointers, keeping the agent's context lean.

Real-world example

When a user sent a birthday message about feeling isolated with work pressure, the agent ran:

git log --format='%h %ad' --date=relative --name-only -15

This revealed that wife.md and company.md changed in the same session, and that a key colleague showed up in 2 of the last 3 sessions. The user's message said nothing about work, so BM25 would never have found company.md, and semantic similarity on "feeling isolated on my birthday" wouldn't get there either. The co-occurrence in the commit history revealed the connection that actually mattered.
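In the source, the agent reads this log output directly and spots the pattern itself. To make the co-occurrence signal concrete, here is a small sketch that counts which files changed together in that output. It assumes one commit per session, and the sample log text (hashes and dates) is invented for illustration; only the file paths come from the example in this article. The names parse_log and co_occurrence are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Invented sample of `git log --format='%h %ad' --date=relative --name-only` output.
SAMPLE_LOG = """\
a1b2c3d 2 hours ago

memories/people/wife.md
memories/contexts/company.md
e4f5a6b 3 days ago

timeline/2026-03.md
"""

# A header line is an abbreviated hash followed by a space.
# Assumption: no tracked path starts with 7+ hex characters and a space.
HEADER = re.compile(r"^[0-9a-f]{7,40} ")


def parse_log(output):
    """Group the file names in the log output by commit."""
    commits = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if HEADER.match(line):
            commits.append([])
        elif commits:
            commits[-1].append(line)
    return commits


def co_occurrence(commits):
    """Count how often each pair of files changed in the same commit."""
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


pairs = co_occurrence(parse_log(SAMPLE_LOG))
for (a, b), count in pairs.most_common():
    print(f"{count}x  {a}  <->  {b}")
```

Pairs with high counts are files that tend to change together, which is the kind of link neither keyword search nor embedding similarity on the user's message would surface.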

In turn 3, the agent composed one tool call with nine commands chained with semicolons:

git diff HEAD~2.. -- memories/people/wife.md; git log --stat -5 -- memories/people/wife.md; head -30 memories/people/wife.md; grep -nE "birthday|surgery|stress" memories/people/wife.md; tail -50 timeline/2026-03.md; git diff HEAD~3.. -- timeline/2026-03.md; grep -nE "project|deliverable" memories/contexts/company.md; git diff HEAD~2.. -- memories/contexts/company.md; git diff HEAD~1.. -- memories/people/colleague.md

The final output was a JSON retrieval plan with specific git diffs, priority levels, and token estimates: pointers rather than content. Code then ran the commands and assembled the context within the token budget.
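The source does not show the plan's schema or how the budget is applied. As a rough sketch, the plan could be a list of pointers that code sorts by priority and admits greedily until the budget is spent. The field names (command, priority, est_tokens), the numbers, the convention that priority 1 is most important, and the function select_within_budget are all assumptions made for illustration.

```python
import json

# Hypothetical plan: field names and token estimates are invented.
PLAN_JSON = """
[
  {"command": "git diff HEAD~2.. -- memories/people/wife.md",
   "priority": 1, "est_tokens": 900},
  {"command": "tail -50 timeline/2026-03.md",
   "priority": 2, "est_tokens": 800},
  {"command": "git diff HEAD~2.. -- memories/contexts/company.md",
   "priority": 3, "est_tokens": 400}
]
"""


def select_within_budget(plan, budget):
    """Greedily keep pointers in priority order while they fit the token budget."""
    chosen, used = [], 0
    for item in sorted(plan, key=lambda p: p["priority"]):
        if used + item["est_tokens"] <= budget:
            chosen.append(item)
            used += item["est_tokens"]
    return chosen


plan = json.loads(PLAN_JSON)
for item in select_within_budget(plan, budget=1500):
    # A real resolver would execute item["command"] here and append its output.
    print(item["priority"], item["command"])
```

With a budget of 1500 tokens, this sketch keeps the first and third pointers and skips the second, because the second would push the total past the budget. The agent's context never holds the fetched content; only the resolver does.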

Results

This approach allowed them to delete rank-bm25, sentence-transformers, scikit-learn, and numpy. The Docker image shrank by approximately 3GB. The server starts faster, uses a fraction of the memory, and no longer rebuilds a BM25 index on cold start. The 10% timeout rate disappeared. On Cloud Run with real user load, this wasn't a marginal improvement but a different class of deployment.

What's left: requests, openai, gitpython.

📖 Read the full source: r/LocalLLaMA