LLMock: HTTP-based mocking server for deterministic LLM testing across processes

LLMock is a mocking server that intercepts LLM API calls by running as a real HTTP server on a specified port, allowing deterministic testing across multiple processes without hitting paid APIs.
Key Details
The tool was discovered after a developer spent $12 running Playwright tests against the real OpenAI API. MSW (Mock Service Worker) did not solve the problem: it patches the HTTP module inside the Node.js process that calls server.listen(), so separate processes (such as a Python agent) never see the mocks and keep calling the real API.
With LLMock, you point the OPENAI_BASE_URL environment variable at the mock server from every process, regardless of whether it's Node.js, Python, or any other language:
const mock = new LLMock({ port: 5555 });
await mock.start();
process.env.OPENAI_BASE_URL = "http://localhost:5555/v1";

Fixtures are plain JSON files that match on user message substrings or regex patterns, eliminating handler boilerplate:
{
  "fixtures": [
    {
      "match": { "userMessage": "stock price of AAPL" },
      "response": { "content": "The current stock price of Apple Inc. (AAPL) is $150.25." }
    }
  ]
}

Key features from the source:
- Speaks actual OpenAI/Claude/Gemini SSE format correctly (getting event types wrong breaks streaming in subtle ways)
- Full tool call support - agent frameworks execute them normally
- Predicate routing for inspecting system prompt state or message history for multi-agent flows
- Request journal to assert on what was actually called, not just whether the test passed
- Zero dependencies
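To illustrate why the SSE format matters, here is a minimal sketch of how a canned reply could be encoded as an OpenAI-style chat completion stream: one `data:` line per chunk, a final chunk carrying `finish_reason`, and a closing `data: [DONE]` sentinel. This is not LLMock's implementation; the function name `toOpenAIStream`, the `chunkSize` option, and the placeholder `id` and `model` values are assumptions made for illustration. Other providers differ (Anthropic's stream, for example, uses named `event:` lines), which is where a hand-rolled mock tends to go wrong.

```javascript
// Sketch only: encodes a canned reply as OpenAI-style streaming events.
// toOpenAIStream, chunkSize, and the id/model values are illustrative assumptions.
function toOpenAIStream(content, { model = "mock-model", chunkSize = 8 } = {}) {
  const base = { id: "chatcmpl-mock", object: "chat.completion.chunk", created: 0, model };

  // Each SSE event is a "data: <json>" line followed by a blank line.
  const event = (delta, finishReason = null) =>
    `data: ${JSON.stringify({
      ...base,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    })}\n\n`;

  // First chunk announces the assistant role.
  const events = [event({ role: "assistant" })];

  // Content is split into fixed-size pieces, one event per piece.
  for (let i = 0; i < content.length; i += chunkSize) {
    events.push(event({ content: content.slice(i, i + chunkSize) }));
  }

  // Final chunk has an empty delta and a finish_reason, then the sentinel.
  events.push(event({}, "stop"));
  events.push("data: [DONE]\n\n");
  return events;
}

// Usage: join the events to get the response body a streaming client would read.
const demoBody = toOpenAIStream("The current stock price of Apple Inc. (AAPL) is $150.25.").join("");
```

A client that reassembles the `delta.content` pieces in order recovers the original string, so the same fixture can serve both streaming and non-streaming requests.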
The developer ended up with 9 LLM calls across 3 Playwright tests, costing $0 and producing deterministic results every run.
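The determinism comes from the fixture lookup: the same user message always resolves to the same canned response. A minimal sketch of such a lookup follows, assuming the fixture shape shown earlier. It is not LLMock's actual matcher: the function name `findFixture` and the `userMessagePattern` field used for regex matching are hypothetical, and message content is assumed to be a plain string.

```javascript
// Sketch only: a hypothetical fixture matcher, not LLMock's real code.
// "userMessage" mirrors the fixture JSON; "userMessagePattern" is an assumed
// field name for regex matching.
function findFixture(fixtures, messages) {
  // Match against the most recent user message in the request.
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  if (!lastUser || typeof lastUser.content !== "string") return null;

  for (const fixture of fixtures) {
    const { userMessage, userMessagePattern } = fixture.match;
    // Substring match.
    if (typeof userMessage === "string" && lastUser.content.includes(userMessage)) {
      return fixture;
    }
    // Regex match.
    if (userMessagePattern && new RegExp(userMessagePattern).test(lastUser.content)) {
      return fixture;
    }
  }
  return null; // No fixture matched; a mock server could answer with an error here.
}

const fixtures = [
  {
    match: { userMessage: "stock price of AAPL" },
    response: { content: "The current stock price of Apple Inc. (AAPL) is $150.25." },
  },
  {
    match: { userMessagePattern: "^weather in \\w+" },
    response: { content: "Sunny." },
  },
];

// Usage: the first fixture whose matcher accepts the last user message wins.
const hit = findFixture(fixtures, [
  { role: "system", content: "You are a finance assistant." },
  { role: "user", content: "What is the stock price of AAPL today?" },
]);
```

Returning the first match keeps the behaviour easy to reason about: fixture order in the file decides which response is served when several patterns overlap.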
📖 Read the full source: r/LocalLLaMA