Testing δ-Mem on Apple Silicon: MLX Implementation and Benchmarks

✍️ OpenClawRadar · 📅 Published: May 16, 2026

A Reddit user implemented the δ-mem research paper (arXiv 2605.12357) on Apple Silicon using MLX, with OpenClaw integration. According to the post, the paper's method steers model attention without adding context or using LoRA, and reports 20% better answers in its tests. The implementation ran Qwen3-4B-Instruct via MLX with custom adapters.

Benchmark Results (normalized MLX tests, Qwen3-4B-Instruct on a Mac mini with 64GB):

  • Synthetic paper-style: Plain 0.5129, δ-mem 0.5129 (1.00x)
  • LoCoMo-10 mini: Plain 0.0500, δ-mem 0.1833 (3.67x)
  • OpenClaw replay: Plain 0.5701, δ-mem 0.6667 (1.17x)
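
The multipliers in parentheses are the δ-mem score divided by the plain score. As a minimal sketch (the scores are copied from the list above; the `scores` dictionary and `relative_gain` helper are illustrative names, not part of the user's code), the arithmetic can be reproduced like this:

```python
# Scores copied from the benchmark list above: (plain, delta_mem).
# The dictionary layout and helper name are assumptions made for illustration.
scores = {
    "Synthetic paper-style": (0.5129, 0.5129),
    "LoCoMo-10 mini": (0.0500, 0.1833),
    "OpenClaw replay": (0.5701, 0.6667),
}


def relative_gain(plain: float, delta_mem: float) -> float:
    """Return the ratio of the δ-mem score to the plain score."""
    return delta_mem / plain


# Round to two decimals, matching the precision used in the list.
gains = {name: round(relative_gain(p, d), 2) for name, (p, d) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.2f}x")
```

Note that the large LoCoMo-10 mini ratio comes from a low plain baseline (0.0500); the absolute difference between the two scores is about 0.13.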

Latency costs (vs plain):

  • Synthetic: 1.013x
  • LoCoMo-10 mini: 1.33x query / 1.50x total
  • OpenClaw replay: 1.30x

Takeaways:

  • Synthetic probes were flat (1.00x), but LoCoMo-mini showed strong relative gains (3.67x).
  • OpenClaw-style replay showed a practically meaningful improvement (6/8 → 7/8 probes passed, 1.17x).
  • The user notes that Apple Silicon cannot run CUDA efficiently, so results are lower than the paper's benchmarks. The paper's benchmarks (Qwen3-4B-Instruct) showed an average of 1.10x vs. the frozen backbone, with 1.31x on MemoryAgentBench and 1.20x on LoCoMo.
  • The user is seeking help (or roughly $6k in funding) to train an adapter for larger models such as Qwen3.6:27B.

Who it's for: Developers running local LLM agents on Apple Silicon who want to experiment with δ-mem weight modulation to improve memory/context performance.

📖 Read the full source: r/LocalLLaMA
