UI and Server for Anthropic's Natural Language Autoencoders on llama.cpp

✍️ OpenClawRadar | 📅 Published: May 13, 2026

Anthropic's first open-weight models, Natural Language Autoencoders (NLAs), are fine-tunes of popular open-weight models. Because they do not change the underlying architecture or modeling code, running them with llama.cpp is straightforward. A developer has packaged all four NLA features (activation extraction, activation explanation, activation reconstruction, and explanation-edit steering) into a custom llama.cpp server, paired with a Mikupad UI for token-level activation explanation and steering.

Key Features

  • Activation extraction: Extract internal activations from any layer of the base model.
  • Activation explanation: Get human-readable explanations for extracted activations.
  • Activation reconstruction: Reconstruct activations from their explanations.
  • Explanation-edit steering: Modify explanations and steer the model's output accordingly.
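
The four features above chain into a round trip: extract an activation, explain it in text, optionally edit the text, and reconstruct a vector from it. The following is a minimal Python sketch of that data flow only. It does not call the custom server, whose API the source does not describe; `round_trip`, `explain`, `reconstruct`, `edit`, and the toy stand-ins are hypothetical names chosen for illustration, and cosine similarity is just one plausible way to compare the original and reconstructed vectors.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two activation vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def round_trip(
    activation: Vector,
    explain: Callable[[Vector], str],
    reconstruct: Callable[[str], Vector],
    edit: Callable[[str], str] = lambda text: text,
) -> tuple[str, list[float], float]:
    """Explain an activation, optionally edit the text, then reconstruct.

    Returns the (possibly edited) explanation, the reconstructed vector,
    and its cosine similarity to the original activation.
    """
    explanation = edit(explain(activation))
    rebuilt = list(reconstruct(explanation))
    return explanation, rebuilt, cosine_similarity(activation, rebuilt)


# Hypothetical toy stand-ins so the sketch runs without any model:
# the "explanation" is simply the rounded numbers written out as text.
def toy_explain(vec: Vector) -> str:
    return " ".join(f"{x:.1f}" for x in vec)


def toy_reconstruct(text: str) -> list[float]:
    return [float(token) for token in text.split()]


original = [0.12, -0.48, 0.91]

# Plain round trip: explanation is left untouched.
explanation, rebuilt, score = round_trip(original, toy_explain, toy_reconstruct)

# Steering-style round trip: the explanation is edited before reconstruction,
# so the rebuilt vector moves away from the original.
steered_text, steered_vec, steered_score = round_trip(
    original,
    toy_explain,
    toy_reconstruct,
    edit=lambda text: text.replace(" 0.9", " -0.9"),
)
```

In a real setup the two callables would be backed by the NLA models rather than string formatting, and the steered vector would be fed back into the base model instead of only being compared.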

Technical Details

The server is built on top of llama.cpp and requires three models to be loaded simultaneously: the base model, the actor model, and the critic model. This is a memory-intensive setup. The developer is working on a LoRA-based version that would allow loading a single model into memory, reducing the footprint significantly.
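
To get a rough feel for why three resident models are expensive and why a LoRA-based variant could help, here is a back-of-the-envelope estimate in Python. Every number in it is a hypothetical placeholder (an 8B-parameter model, about 4.5 bits per weight, adapters at 1% of the base parameter count stored in 16 bits, two adapters), none of them taken from the source, and it counts weights only, ignoring KV cache and runtime buffers. How the developer's LoRA version will actually be structured is not stated.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given precision."""
    return n_params * bits_per_weight / 8


def three_model_gib(n_params: float, bits_per_weight: float) -> float:
    """Weights-only footprint of base + actor + critic as full models."""
    return 3 * weight_bytes(n_params, bits_per_weight) / GIB


def lora_gib(
    n_params: float,
    bits_per_weight: float,
    adapter_fraction: float,
    adapter_bits: float = 16.0,
    n_adapters: int = 2,
) -> float:
    """Weights-only footprint of one base model plus small LoRA adapters.

    adapter_fraction is the adapter size relative to the base parameter
    count; it and n_adapters are assumptions for illustration.
    """
    base = weight_bytes(n_params, bits_per_weight)
    adapters = n_adapters * weight_bytes(n_params * adapter_fraction, adapter_bits)
    return (base + adapters) / GIB


# Hypothetical example values, not measurements.
full = three_model_gib(n_params=8e9, bits_per_weight=4.5)
shared = lora_gib(n_params=8e9, bits_per_weight=4.5, adapter_fraction=0.01)
print(f"three full models: {full:.2f} GiB, base + adapters: {shared:.2f} GiB")
```

Under these assumptions the shared-base layout needs a little over a third of the weight memory, since the adapters are small next to the base model.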

The Mikupad UI provides a token-level interface for activation explanation and steering. You can inspect which tokens activate certain features and adjust the model's behavior by editing explanations in real time.

Getting Started

Source code and setup instructions are linked from the Reddit post on r/LocalLLaMA. For now, you need the three NLA model checkpoints (base, actor, critic) and must compile the custom llama.cpp server yourself. The LoRA version is forthcoming.

📖 Read the full source: r/LocalLLaMA