UI and Server for Anthropic's Natural Language Autoencoders on llama.cpp
Anthropic's first open-weight models, Natural Language Autoencoders (NLAs), are finetunes of popular open-weight architectures. Because they don't modify the underlying model architecture or modeling code, inference with llama.cpp is straightforward. A developer has packaged the full NLA feature set (activation extraction, activation explanation, activation reconstruction, and explanation-edit steering) into a custom llama.cpp server, paired with a Mikupad UI for token-level activation explanation and steering.
Key Features
- Activation extraction: Extract internal activations from any layer of the base model.
- Activation explanation: Get human-readable explanations for extracted activations.
- Activation reconstruction: Reconstruct activations from their explanations.
- Explanation-edit steering: Modify explanations and steer the model's output accordingly.
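The four features above form a round trip: activation, explanation, reconstructed activation, and finally a steering change derived from an edited explanation. The toy sketch below only illustrates that data flow. It is not the NLA method or the server's API: a real NLA produces natural-language explanations with finetuned language models, whereas here the hypothetical `CONCEPTS` table, the dict-shaped "explanation", and the functions `explain`, `reconstruct`, and `steering_delta` are stand-ins invented for illustration.

```python
import math

# Hypothetical toy "concept directions" in a 4-dimensional activation space.
# A real NLA uses finetuned language models, not a fixed lookup table.
CONCEPTS = {
    "formal tone": [1.0, 0.0, 0.0, 0.0],
    "cooking":     [0.0, 1.0, 0.0, 0.0],
    "negation":    [0.0, 0.0, 1.0, 0.0],
    "past tense":  [0.0, 0.0, 0.0, 1.0],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def explain(activation, top_k=2):
    """Toy explanation: keep the top_k concepts by absolute projection."""
    scored = sorted(
        ((dot(activation, direction), name) for name, direction in CONCEPTS.items()),
        key=lambda pair: abs(pair[0]),
        reverse=True,
    )
    return {name: round(score, 3) for score, name in scored[:top_k]}

def reconstruct(explanation):
    """Toy reconstruction: weighted sum of the named concept directions."""
    dim = len(next(iter(CONCEPTS.values())))
    out = [0.0] * dim
    for name, weight in explanation.items():
        for i, x in enumerate(CONCEPTS[name]):
            out[i] += weight * x
    return out

def steering_delta(original_expl, edited_expl):
    """Difference between reconstructions of the edited and original explanations."""
    before, after = reconstruct(original_expl), reconstruct(edited_expl)
    return [b - a for a, b in zip(before, after)]

# 1. Extraction (here: a made-up activation vector).
activation = [0.9, 0.1, 0.0, 0.4]
# 2. Explanation.
explanation = explain(activation)
# 3. Reconstruction, plus a fidelity check against the original activation.
recon = reconstruct(explanation)
fidelity = cosine(activation, recon)
# 4. Explanation-edit steering: swap "past tense" for "cooking".
edited = {k: v for k, v in explanation.items() if k != "past tense"}
edited["cooking"] = 0.8
delta = steering_delta(explanation, edited)

print(explanation, recon, round(fidelity, 4), delta)
```

In this toy, steering amounts to adding `delta` to the original activation. How the actual server applies an edited explanation is not specified in the summary above.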
Technical Details
The server is built on top of llama.cpp and requires three models to be loaded simultaneously: the base model, the actor model, and the critic model. Holding all three in memory at once makes the setup memory-intensive. The developer is working on a LoRA-based version that would allow loading a single model into memory, reducing the footprint significantly.
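To get a rough sense of why a LoRA-based version would help, the back-of-the-envelope sketch below compares the weight memory of three full models against one model plus two adapters. Every number in it is an assumption chosen for illustration, not a figure from the project: an 8-billion-parameter model, about 4.5 bits per weight after quantization, a hidden size of 4096 with 32 layers, rank-16 adapters on four square projection matrices per layer, and adapters stored at 16 bits. It also assumes the actor and critic could each be expressed as an adapter on the shared base, and it ignores KV cache and runtime buffers.

```python
GIB = 1024 ** 3

def weight_bytes(n_params, bits_per_weight):
    """Bytes needed to store n_params weights at the given precision."""
    return n_params * bits_per_weight / 8

def lora_params(hidden_size, n_layers, rank, matrices_per_layer=4):
    """Parameter count of a LoRA adapter on square (hidden x hidden) matrices.

    Each adapted matrix gains two low-rank factors, (hidden x rank) and
    (rank x hidden), so it adds rank * (hidden + hidden) parameters.
    """
    return n_layers * matrices_per_layer * rank * (hidden_size + hidden_size)

# Assumed, illustrative values (not taken from the NLA release).
N_PARAMS = 8e9          # parameters per full model
BITS_PER_WEIGHT = 4.5   # rough average for a 4-bit-class quantization
HIDDEN, LAYERS, RANK = 4096, 32, 16

# Current setup: base, actor, and critic each loaded as a full model.
three_models = 3 * weight_bytes(N_PARAMS, BITS_PER_WEIGHT)

# Hypothetical LoRA setup: one full model plus two 16-bit adapters.
adapter = weight_bytes(lora_params(HIDDEN, LAYERS, RANK), 16)
one_plus_adapters = weight_bytes(N_PARAMS, BITS_PER_WEIGHT) + 2 * adapter

print(f"three full models:      {three_models / GIB:.2f} GiB")
print(f"one model + 2 adapters: {one_plus_adapters / GIB:.2f} GiB")
print(f"each adapter:           {adapter / 1024 ** 2:.1f} MiB")
```

Under these assumptions the adapters are small next to the base weights, so the total approaches the cost of a single model. Actual savings depend on the real model sizes, the quantization used, and the adapter configuration the developer settles on.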
The Mikupad UI provides a token-level interface for activation explanation and steering. You can inspect which tokens activate certain features and adjust the model's behavior by editing explanations in real time.
Getting Started
Source code and setup instructions are available via the r/LocalLLaMA post on Reddit. For now, you need the three NLA model checkpoints (base, actor, critic) and must compile the custom llama.cpp server yourself. The LoRA version is forthcoming.
📖 Read the full source: r/LocalLLaMA