NLA Transforms Gemma 3’s Internal Activations into Readable Text for Any Token

✍️ OpenClawRadar | 📅 Published: May 8, 2026

Anthropic has published a technique called Natural Language Autoencoders (NLA) that translates an LLM's internal activations at any given token into human-readable text. Two sets of model weights have been released for Gemma 3 27B Instruct:

Auto Verbalizer (AV): An LLM that translates the target model's activations into a natural language explanation of what the model is “thinking” when generating a particular token. Weights available at kitft/nla-gemma3-27b-L41-av.
Activation Reconstructor (AR): A companion model that reconstructs the activations from the AV’s text output. Comparing the reconstruction with the original activations checks whether the explanation is faithful. Weights available at kitft/nla-gemma3-27b-L41-ar.
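
The round trip between the two models can be illustrated with a toy sketch. This is not the NLA implementation: `verbalize` and `reconstruct` are hypothetical stand-ins for the AV and AR models, and cosine similarity and relative squared error are assumed fidelity measures, since the source does not say which metric is used. The point is only to show the shape of the check: activation to text, text back to activation, then compare.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def relative_error(original, reconstructed):
    """Squared reconstruction error divided by the squared norm of the original."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    err = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    ref = sum(x * x for x in original)
    if ref == 0.0:
        raise ValueError("relative error is undefined for a zero vector")
    return err / ref


def round_trip_fidelity(activation, verbalize, reconstruct):
    """Run activation -> text -> activation and score the reconstruction.

    `verbalize` and `reconstruct` are hypothetical callables standing in
    for the AV and AR models; they are assumptions of this sketch.
    """
    explanation = verbalize(activation)
    rebuilt = reconstruct(explanation)
    return {
        "explanation": explanation,
        "cosine": cosine_similarity(activation, rebuilt),
        "relative_error": relative_error(activation, rebuilt),
    }


# Toy stand-ins: the "text" is just the vector rounded to one decimal place,
# so the text bottleneck loses a little information, as a real one would.
def toy_verbalize(vec):
    return " ".join(f"{x:.1f}" for x in vec)


def toy_reconstruct(text):
    return [float(tok) for tok in text.split()]


if __name__ == "__main__":
    result = round_trip_fidelity([0.12, -0.5, 1.0], toy_verbalize, toy_reconstruct)
    print(result)
```

A high cosine similarity and a low relative error would suggest that the text kept most of the information in the activation. With the real models, the same comparison would be made on hidden-state vectors rather than on three-element lists.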

Neuronpedia already hosts an interactive demo at neuronpedia.org/gemma-3-27b-it/nla. You ask Gemma 3 a question, click any token in the response, then click “explain” to see the model’s internal reasoning for that token translated into plain text.

This is not an attention or saliency map: NLA decodes the hidden state vectors directly. The AV model can run alongside your LLM and produce an explanation for each token, while the AR model checks that the AV's text can be mapped back to the original activations. Both models are released as open weights.

Who it's for: Researchers and engineers doing mechanistic interpretability work, or developers curious about why their agent’s model picks specific tokens.

📖 Read the full source: r/LocalLLaMA
