NLA Transforms Gemma 3’s Internal Activations into Readable Text for Any Token

Anthropic has published a technique called Natural Language Autoencoders (NLA) that translates an LLM's internal activations at a given token into human-readable text. The release includes two sets of model weights for Gemma 3 27B Instruct:
- Activation Verbalizer (AV): An LLM that translates the target model's activations into a natural language explanation of what the model is “thinking” when it generates a particular token. Weights available at kitft/nla-gemma3-27b-L41-av.
- Activation Reconstructor (AR): A companion model that reconstructs the activations from the AV’s text output, which serves as a check on how much of the original activation the explanation actually captures. Weights available at kitft/nla-gemma3-27b-L41-ar.
Neuronpedia already hosts an interactive demo at neuronpedia.org/gemma-3-27b-it/nla. You ask Gemma 3 a question, click any token in the response, then click “explain” to see the model’s internal reasoning for that token translated into plain text.
This is not an attention or saliency map: NLA decodes the hidden state vectors directly. The AV model can run alongside your LLM and produce an explanation for each token, while the AR model tries to rebuild the original activation from that explanation, so a close reconstruction indicates the text preserved the information in the activation. Both models are released as open weights.
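The reconstruction check can be illustrated by comparing an original activation vector with the one rebuilt from the explanation text. The following is a minimal sketch, not the method used by NLA: the source does not say which fidelity metric is used, so cosine similarity and relative squared error are assumptions chosen for illustration, and the two short vectors are made-up stand-ins for real hidden states.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def relative_squared_error(original: Sequence[float],
                           reconstructed: Sequence[float]) -> float:
    """Squared reconstruction error divided by the squared norm of the original."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    error = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    energy = sum(x * x for x in original)
    if energy == 0.0:
        raise ValueError("relative error is undefined for a zero original vector")
    return error / energy


# Hypothetical 4-dimensional vectors for illustration only;
# real hidden states have thousands of dimensions.
original = [0.5, -1.0, 2.0, 0.0]        # stand-in for the target model's activation
reconstructed = [0.4, -0.9, 2.1, 0.1]   # stand-in for the AR's output

print(f"cosine similarity:      {cosine_similarity(original, reconstructed):.3f}")
print(f"relative squared error: {relative_squared_error(original, reconstructed):.3f}")
```

A cosine similarity near 1 and a relative error near 0 would suggest the explanation retained most of what the activation encoded. A low score would suggest the text dropped or distorted information.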
Who it's for: Researchers and engineers doing mechanistic interpretability work, or developers curious about why their agent’s model picks specific tokens.
📖 Read the full source: r/LocalLLaMA