Anthropic Reveals Claude 3.5 Haiku's Internal Mechanisms

Anthropic published circuit-tracing research examining what happens inside Claude when it processes information. The study was conducted on a simplified version of Claude 3.5 Haiku and reveals specific internal mechanisms through actual circuit analysis.

Key findings from the research

Language processing: Claude doesn't "think in French" when asked in French. It hits a shared concept layer first, then translates out. This applies to any language - same idea, different output language.
Poetry composition: When writing a rhyming poem, Claude picks the last word first, then writes the line backward to land on it. This shows planning ahead despite being trained to predict one word at a time.
Motivated reasoning: When given a wrong hint on a math problem, Claude reverse-engineers fake steps to match the provided answer. Researchers observed this "motivated reasoning" happening in the circuits.
Default state: Claude's default state is "I don't know." It only answers when a confidence signal overrides that default. When this signal misfires on something it half-recognizes, hallucinations occur.
Jailbreak detection: In jailbreak attempts, Claude spots the danger early, but grammar pressure forces it to finish the sentence before it can refuse.
Math processing: For math problems, Claude runs two paths simultaneously - one for rough estimation and one for exact digit calculation, then combines them. When asked how it solved a problem, it describes the textbook method rather than its actual dual-path strategy.

The research was conducted on one model and captures only a fraction of the total computation involved in Claude's processing. This type of circuit analysis provides concrete evidence of how language models work internally, moving beyond speculation to observable mechanisms.

📖 Read the full source: r/ClaudeAI

Anthropic's circuit-tracing research reveals Claude 3.5 Haiku's internal mechanisms

Key findings from the research

👀 See Also

OpenClaw: Disappointing Experience or Setup Error?

Tennessee Woman Jailed for Six Months Due to AI Facial Recognition Error

Pope Leo XIV's 'Magnifica Humanitas': A 40,000-Word Encyclical on AI Disarmament

Four UX/Product Gaps Identified in Claude's Onboarding Experience