Claude Mass Distilled by Chinese AI Firms via 24K Fake Accounts

Massive-Scale Distillation Operation

Anthropic's report documents systematic distillation efforts by three Chinese AI companies: DeepSeek, Moonshot AI, and MiniMax. The operation involved creating approximately 24,000 fake accounts and conducting over 16 million exchanges with Claude through proxy networks that ran up to 20,000 accounts simultaneously.

Specific Distillation Methods

DeepSeek had Claude explain its own reasoning step by step, then used those explanations as training data. It also prompted Claude to answer politically sensitive questions about Chinese dissidents to build censorship-navigation data. MiniMax ran more than 13 million exchanges and pivoted to a new Claude model within 24 hours of that model's release.

Safety Implications for Users

The report states directly that distilled models are unlikely to retain the original safety mechanisms. Routine questions yield similar answers from the original and the copied models, but edge cases involving medical, legal, or otherwise nuanced topics reveal critical differences. The copied models "barrel through with false confidence" because the training that taught caution was lost during distillation.

Anthropic compares this to having a doctor who only watched real doctors through a window for a year: routine cases might be handled adequately, but complicated cases offer no guarantees, and users can't tell a routine case from a complex one until it's too late.

Implications for Model Evaluation

The report notes a counterintuitive effect: disagreement between models becomes more valuable post-distillation. If two models that might share distilled capabilities still give different answers, at least one of them engaged in independent reasoning. Agreement, by contrast, becomes less meaningful, because it may only reflect a shared source.
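
As a rough illustration of the disagreement idea, the Python sketch below compares two models' answers to the same prompts and flags the prompts where they differ. Everything in it is an assumption made for illustration: the function names (`normalize`, `flag_disagreements`, `agreement_rate`), the sample prompts and answers, and the comparison rule itself. Exact matching after normalization is a crude stand-in; real answers would need a semantic comparison, and nothing here comes from the report.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(answer.lower().split())


def flag_disagreements(answers_a: dict, answers_b: dict) -> list:
    """Return the prompts (sorted) that both models answered but answered differently."""
    shared = answers_a.keys() & answers_b.keys()
    return sorted(
        prompt
        for prompt in shared
        if normalize(answers_a[prompt]) != normalize(answers_b[prompt])
    )


def agreement_rate(answers_a: dict, answers_b: dict) -> float:
    """Fraction of shared prompts on which the normalized answers match."""
    shared = answers_a.keys() & answers_b.keys()
    if not shared:
        return 0.0
    return 1 - len(flag_disagreements(answers_a, answers_b)) / len(shared)


if __name__ == "__main__":
    # Hypothetical answers keyed by prompt; not real model output.
    model_a = {
        "Is 17 prime?": "Yes",
        "Can I double this dose?": "Ask a pharmacist first",
    }
    model_b = {
        "Is 17 prime?": " yes ",
        "Can I double this dose?": "Yes, that is fine",
    }
    print(flag_disagreements(model_a, model_b))
    print(agreement_rate(model_a, model_b))
```

Under the report's argument, the flagged prompts are the ones worth a closer look, while a high agreement rate on its own says little if the two models may share a source.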

📖 Read the full source: r/ClaudeAI