Detecting and Preventing Distillation Attacks

URL: https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
Source: Anthropic News
Date Summarized: 2026-02-23

tl;dr

Anthropic identified three AI labs (DeepSeek, Moonshot, MiniMax) running industrial-scale campaigns to extract Claude's capabilities through "distillation". Together they generated over 16 million exchanges via roughly 24,000 fraudulent accounts to train their own models on Claude's outputs.

What is Distillation?

Definition: Training a smaller/less capable model on outputs from a stronger one.

Legitimate Use: Frontier labs distill their own models to create smaller, cheaper versions for customers.

Illicit Use: Competitors extract powerful capabilities from other labs at a fraction of the cost and time of developing them independently.
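
To make the definition concrete, here is a minimal sketch of the mechanism, not anything taken from the source article. Both models are toy logistic classifiers, and `teacher`, `distill`, and every weight and hyperparameter are hypothetical, chosen only for illustration. The point is that the student never sees ground-truth labels. It learns only from the teacher's answers to a set of queries.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def teacher(x):
    """Hypothetical 'stronger' model: returns P(label = 1 | x)."""
    return sigmoid(3.0 * x[0] - 2.0 * x[1] + 0.5)


def distill(inputs, soft_labels, epochs=500, lr=0.5):
    """Fit a logistic-regression student to the teacher's soft labels.

    Minimizes cross-entropy between teacher and student probabilities
    using plain batch gradient descent.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, p_teacher in zip(inputs, soft_labels):
            p_student = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = p_student - p_teacher  # gradient of the loss w.r.t. the logit
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


random.seed(0)

# Step 1: query the teacher and record its outputs (the "exchanges").
queries = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
answers = [teacher(x) for x in queries]

# Step 2: train the student on those outputs alone.
w, b = distill(queries, answers)


def student(x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


# Step 3: check how often the student's decision matches the teacher's
# on inputs that neither model saw during distillation.
held_out = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
agreement = sum(
    (teacher(x) > 0.5) == (student(x) > 0.5) for x in held_out
) / len(held_out)
print(f"student/teacher agreement on held-out inputs: {agreement:.2%}")
```

Real distillation works on large neural networks and text outputs rather than two-feature classifiers, but the pattern is the same: collect the stronger model's responses, then train on them.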

Why It Matters

National Security Risks

Illicitly distilled models lack safeguards
Protections against bioweapons, cyber attacks, etc. are stripped out
Dangerous capabilities proliferate without protections

Authoritarian Use

Foreign labs can feed distilled models into military, intelligence, and surveillance systems
Enables offensive cyber operations, disinformation, mass surveillance
Open-sourced distilled models spread beyond any government's control

Export Control Implications

Distillation attacks undermine export controls
Allows foreign labs (including those under CCP control) to close competitive gaps
Rapid "advancements" by these labs depend in significant part on extracted capabilities, not on independent innovation alone
Restricted chip access limits both:
- Direct model training
- Scale of illicit distillation campaigns

What Anthropic Found

Detail	Data
Labs involved	DeepSeek, Moonshot, MiniMax
Exchange volume	16+ million interactions
Fraudulent accounts	~24,000 accounts
Violation	Terms of service + regional access restrictions

The Threat

Campaigns growing in intensity and sophistication
Window to act is narrow
Threat extends beyond any single company or region
Requires coordinated action by industry, policymakers, and the global AI community

Key Takeaways

Distillation is a dual-use technique — legitimate for efficiency, dangerous when weaponized
Scale matters — 16M+ exchanges shows industrial-level extraction, not casual use
Safeguards evaporate — distilled models lose critical safety protections
Export controls undermined — distillation bypasses chip restrictions through data theft
National security threat — authoritarian actors gain frontier AI capabilities

