Anthropic Distillation Attacks: What Chinese AI Labs Are Accused Of and What It Means
TL;DR: Quick Answer
Anthropic accused three Chinese AI labs - DeepSeek, Moonshot (Kimi), and MiniMax - of running "distillation attacks" on Claude models using 24,000 fraudulent accounts and 16 million exchanges. Distillation means using outputs from a strong model to train a weaker one. The claims sparked major industry debate about where legitimate benchmarking ends and illicit training begins. For OpenClaw users, model diversity and routing remain the best strategy regardless of how this plays out.
Anthropic distillation attacks became the biggest AI controversy of February 2026 when Anthropic publicly accused three Chinese labs - DeepSeek, Moonshot (creators of Kimi), and MiniMax - of systematically extracting capabilities from Claude models to train their own. The report claims approximately 24,000 fraudulent accounts generated over 16 million exchanges targeting Claude's most valuable capabilities: agentic reasoning, tool use, and coding.
This is not the first time a US lab has accused competitors of using their outputs for training. OpenAI made similar claims about DeepSeek when R1 launched in early 2025. But Anthropic's report is the most detailed and specific accusation to date, naming individual labs and publishing exact numbers.
Key takeaway: Whether or not the claims hold up, this debate highlights a real tension in the AI industry - the line between legitimate model evaluation and illicit capability extraction is blurry, and every lab draws it differently.
Jump: What Is Distillation | Anthropic's Claims | The Numbers | Why Claude Is a Target | Industry Debate | What It Means for Users | FAQ
What Is AI Model Distillation
Distillation in the AI context means extracting the core capabilities from a powerful model and using those outputs to train a smaller or cheaper model to behave similarly. The term comes from the idea of distilling the essential value out of something - separating the gold from the sand.
Here is how it works in practice:
| Step | What Happens |
|---|---|
| 1. Query generation | Send thousands of carefully crafted prompts to a frontier model |
| 2. Output collection | Collect the model's responses, including reasoning traces |
| 3. Dataset creation | Organize input-output pairs into training data |
| 4. Model training | Use that data to fine-tune a smaller model to mimic the frontier model |
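The four steps above can be sketched in a few lines. This is a minimal illustration, not a real pipeline: `query_model` is a hypothetical stand-in for a frontier-model API call, and the prompts and response shape are invented for the example.

```python
def query_model(prompt: str) -> dict:
    # Hypothetical stand-in for a frontier-model API call; returns the
    # final answer plus any visible reasoning trace.
    return {"answer": f"answer to: {prompt}", "reasoning": "step 1 ... step n"}

def build_distillation_dataset(prompts):
    """Turn a list of prompts into input/target training pairs."""
    dataset = []
    for prompt in prompts:                  # 1. query generation
        response = query_model(prompt)      # 2. output collection
        dataset.append({                    # 3. dataset creation
            "input": prompt,
            "target": response["reasoning"] + "\n" + response["answer"],
        })
    return dataset  # 4. this data then fine-tunes a smaller model
```

The key point the sketch makes concrete: the training target includes the reasoning trace, not just the final answer, which is why visible reasoning chains are so valuable as training data.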
Anthropic themselves acknowledge that distillation can be entirely legitimate. Labs routinely use it to create smaller, cheaper models for their own customers. The distinction Anthropic draws is between internal distillation (using your own model's outputs) and external distillation (using a competitor's model outputs to train yours).
Important: Distillation cannot produce a model smarter than the source. It can only bring a weaker model closer to the frontier model's level. Think of it as a ceiling, not a ladder.
Why Distillation Became Controversial
The distillation debate started when OpenAI accused DeepSeek of using o1 model outputs to train DeepSeek R1. o1 was OpenAI's first reasoning model - the first to "think" before answering by working through problems step-by-step. OpenAI was so concerned about competitors copying this that it hid o1's reasoning traces from users entirely.
DeepSeek R1, by contrast, was open-weight and showed its full reasoning chain. This transparency made R1 wildly popular among developers - but it did not stop OpenAI from claiming that its own models' outputs were used in R1's training.
What Anthropic Is Claiming
Anthropic's February 2026 report names three Chinese AI labs and provides specific numbers for each:
| Lab | Known For | Exchanges | Targeted Capabilities |
|---|---|---|---|
| DeepSeek | DeepSeek R1, V4 | ~150,000 | Reasoning, rubric-based grading, censorship-safe alternatives |
| Moonshot | Kimi models | ~3.4 million | Computer vision, computer use, tool use |
| MiniMax | MiniMax models | ~13 million | Agentic coding, tool use, orchestration |
The total across all campaigns: approximately 16 million exchanges through roughly 24,000 accounts that Anthropic describes as fraudulent.
Detection Methods
Anthropic claims high-confidence attribution through:
- IP address correlation - linking accounts to known lab infrastructure
- Request metadata - patterns in API usage that match lab behavior
- Infrastructure indicators - shared proxy and cluster architectures
- Industry corroboration - other companies observing the same actors
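One of the signals listed above, infrastructure correlation, can be illustrated with a simple heuristic: flag IP addresses shared by an unusually large number of distinct accounts. This is our own toy sketch of the general idea, not Anthropic's actual detection method; the data format and threshold are invented.

```python
from collections import defaultdict

def cluster_by_ip(events):
    """Group account IDs by the source IP they were observed on.

    events: iterable of (account_id, ip_address) pairs, e.g. API logs.
    """
    by_ip = defaultdict(set)
    for account_id, ip in events:
        by_ip[ip].add(account_id)
    return by_ip

def suspicious_ips(events, threshold=100):
    """Return IPs used by at least `threshold` distinct accounts.

    A single IP fronting hundreds of accounts is one indicator of a
    coordinated, distributed-account operation (threshold is arbitrary).
    """
    return {ip: accounts
            for ip, accounts in cluster_by_ip(events).items()
            if len(accounts) >= threshold}
```

Real attribution would combine many such signals (metadata patterns, timing, shared proxy fingerprints); no single heuristic is conclusive on its own.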
The Proxy Problem
Anthropic also describes commercial proxy services that resell Claude access at scale, particularly to users in China where Anthropic's models are not directly available. One proxy reportedly managed over 20,000 accounts simultaneously, mixing what Anthropic calls distillation traffic with regular customer requests to avoid detection.
This part of the report has the strongest supporting evidence. Multiple independent sources confirm that Chinese proxy services offering discounted Claude access have been operating for months. These services use distributed account architectures to spread traffic across many accounts and IP addresses.
The Numbers in Context
The numbers Anthropic published deserve scrutiny. To understand whether 16 million exchanges represent a massive extraction operation or routine usage, context matters.
What Counts as an "Exchange"
An exchange is not the same as a user message. Every time a model responds - including tool calls, search results, and multi-step agent operations - that counts as a separate exchange. A single user request to an AI coding agent can easily generate 10-50 exchanges as the model reads files, searches code, and makes edits.
| Scenario | User Actions | Actual Exchanges |
|---|---|---|
| Simple chat question | 1 message | 1 exchange |
| Chat with web search | 1 message | 3-4 exchanges |
| Coding agent task | 1 prompt | 10-50 exchanges |
| Deep research task | 1 request | 30-100 exchanges |
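The fan-out in the table above can be turned into a rough calculator. The per-scenario multipliers below are our own choices within the table's ranges (lower bound for search, midpoints for agent and research tasks), so treat the totals as order-of-magnitude estimates only.

```python
# Exchanges generated per single user action, derived from the table:
# simple chat = 1, chat with search = 3 (low end of 3-4),
# coding agent = 30 (midpoint of 10-50), deep research = 65 (midpoint of 30-100).
EXCHANGES_PER_ACTION = {
    "simple_chat": 1,
    "chat_with_search": 3,
    "coding_agent_task": 30,
    "deep_research_task": 65,
}

def total_exchanges(actions: dict) -> int:
    """Estimate total exchanges from counts of user actions per scenario."""
    return sum(EXCHANGES_PER_ACTION[scenario] * count
               for scenario, count in actions.items())
```

For instance, 1,000 users each running five coding-agent tasks would land around 150,000 exchanges under these assumptions - the same order as DeepSeek's entire alleged campaign.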
Scale Comparison
For perspective, even a moderately popular AI application can generate millions of exchanges per month. A chat platform with modest traffic can easily process 100,000-160,000 exchanges per day. That means DeepSeek's entire alleged campaign of 150,000 exchanges is comparable to a single day of traffic for a mid-sized AI chat application.
For benchmark testing specifically, running a standard benchmark like SWE-Bench (2,294 coding tasks) with an average of 50 tool calls per task generates roughly 115,000 exchanges in a single run. A few rounds of benchmark tuning could easily produce 150,000 exchanges.
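The benchmark arithmetic above checks out with a one-line calculation (using the task count and average tool calls stated in the paragraph):

```python
SWE_BENCH_TASKS = 2294      # coding tasks in a full SWE-Bench run
AVG_TOOL_CALLS = 50         # average exchanges per task, as cited above

exchanges_per_run = SWE_BENCH_TASKS * AVG_TOOL_CALLS  # 114,700
# Roughly the 115,000 figure in the text; two extra tuning runs would
# already push past 150,000 exchanges.
```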
MiniMax Context
MiniMax operated a consumer-facing agent product that offered multiple AI models as options, including Claude. A product with active users doing deep research and agentic tasks could generate 13 million exchanges through normal commercial use. Anthropic's report notes that they detected the campaign before MiniMax released a new model, and that MiniMax pivoted traffic to new Claude releases within 24 hours - but this behavior also matches normal product usage patterns where users migrate to the latest available model.
Why Claude Models Are a Unique Target
There is one technical detail that makes Anthropic's models uniquely valuable for distillation compared to other frontier labs:
| Lab | Reasoning Traces | Implication |
|---|---|---|
| OpenAI | Hidden/obfuscated | Cannot see actual reasoning steps |
| Google | Summarized by separate model | Reasoning is paraphrased, not original |
| xAI | Obfuscated | Similar to OpenAI approach |
| Anthropic | Fully visible | Complete reasoning chain available |
Anthropic is the only major lab that does not hide or obfuscate the reasoning traces in their models. When Claude thinks through a problem, you can see every step of the reasoning process. This was a deliberate developer-friendly decision - it helps builders debug prompts, steer model behavior, and understand why a model made specific choices.
But it also means Claude's outputs contain more training-valuable data than any other frontier model. If you have the full reasoning chain (not just the final answer), that data is significantly more useful for training another model to reason similarly.
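To see why visible traces matter in practice, consider how a consumer of Claude's output would separate reasoning from the final answer. The sketch below assumes an Anthropic-style response where extended thinking arrives as ordinary content blocks (`"thinking"` alongside `"text"`); the sample data is invented, and this is an illustration of the shape, not a guaranteed API contract.

```python
def split_trace_and_answer(content_blocks):
    """Separate visible reasoning from the final answer.

    content_blocks: list of dicts shaped like Anthropic-style content
    blocks, e.g. {"type": "thinking", "thinking": ...} and
    {"type": "text", "text": ...}.
    """
    reasoning = [b["thinking"] for b in content_blocks if b["type"] == "thinking"]
    answer = [b["text"] for b in content_blocks if b["type"] == "text"]
    return "\n".join(reasoning), "\n".join(answer)
```

With hidden or summarized traces (the other rows in the table above), the `reasoning` half of that pair is simply unavailable or paraphrased, which is exactly what makes it less useful as training data.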
The Industry Debate
The AI community response to Anthropic's report has been sharply divided.
The Skeptical View
Critics point to several issues:
- The numbers are small. 150,000 exchanges for DeepSeek is trivially achievable through normal benchmarking and evaluation
- Legitimate use cases exist. MiniMax had a product that used Claude commercially. Moonshot and DeepSeek need to benchmark against competitors
- Pattern of accusations. Anthropic has previously made similar claims against Windsurf, xAI, and OpenAI - claims that were disputed or unverified
- Vague evidence. The published "example prompt" maps closely to standard research and analysis system prompts used in commercial products
- No third-party verification. Anthropic has not shared raw evidence with independent auditors
The Supportive View
Defenders of Anthropic's position argue:
- Scale and coordination matter. Even if individual exchanges are benign, coordinated campaigns across thousands of accounts suggest intent beyond normal usage
- Terms of service are clear. Using model outputs to train competing models violates Anthropic's ToS regardless of the volume
- Proxy infrastructure is real. The proxy reselling operation is independently confirmed by multiple sources
- National security framing. Models trained through distillation may lack safety guardrails present in the original
The Open-Weight Question
Anthropic's report includes a controversial statement: if distilled models are open-sourced, the risks multiply because capabilities spread beyond any single government's control. This has been interpreted by many as a push against open-weight AI models - a notable position given that Anthropic is the only major lab that has released zero open-weight models. OpenAI released the open-weight gpt-oss models, Google has the Gemma series, Meta has Llama, and all major Chinese labs publish open-weight models regularly.
The Bigger Picture - Where Is the Line
The distillation debate raises fundamental questions about data, training, and competition in AI:
| Question | Why It Matters |
|---|---|
| Is training on Claude-generated code from public GitHub repos distillation? | Massive amounts of Claude-written code exist in public repositories |
| Is sharing Claude outputs on the internet a violation? | Any public Claude conversation could become training data |
| Where does benchmarking end and distillation begin? | Labs need to evaluate competitors to improve their own models |
| How much abstraction makes data "clean"? | At what point does a model output stop being attributable? |
These questions have no clear answers yet, and the industry lacks agreed-upon standards. Every lab draws the line differently - often in ways that benefit their own competitive position.
What This Means for AI Users and OpenClaw
Regardless of whether Anthropic's specific claims are accurate, this situation highlights why model diversity matters:
- API access is not guaranteed. Labs can and do ban competitors, regions, and even individual companies from accessing their models
- Terms of service change. What is allowed today may not be allowed tomorrow
- Model routing is strategic. Spreading workloads across multiple models reduces dependency on any single provider
- Open-weight models provide insurance. Models you can run locally cannot be revoked
For OpenClaw users, the best strategy remains the same: configure model routing to use the best model for each task while maintaining fallbacks. If one provider restricts access or changes terms, your workflows continue with alternative models.
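The routing-with-fallbacks strategy can be sketched as a simple fallback chain. The provider names, chain contents, and `call_model` callable below are placeholders for illustration - this is not the actual OpenClaw configuration API.

```python
# Hypothetical per-task fallback chains: try providers in order,
# ending with a locally hosted open-weight model that cannot be revoked.
FALLBACK_CHAIN = {
    "coding": ["claude", "gpt", "local-open-weight"],
    "chat": ["gpt", "claude", "local-open-weight"],
}

def route(task_type, call_model):
    """Try each provider for the task type; fall through on failure.

    call_model: callable taking a provider name, raising RuntimeError
    if that provider is unavailable (e.g. access revoked, policy change).
    """
    last_error = None
    for provider in FALLBACK_CHAIN[task_type]:
        try:
            return call_model(provider)
        except RuntimeError as err:
            last_error = err  # remember why, then try the next provider
    raise RuntimeError(f"all providers failed: {last_error}")
```

The design point is the last entry in each chain: if every hosted provider restricts access, the workflow degrades to a local open-weight model instead of failing outright.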
Practical Recommendations
| Action | Why |
|---|---|
| Use model routing across providers | Reduces single-provider dependency |
| Keep open-weight models as fallbacks | Cannot be revoked or restricted |
| Monitor provider terms of service | Policies change frequently in 2026 |
| Diversify across US, EU, and Chinese models | Geographic policy risks affect availability |
Frequently Asked Questions
What is a distillation attack in AI?
A distillation attack, as defined by Anthropic, is when a competitor systematically queries a frontier AI model to collect outputs and uses that data to train their own models. The term was coined by Anthropic for this report. Traditional distillation is a standard machine learning technique used by all major labs to create smaller models from larger ones.
Did DeepSeek steal from Anthropic?
Anthropic claims DeepSeek conducted approximately 150,000 exchanges targeting reasoning capabilities. DeepSeek has not publicly responded. The evidence has not been independently verified, and critics note the volume is consistent with normal benchmarking activity.
Why does this matter for AI model users?
This debate could lead to stricter API access policies, more aggressive account bans, and potential regulatory action. It highlights the importance of not relying on a single AI provider and maintaining access to multiple models including open-weight alternatives.
Are open-weight AI models at risk?
Anthropic's report suggests that open-sourced distilled models multiply security risks. This position is controversial. Open-weight models from Meta (Llama), Google (Gemma), Alibaba (Qwen), and DeepSeek remain widely available and are a cornerstone of the AI ecosystem for developers and researchers.
What is the difference between distillation and benchmarking?
Benchmarking involves running standardized tests against a model to measure its performance. Distillation involves collecting model outputs to use as training data. The line between them is blurry - both require sending many queries and collecting responses. The intent and scale are what Anthropic uses to distinguish them.
How does this affect OpenClaw users?
OpenClaw supports multiple AI models from different providers. If any provider restricts access or changes policies, users can route tasks to alternative models. This situation reinforces the value of model-agnostic agent frameworks.
Conclusion
Anthropic's distillation report is the most detailed public accusation of cross-lab model extraction to date. Whether the claims are fully substantiated or strategically motivated, they highlight real tensions in the AI industry around data, competition, and the boundaries of fair use. The proxy reselling infrastructure appears genuine, but the attribution to specific labs remains unverified by third parties.
For AI users and builders, the takeaway is clear: diversify your model access, maintain open-weight fallbacks, and do not build critical workflows on a single provider.
Configure model routing on OpenClaw to stay resilient no matter how the AI landscape shifts. Extend your agent with popular skills from the OpenClaw ClawHub skills list to keep your workflows running regardless of which models are available.
Sources: Anthropic Official Report (February 2026), industry analysis, LM Council, community discussion (February 2026).