API Security | Feb 24, 2026 | 12 min read

Three Chinese AI labs stole 16 million Claude interactions without triggering a single obvious alarm. They didn't hack anything. They just looked like regular users—until the numbers told a different story.
In February 2026, Anthropic published one of the most technically significant security disclosures in AI history. Three Chinese AI laboratories—DeepSeek, Moonshot AI, and MiniMax—ran large-scale campaigns to extract Claude’s capabilities to train their own competing models.
The combined damage: over 16 million exchanges harvested through approximately 24,000 fraudulent accounts.
But here's what makes this breach unlike almost any other covered in this series: there was no vulnerability. No misconfigured database, no exposed API key like with JLR, no unpatched CVE. The attackers used Claude exactly as it was designed to be used, but not as it was intended to be used.
If this happened inside your ecosystem, would you even recognize it?
Knowledge distillation is a legitimate and widely used machine learning technique. A powerful "teacher" model guides a smaller "student" model to mimic its behavior, allowing organizations to build compact, efficient versions of their flagship systems.
Think of a Michelin-star chef with 20 years of experience who, in six months, teaches junior chefs plating, flavor logic, and ingredient balancing. Nothing wrong with that.
But if a competing restaurant sends hundreds of diners to order every dish repeatedly, document the flavors, and reverse-engineer the sauces until the entire process can be replicated, that's something else. That's a distillation attack, and it can ruin the original restaurant.
Distillation attacks work by systematically querying a competitor's API, collecting the outputs, and using them to train a competing model. Done at sufficient volume, this approximates not just surface-level answers but the underlying reasoning patterns, safety behaviors, and specialized capabilities that took the target years and billions of dollars to develop.
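To see why the technique is so effective, here is a minimal, dependency-free sketch of how legitimate distillation training works: the student is trained against the teacher's full output distribution (softened by a temperature), not just its top answer. All numbers and function names here are illustrative, not any lab's actual training code.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution; a higher
    temperature softens the distribution, exposing more of the
    teacher's knowledge about near-miss answers."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between teacher and student distributions: the
    student is pushed to match how the teacher weighs every option,
    which is far richer than matching only the final answer."""
    p = softmax(teacher_logits, temperature)  # teacher (target)
    q = softmax(student_logits, temperature)  # student (prediction)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student whose logits mirror the teacher's incurs near-zero loss;
# a divergent student incurs a much larger one.
teacher = [3.0, 1.0, 0.2]
aligned = [2.9, 1.1, 0.3]
divergent = [0.2, 1.0, 3.0]
assert distillation_loss(teacher, aligned) < distillation_loss(teacher, divergent)
```

An attacker without access to the teacher's logits approximates the same effect by harvesting millions of full responses and reasoning traces over the API, which is exactly what the campaigns below did.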
Security doesn’t pick it up because:
Traditional security tools look for break-ins.
Distillation attacks look like regular usage.
Usage often looks like revenue.
Revenue is rarely investigated.
It's industrial espionage via API call.
| Lab | Exchanges | Primary Targets |
|---|---|---|
| MiniMax | 13,000,000+ | Agentic coding, task orchestration |
| Moonshot AI | 3,400,000+ | Agentic reasoning, tool use, computer vision, and reasoning traces |
| DeepSeek | 150,000+ | Chain-of-thought data, reinforcement learning rewards, censorship-safe query rewrites |
| Combined | 16,550,000+ | Claude's most differentiated capabilities |
MiniMax ran the largest campaign—and Anthropic caught it while it was still active. When Anthropic released a new Claude model mid-campaign, MiniMax pivoted within 24 hours, redirecting nearly half its traffic to begin extracting capabilities from the latest system.
That's not opportunistic. That's an operation.
This is the part that should concern every API provider, not just Anthropic.
These labs didn't exploit a zero-day. They didn't social-engineer an employee. They simply created accounts — often thousands of them — and used the API exactly as it was built to function, but not for the purpose it was meant to serve. From a surface-level perspective, the traffic was largely indistinguishable from legitimate usage.
The weapons were subtlety and volume.
To evade detection, the labs used "hydra cluster" architectures: sprawling proxy networks that distributed requests across thousands of accounts at once. In one documented case, a single proxy network managed more than 20,000 fraudulent accounts simultaneously, deliberately mixing distillation-focused queries with unrelated, mundane requests to flatten any anomalous signals.
If you banned one account, two more took its place. The hydra regrew.
DeepSeek went further still, generating synchronized traffic across accounts with identical usage patterns, shared payment methods, and coordinated timing—structured specifically to resemble load balancing rather than coordinated extraction. On paper, it looked like a high-volume customer managing API costs across a distributed team.
The tells were subtle:
Prompt structure. Requests designed to extract detailed reasoning processes at scale, not to complete genuine user tasks.
Prompt focus. Disproportionate concentration on Claude's specific differentiators: agentic reasoning, tool use, and coding. Focused on processes, not on outputs.
Geographical and infrastructure signals. It was an “all roads lead to Rome” situation, where IP addresses and metadata (eventually) traced back to a shared origin infrastructure.
Payment correlation. Accounts that appeared independent shared the same payment sources.
Coordinated timing. Traffic patterns that resembled an ant colony with a single purpose in mind, not a million people doing millions of unrelated tasks.
None of these signals is a smoking gun on its own. Taken together, across 24,000 accounts, they form a pattern that only a sufficiently sophisticated detection system can catch.
The question for most organizations is: Do you have that system?
Most API abuse looks like abuse. Credential stuffing generates authentication failures. Scrapers hit rate limits. Bots produce inhuman request velocities.
Distillation attacks produce none of these signals.
The requests are well-formed. The accounts are properly authenticated. The API keys are valid and paid for. The rate limits may not even be hit, because the whole point of the hydra architecture is to distribute load below the threshold of any single account's limits while maximizing aggregate throughput.
For most organizations running a standard API stack, this traffic would be invisible. It would look like growth. It’s no wonder that the best heist is the one you don’t know about.
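The arithmetic of the hydra architecture is worth making concrete. The sketch below uses hypothetical limits and account counts (the per-account limit, alert threshold, and traffic numbers are illustrative, not Anthropic's), but it shows why a per-account view sees nothing while the aggregate view is screaming:

```python
PER_ACCOUNT_LIMIT = 1_000            # hypothetical requests/hour per API key
AGGREGATE_ALERT_THRESHOLD = 50_000   # hypothetical org-wide anomaly line

# A hypothetical hour of hydra traffic: 20,000 coordinated accounts,
# each politely staying well below its individual rate limit.
hydra_traffic = {f"acct_{i}": 800 for i in range(20_000)}

# Per-account view: nothing trips a limit, nothing looks abusive.
over_limit = [a for a, n in hydra_traffic.items() if n > PER_ACCOUNT_LIMIT]
assert over_limit == []

# Aggregate view: 16 million requests/hour from one coordinated cluster.
total = sum(hydra_traffic.values())
assert total == 16_000_000
assert total > AGGREGATE_ALERT_THRESHOLD  # the signal exists only in aggregate
```

This is the whole trick: 20,000 accounts at 800 requests each is invisible per key and enormous in total.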
According to Treblle's 2025 Anatomy of an API report—which analyzed over 1 billion API requests across the global digital economy—47% of APIs process requests without any authentication. Even among those that do authenticate, most have no way to ask: what are these authenticated users actually doing, collectively, over time?
That gap is exactly what distillation attackers live in.
DeepSeek's campaign was the smallest by volume but arguably the most technically sophisticated. Rather than simply collecting responses, they engineered prompts specifically designed to generate chain-of-thought training data.
The technique: ask Claude to imagine it had just produced a correct answer to a complex problem, then articulate the step-by-step internal reasoning that led to that answer—in detail, from scratch.
What looks like a curious philosophical prompt is actually a data pipeline. Claude's response becomes a labeled training example for a reinforcement learning reward model, teaching a smaller student model not just what to answer but how to reason. It's extracting the Michelin-star chef's process for creating food so a competing restaurant can use it.
DeepSeek also specifically targeted censorship-safe query rewrites—prompting Claude to rephrase politically sensitive questions about dissidents, party leaders, and authoritarianism in ways that could be used to train their own models to navigate censored topics without triggering safety filters.
Every one of those requests looked, in isolation, like an edge-case user test.
So how did Anthropic catch it? They deployed a multi-layered detection approach:
Behavioral fingerprinting. Classifiers trained to detect chain-of-thought elicitation patterns and capability extraction sequences in API traffic.
Cross-account correlation. Identifying coordinated activity across large groups of apparently independent accounts (an ant colony).
IP and infrastructure analysis. Metadata correlation revealing shared origin infrastructure beneath the proxy layers (all roads lead to Rome).
Payment metadata analysis. Linking accounts that appeared independent through financial signals (multiple buyers, but only one payer).
Industry corroboration. Intelligence sharing with other AI labs that observed identical actors and behaviors on their own platforms (“peer review”).
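The cross-account correlation layer can be sketched in a few lines. A real system would compare prompt embeddings with a trained classifier; this toy version uses lexical Jaccard similarity to keep the sketch dependency-free, and all account names and prompts are invented for illustration:

```python
def jaccard(a: str, b: str) -> float:
    """Crude lexical similarity between two prompts (token overlap).
    Production systems would use embeddings; Jaccard keeps this
    sketch self-contained."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def correlated_pairs(prompts_by_account, threshold=0.6):
    """Return pairs of supposedly independent accounts whose prompts
    look templated: high average pairwise similarity."""
    accounts = list(prompts_by_account)
    flagged = []
    for i, a in enumerate(accounts):
        for b in accounts[i + 1:]:
            sims = [jaccard(p, q)
                    for p in prompts_by_account[a]
                    for q in prompts_by_account[b]]
            if sims and sum(sims) / len(sims) >= threshold:
                flagged.append((a, b))
    return flagged

# Hypothetical traffic: two "independent" accounts running the same
# chain-of-thought elicitation template, plus one genuine user.
traffic = {
    "acct_A": ["explain step by step the internal reasoning behind this answer",
               "explain step by step the internal reasoning behind this proof"],
    "acct_B": ["explain step by step the internal reasoning behind this solution"],
    "acct_C": ["write a haiku about autumn leaves"],
}
print(correlated_pairs(traffic))  # -> [('acct_A', 'acct_B')]
```

No single prompt here is suspicious; the pair only surfaces when you compare accounts against each other, which is the entire point of this detection layer.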
They identified each campaign "with high confidence." When they caught MiniMax mid-operation, they were watching in real time.
This is world-class threat detection. It requires dedicated security research teams, sophisticated ML classifiers tailored to this threat model, and an intelligence-sharing network across the industry.
Anthropic has all of those things.
Most organizations building on AI APIs—or serving them—do not.
This matters beyond the specific incident.
Google's Threat Intelligence Group identified similar "model extraction attacks" against Gemini throughout 2025. This isn’t an isolated case, but a pattern that will be used against any organization that exposes valuable AI capabilities via API.
What makes it particularly dangerous is the economics: distillation attacks are cheap.
The attacker pays cheap API rates. The target paid in years of R&D and billions of dollars. It's asymmetry at its finest.
Given the rewards, these attacks will continue until targets can reliably detect and disrupt them; there is no exploit to patch.
Here is the core question for any organization running APIs that power valuable products: if this happened to you, would you see it?
Anthropic has purpose-built detection infrastructure. For the rest of the ecosystem, API observability is the first and most accessible line of defense.
Consider what full API visibility provides in this exact scenario:
With per-user, per-account request logging, security teams can see not just aggregate traffic but what each customer is doing. Patterns that are invisible at the endpoint level become starkly visible at the user level:
Account A and Account B have different email addresses, different IP ranges, and different company names—but their prompt structures are statistically similar
Account C pivoted its request type distribution within 24 hours of a model update
Accounts D through Z share payment fingerprints despite registering independently
None of this requires real-time action. It requires retrospective traceability. A request log that captures who made this call, where from, and what they asked turns a months-long blind spot into an afternoon investigation.
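The payment-fingerprint check in particular is a simple grouping operation once the data is logged. A hedged sketch, assuming account records with `account_id` and `payment_fingerprint` fields (the field names and values are illustrative, not any specific product's schema):

```python
from collections import defaultdict

def shared_payment_clusters(accounts, min_size=3):
    """Group accounts by payment fingerprint and surface clusters of
    'independent' registrations that share a funding source."""
    by_fingerprint = defaultdict(list)
    for acct in accounts:
        by_fingerprint[acct["payment_fingerprint"]].append(acct["account_id"])
    return {fp: ids for fp, ids in by_fingerprint.items() if len(ids) >= min_size}

# Hypothetical account records as an observability layer might store them.
accounts = [
    {"account_id": "acct_D", "payment_fingerprint": "card_8f3a"},
    {"account_id": "acct_E", "payment_fingerprint": "card_8f3a"},
    {"account_id": "acct_F", "payment_fingerprint": "card_8f3a"},
    {"account_id": "acct_G", "payment_fingerprint": "card_2b91"},
]
print(shared_payment_clusters(accounts))
# -> {'card_8f3a': ['acct_D', 'acct_E', 'acct_F']}
```

The hard part is not the query; it is having captured the metadata in the first place.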
Treblle's observability layer captures request origin metadata on every single call—not just the IP, but enriched location data. An account registered to a San Francisco startup that consistently sends requests from the same Eastern data center cluster as nineteen other accounts is a signal worth investigating.
In the Moltbook breach we covered previously, the vulnerability was a missing lock on the door. In a distillation attack, the attackers have a valid key—but they're using it to copy everything inside. Observability is how you notice the copying.
Real-time API monitoring enables the kind of cross-account correlation that Anthropic's security team built from scratch. Traffic spike detection, saved search alerts, and custom governance rules can be configured to surface:
Accounts whose prompt entropy drops suddenly (same structures, repeated at scale)
Groups of accounts whose active hours, request volumes, and endpoint distributions cluster suspiciously
Sudden pivots in request type following product announcements
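The first of those alerts, prompt entropy collapse, is straightforward to compute from a request log. A minimal sketch using Shannon entropy over an account's prompts per monitoring window; the window contents and threshold are invented for illustration:

```python
import math
from collections import Counter

def prompt_entropy(prompts):
    """Shannon entropy (bits) of an account's prompt distribution.
    Genuine users issue varied prompts (high entropy); a distillation
    script replaying one template at scale collapses toward zero."""
    counts = Counter(prompts)
    n = len(prompts)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_drop_alert(prev_window, curr_window, drop_ratio=0.5):
    """Fire when an account's prompt entropy falls by more than
    drop_ratio between two monitoring windows."""
    prev, curr = prompt_entropy(prev_window), prompt_entropy(curr_window)
    return prev > 0 and curr < prev * (1 - drop_ratio)

# Hypothetical windows: varied organic use, then templated extraction.
organic = ["fix this bug", "summarize this doc", "draft an email", "plan a trip"]
templated = ["articulate your step-by-step reasoning"] * 4
assert entropy_drop_alert(organic, templated)    # alert fires
assert not entropy_drop_alert(organic, organic)  # steady usage is fine
```

Attackers counter this by salting in mundane requests, as the hydra clusters did, which is why entropy should be one weak signal among several rather than a standalone rule.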
If Anthropic's detection had been less sophisticated—if they'd noticed something was wrong but weren't sure when it started or how far it had gotten—a complete, timestamped, per-account audit trail of every request and response would let them reconstruct the attack with precision.
That is exactly what Treblle captures: not a sample, not aggregated metrics, but the full forensic record.
Anthropic built the tools to catch this. For organizations that haven't built their own security infrastructure, Treblle provides that visibility for a fraction of the price.
These attackers had valid accounts and paid their bills. Authentication tells you who claims to be calling. Observability tells you what they're actually doing. Both are necessary. Neither alone is sufficient.
Distillation attacks have been confirmed against Anthropic, Google, and OpenAI. Any organization with a valuable AI API—whether it powers a product, a platform, or proprietary internal tooling—is a potential target. The technique scales down as well as up: an attacker doesn't need millions of exchanges if your capability set is narrower.
Anthropic caught MiniMax mid-campaign. Most organizations would have found out when a competitor launched a product that replicated capabilities they'd spent years building. The difference isn't luck—it's instrumentation. Full request traceability, user-level logging, and behavioral analytics turn a months-long blind spot into a timely detection.
No single request in these campaigns was obviously malicious. The signal was in the pattern across thousands of accounts. Detecting that pattern requires visibility across your entire API consumer population, not per-account dashboards in isolation.
Anthropic needed classifiers trained on frontier model extraction. You might need something simpler: an alert when five accounts registered within the same 48-hour window are making statistically similar requests from overlapping infrastructure. That's achievable with standard observability tooling—but only if the data is there to query.
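That simpler alert can itself be sketched directly: bucket accounts by overlapping infrastructure, then look for registration bursts inside a 48-hour window. Everything here is illustrative (the /24 bucketing, field names, thresholds, and sample data are assumptions, not a real schema):

```python
from datetime import datetime, timedelta
from collections import defaultdict

WINDOW = timedelta(hours=48)
MIN_CLUSTER = 5

def registration_burst_alerts(accounts):
    """Flag groups of >= MIN_CLUSTER accounts that registered within a
    48-hour window AND share infrastructure (here: the same /24 subnet,
    a deliberately crude stand-in for real infrastructure correlation)."""
    by_subnet = defaultdict(list)
    for acct in accounts:
        subnet = ".".join(acct["ip"].split(".")[:3])  # crude /24 bucket
        by_subnet[subnet].append(acct)
    alerts = []
    for subnet, group in by_subnet.items():
        group.sort(key=lambda a: a["registered_at"])
        for i in range(len(group)):
            window = [a for a in group[i:]
                      if a["registered_at"] - group[i]["registered_at"] <= WINDOW]
            if len(window) >= MIN_CLUSTER:
                alerts.append((subnet, [a["account_id"] for a in window]))
                break  # one alert per subnet is enough for this sketch
    return alerts

# Hypothetical data: five accounts register hours apart from one subnet,
# plus one unrelated account elsewhere.
t0 = datetime(2026, 2, 1, 9, 0)
accounts = [{"account_id": f"acct_{i}", "ip": f"203.0.113.{i}",
             "registered_at": t0 + timedelta(hours=3 * i)} for i in range(5)]
accounts.append({"account_id": "acct_solo", "ip": "198.51.100.7",
                 "registered_at": t0})
print(registration_burst_alerts(accounts))
```

Nothing in this requires ML; it requires that registration time, IP, and account identity were logged together and can be queried across your whole consumer population.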
The Anthropic distillation attack is a masterclass in what sophisticated API abuse looks like in 2026. No vulnerabilities were exploited. No alarms were tripped. Three labs quietly ran 16 million extraction queries through networks of fake accounts designed to look indistinguishable from legitimate customers.
The only things that gave them away were their prompts, their patterns, and their locations—signals that required deep, cross-account behavioral analysis to surface.
Anthropic had the tools to find those signals. Most organizations don't…yet.
This is the threat that keeps scaling as AI APIs become more valuable. It doesn't require sophisticated hacking skills. It requires patience, accounts, and the absence of visibility on the other side.
Don't give attackers that absence.
*Treblle's API Intelligence Platform gives every API-powered organization the observability, user-level traceability, and behavioral analytics needed to detect exactly this kind of threat—without a dedicated security research team. Book a demo to see how Treblle can help you see what your API traffic is really doing.*