The AI safety company leaked its own source code and accidentally revealed the existence of its most powerful model. Twice in five days.
Anthropic had two security lapses in five days. First, a configuration error exposed nearly 3,000 internal files, including a draft blog post detailing Mythos—an unreleased AI model described as far more powerful than anything currently available. Days later, an npm packaging error leaked 500,000 lines of Claude Code source code across 1,900 files. Anthropic says no customer data was compromised.
1. This Is Embarrassing but Not Dangerous
A packaging error isn't a security breach. The source code doesn't contain model weights, and the Mythos draft was a blog post, not a weapon.
Anthropic called it "a release packaging issue caused by human error, not a security breach." No customer data, no credentials, no model weights exposed. The Claude Code leak came from a source map file in an npm package that pointed to an unobfuscated zip archive on Cloudflare—embarrassing, but the kind of mistake any fast-shipping engineering team can make.
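For readers unfamiliar with the mechanism: a source map is a build artifact that maps minified JavaScript back to its original sources, and a stray reference to one in a published bundle is enough for anyone to recover the unminified code. A minimal sketch of the failure mode (the bundle tail, URL, and guard function here are invented for illustration, not taken from the leaked package):

```typescript
// Illustration of how a leftover source-map reference leaks sources.
// The bundle tail and URL below are invented for this example.
const bundleTail = "//# sourceMappingURL=https://cdn.example.com/cli.js.map";

// Anyone inspecting the shipped file can pull the map URL out...
function extractSourceMapUrl(bundle: string): string | null {
  const match = bundle.match(/\/\/# sourceMappingURL=(\S+)/);
  return match ? match[1] : null;
}

// ...fetch the map, and read its `sourcesContent` array, which can
// embed the complete original source text, unobfuscated.

// A publish-time guard is one line: refuse bundles that still carry
// a source-map pointer.
function safeToPublish(bundle: string): boolean {
  return extractSourceMapUrl(bundle) === null;
}
```

Bundlers can write maps to a location that never ships; the failure mode is simply publishing the pointer, or the map itself, alongside the package.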
What people found in the code was interesting, not alarming: forty-four feature flags for unreleased tools, including KAIROS—an always-on background daemon that runs a "dreaming" memory-consolidation process during idle time—and BUDDY, a terminal pet with 18 species. A "frustration regex" scans prompts for profanity as a sentiment signal. Weird? Sure. Dangerous? No.
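The frustration-regex idea is simple enough to sketch. The word list below is invented, but the shape of the technique (a regex over prompt text, used as a coarse sentiment signal) is what the leaked code reportedly does:

```typescript
// Invented word list; the real pattern reportedly keys on profanity.
const FRUSTRATION_RE = /\b(damn|wtf|ugh|broken again)\b/i;

// Coarse signal: does the prompt read as frustrated?
function looksFrustrated(prompt: string): boolean {
  return FRUSTRATION_RE.test(prompt);
}

looksFrustrated("ugh, why is this broken again"); // true
looksFrustrated("please refactor this module");   // false
```

It's crude by NLP standards, but it's cheap, auditable, and runs locally, which is arguably the point.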
The Mythos leak was a draft blog post, not a model release. A CMS configuration error made files publicly searchable. The blog described Mythos as a step beyond Opus—larger, more capable, more expensive. But the model weights weren't exposed. What leaked was marketing copy about a model in early testing, not the model itself.
2. This Is Exactly What an AI Safety Company Should Never Do
Anthropic lectures everyone else about AI safety. Then it can't keep its own files locked down.
Two leaks in five days from the company that brands itself as the responsible one. Anthropic's entire pitch is that it takes safety more seriously than OpenAI, Google, or Meta. That pitch is harder to make when your CMS is publicly searchable and your npm packages contain unobfuscated source code.
The Mythos details are genuinely concerning. Anthropic's own draft described a model "currently far ahead of any other AI model in cyber capabilities" that "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders." Anthropic was privately warning government officials about the cybersecurity risks. Leaking the existence and capabilities of exactly that kind of model—even accidentally—is the kind of thing safety-focused organizations are supposed to prevent.
The "undercover mode" discovery doesn't help. The source code included a file called undercover.ts that strips all traces of Anthropic internals when Claude Code is used outside the company—instructing the model to never mention codenames like "Capybara" or "Tengu." The optics of a safety company running an "undercover mode" are not great.
3. This Is Actually Good for Transparency
For once, we got to look inside an AI company. What we found was mostly mundane—and that's the point.
The code leak gave the public a rare look at how an AI company actually works. Feature flags, sentiment analysis, unreleased tools—this is normal software development. The 44 feature flags show a company iterating and experimenting, not one hiding something sinister. The frustration regex is crude but transparent—they're measuring user experience, not spying.
Mythos being public forces a conversation that should have happened anyway. An AI model with "unprecedented cybersecurity risks" shouldn't be discussed only in private government briefings. If Anthropic was warning officials about a model that can outpace cyber defenders, the public deserved to know that conversation was happening—even if this wasn't how Anthropic wanted it to come out.
The leaks prove Anthropic is building real things, not just publishing safety papers. KAIROS, BUDDY, the feature flags—these are products in development. The gap between what Anthropic says publicly and what it's building internally is narrower than many assumed. For an industry plagued by vaporware and hype, that's information worth having.
Where This Lands
Anthropic leaked its own code and the existence of its most powerful model, both by accident. The company says no customer data was exposed, and the code itself was more interesting than dangerous. But two security lapses in five days from the company that positions itself as the safety-first alternative is a credibility problem that no press release can fully repair. Where this lands depends on whether you judge Anthropic by what the leaks contained—mostly mundane—or by what the leaks say about the company's operational discipline.